Elasticsearch match complete array of terms - java

I need to match a complete array of terms with elasticsearch.
Only documents that have an array with the same elements should be returned.
There should be neither more elements nor a subset of elements in the document's array.
The order of elements does not matter.
Example:
filter:
id: ["a", "b"]
documents:
id: ["a", "b"] -> match
id: ["b", "a"] -> match
id: ["a"] -> no match
id: ["a", "b", "c"] -> no match
Eventually I want to use the Java High Level REST Client to implement the query, though an example in the Elasticsearch Query DSL will do as well.

I'd like to propose something that will save you from maintaining a long chain of "must" conditions as soon as your requirements change (e.g., imagine you have an array of six items to match). I'm going to rely on a script query, which might look over-engineered, but it is easy to turn into a search template (https://www.elastic.co/guide/en/elasticsearch/reference/7.5/search-template.html).
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"source": """
def ids = new ArrayList(doc['id.keyword']);
def param = new ArrayList(params.terms);
def isSameSize = ids.size() == param.size();
def isSameContent = ids.containsAll(param);
return isSameSize && isSameContent
""",
"lang": "painless",
"params": {
"terms": [ "a", "b" ]
}
}
}
}
}
}
}
This way, the only thing that you will need to change is the value of the terms parameter.
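To make the comparison in the script easier to follow, here is a minimal plain Java sketch of the same check (same size and same content, order ignored). The class and method names are hypothetical, and it does not use the Elasticsearch client at all. It turns both sides into sets, on the assumption that the script sees the values of a keyword field without duplicates, which as far as I know is how doc values are exposed.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper that mirrors the comparison done in the Painless script.
class ExactTermsMatch {

    // Returns true when docValues and terms hold the same elements, ignoring order.
    static boolean matches(List<String> docValues, List<String> terms) {
        // Both sides become sets so that duplicates cannot skew the size check.
        Set<String> ids = new HashSet<>(docValues);
        Set<String> wanted = new HashSet<>(terms);
        return ids.size() == wanted.size() && ids.containsAll(wanted);
    }

    public static void main(String[] args) {
        List<String> terms = List.of("a", "b");
        System.out.println(matches(List.of("a", "b"), terms));      // true
        System.out.println(matches(List.of("b", "a"), terms));      // true
        System.out.println(matches(List.of("a"), terms));           // false
        System.out.println(matches(List.of("a", "b", "c"), terms)); // false
    }
}
```

The size check is what rules out documents with extra elements; containsAll on its own would also accept ["a", "b", "c"].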

While this does not seem to be supported natively you could go ahead and use a script filter to achieve this behavior like so:
GET your_index/_search
{
"query": {
"bool": {
"must": [
{
"script": {
"script": "doc['tags'].values.length == 2"
}
},
{
"term": {
"tags": {
"value": "a"
}
}
},
{
"term": {
"tags": {
"value": "b"
}
}
}
]
}
}
}
The script filter limits the search results by array size, while the term filters specify the values of that array. If tags is an analyzed text field, make sure to enable fielddata on it so that scripts can read its values.


JOLT shift transformation: collect the items from all the levels without knowing how many levels into a list in one spec

I am trying to transform a JSON document using a Jolt transformation and am looking for some input here.
I am trying to get inner items from all the levels into one array.
My goal is to get an array that contains a part of the items, without knowing how many levels the JSON has, in one spec.
Here is my input and expected output:
Input:
{
"id": 1,
"item": [
{
"id": "1_1",
"foo": {
"id": 1232,
"nn": "sdfsd"
}
}
]
}
Expected output: (list)
{
"type" : [ "sdfsd" ]
}
My jolt spec:
[
{
"operation": "shift",
"spec": {
"item": {
"*": {
"item": {
"*": {
"item": {
"*": {
"foo": {
"nn": "type"
}
}
},
"foo": {
"nn": "type"
}
}
},
"foo": {
"nn": "type"
}
}
}
}
}
]
My output:
{
"type" : "sdfsd"
}
If the input contains multiple items I get a list, but with only one item I get a plain string.
Do you know how I can always get an array?
I need to do it in a single spec. Is that possible?
Just append square brackets to each type literal on the right-hand side (the values that name the output key), i.e. change "type" to "type[]". That always produces an array result such as
{
"type" : [ "sdfsd" ]
}
Even if the item array contained several objects, the result keeps the same form and would look like
{
"type" : [ "sdfsd", "dfsds", "fjghi", ... ]
}
depending on the input values

Adding, updating and erasing elements from nested json

I'm currently working on a diagram / tree graph generator, to achieve this I'm using two libraries: GraphView to generate the graph and ZoomLayout to move around the view. The main idea of this project is to save all JSON's within an AWS database and then load a list of all the created graphs.
Since the GraphView library doesn't have capability to change or add data from the nodes I decided to create a JSON parser in order to notify new changes and redraw the shape of the graph. So far I managed to create a JSON parser that can read the following format.
example.json
{
"name": "A",
"children": [
{
"name": "B",
"children": [
{
"name": "G",
"children": [
{}
]
}
]
},
{
"name": "C",
"children": [
{
"name": "D",
"children": [
{
"name": "E",
"children": [
{}
]
},
{
"name": "F",
"children": [
{}
]
}
]
}
]
}
]
}
The parser uses a class to iterate over all the nodes within the JSON string named Nodes.
Nodes.kt
class Nodes(
var name: String,
val children: MutableList<Nodes>
){
override fun toString(): String {
return "\nName:$name\nChildren:[$children]"
}
fun hasChildren(): Boolean {
return !children.isNullOrEmpty()
}
}
With that JSON, the app generates the following graph:
The problem
Within this section you can enter a new string which will replace the current one in the selected node. This is done by editing the string without any mapping, using the String.replace() method. But this method doesn't allow me to erase or add new nodes to the current JSON string.
To map the JSON properly I decided to make use of GSON and a MutableList. First I set up the MutableList with the data from the current JSON and then I add a new node in front of the clicked node. The issue is that when I try to print the MutableList as a string, the app throws a StackOverflowError. This also happens if I try to map it to JSON format using GSON.
This is the code that I use to replace the JSON.
// Method used to replace the current JSON with a new one by replacing the selected node with new data
private fun replaceJson(oldData: String, newData: String): Graph {
newGraph = Graph()
newStack.clear()
mNodesList.clear()
val gson = Gson()
var mappedNodes: Nodes = gson.fromJson(json, Nodes::class.java)
val mapper = ObjectMapper()
newStack.push(mappedNodes)
while (newStack.isNotEmpty()) {
replaceData(newStack.pop(), oldData, newData)
}
var position = -1
for(element in mNodesList){
if(element.name == currentNode!!.data.toString()){
println("Adding new node to ${mNodesList.indexOf(element)}")
position = mNodesList.indexOf(element)
}
}
mNodesList.add(position + 1, Nodes(newData, mNodesList))
for(node in mNodesList){
println(node.name)
}
//Stackoverflow
// println(mNodesList.toString())
//Stackoverflow
// val newJson = mapper.writerWithDefaultPrettyPrinter().writeValueAsString(mNodesList)
// println("json::: \n $newJson")
json = json.replace(oldData, newData, ignoreCase = false) //WIP Not final
return newGraph
}
// This method replaces some node data with the newly entered data
// this method uses recursivity to load all children and names in order
private fun replaceData(nodes: Nodes, oldData: String, newData: String) {
for (node in nodes.children) {
if (node.hasChildren()) {
if (node.name == oldData) {
mNodesList.add(node)
newGraph.addEdge(Node(nodes.name), Node(newData)) //<--- replaces data
newStack.push(Nodes(newData, node.children))
} else {
mNodesList.add(node)
newGraph.addEdge(Node(nodes.name), Node(node.name))
newStack.push(node)
}
}
}
}
I read some posts where people uses HashMaps but I'm quite lost and I don't think I understand how JSON mapping works.
Summary
I'm looking for a way to add and delete nodes from the string (JSON) provided above, but I don't quite know how to fix what I already have. It's the first time I'm working with JSON and Lists in Kotlin, so I would greatly appreciate any information or help; any insights on how to improve it or work around the problem will also be appreciated.
If anyone wants to see the code it's currently public in my GitHub repository.
PS: I tried providing as much information as possible; if the question is still unclear I will try to improve it.
In case anyone is in a similar situation, here's the solution I came up with.
I ended up simplifying the JSON structure I was using since having a nested JSON was giving me so many problems. I decided to link children and parents in another way. This is the current JSON structure:
{
"nodes": [
{
"data": "A",
"parent": "root"
},
{
"data": "B",
"parent": "A"
},
{
"data": "C",
"parent": "A"
},
{
"data": "G",
"parent": "B"
},
{
"data": "D",
"parent": "C"
},
{
"data": "E",
"parent": "D"
},
{
"data": "F",
"parent": "D"
},
{
"data": "H",
"parent": "F"
},
{
"data": "I",
"parent": "H"
},
{
"data": "J",
"parent": "I"
},
{
"data": "K",
"parent": "J"
}
]
}
I also remade my Nodes class and split it into two parts: Nodes.kt and SingleNode.kt.
Now the Nodes class only contains a list of SingleNode, and SingleNode contains the data of the node and its parent.
/**
* This class represents all the nodes
* @param nodes a list of all the existing nodes
*/
class Nodes(var nodes: List<SingleNode>)
/**
* This class represents a single node
* @param data name of the node
* @param parent name of its parent (upper) node
*/
class SingleNode(var data: String, var parent: String)
Once I had those classes, I used the GSON library to map the JSON string into a Nodes object.
val tree: Nodes = gson.fromJson(json, Nodes::class.java)
With this structure I was able to map the nodes into a LinkedHashMap, which I can then use to add, remove or edit any key and value (which represent the name of the node and the parent).
By using a mutableListOf<SingleNode> and GSON I can then recreate a JSON based on the previously modified HashMap.
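For anyone who wants to see the idea in code, here is a rough sketch of the flat "node name to parent name" map, written in Java rather than Kotlin. Everything in it is an assumption made for illustration: the class and method names are hypothetical, node names are assumed to be unique, and removing a node also removes everything below it, which may not be what the real app does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the flat structure: node name -> parent name.
class FlatTreeDemo {

    // Adds a node under the given parent; fails if the parent is unknown.
    static void addNode(Map<String, String> parents, String name, String parent) {
        if (!"root".equals(parent) && !parents.containsKey(parent)) {
            throw new IllegalArgumentException("Unknown parent: " + parent);
        }
        parents.put(name, parent);
    }

    // Removes a node and, recursively, every node below it (an assumption).
    static void removeNode(Map<String, String> parents, String name) {
        // Collect the children first so the map is not modified while iterating.
        List<String> children = new ArrayList<>();
        for (Map.Entry<String, String> e : parents.entrySet()) {
            if (e.getValue().equals(name)) {
                children.add(e.getKey());
            }
        }
        for (String child : children) {
            removeNode(parents, child);
        }
        parents.remove(name);
    }

    public static void main(String[] args) {
        Map<String, String> parents = new LinkedHashMap<>();
        addNode(parents, "A", "root");
        addNode(parents, "B", "A");
        addNode(parents, "C", "A");
        addNode(parents, "D", "C");
        removeNode(parents, "C");      // also removes D
        System.out.println(parents);   // {A=root, B=A}
    }
}
```

Because every entry only points at its parent by name, there is no nested object graph that could refer back to itself when it is printed or serialized.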

Find all objects in list that contain a certain field

I'm trying to find all the objects in a list of objects that contain a particular field name. For example
"list": [
{
"namesArray": [],
"name": "Bob",
"id": "12345"
},
{
"namesArray": [
"Jenny"
],
"name": "Ned"
},
{
"namesArray": [],
"name": "Jane",
"id": "gkggglg",
}
]
The class looks like this:
class ListItem {
String id;
String name;
List<String> namesArray;
}
So basically I need to find all the objects that contain the field "id". Something like:
list.stream().filter(li -> li.equals("id")).collect(Collectors.toList());
I've tried following this page and it isn't quite what I want. I don't care about the values of the id's, just whether or not the object has the field at all.
From the comments, we get your actual requirement:
So all objects with a non-null id field.
It's easy to adapt the code you've already got using streams and a filter - you just need to change the predicate that's being passed to the filter method. That predicate needs to return true for any value you want to be in the result, and false for any value you want to be discarded. So all you need is:
var result = list
.stream()
.filter(item -> item.id != null)
.collect(Collectors.toList());
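For completeness, here is a self-contained sketch of that filter that can be run as-is. The ListItem constructor and the NonNullIdFilter class are hypothetical additions for the demo; the sample data follows the question, where "Ned" has no id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical wrapper class for the demo.
class NonNullIdFilter {

    // Keeps only the items whose id field is set.
    static List<ListItem> withId(List<ListItem> list) {
        return list.stream()
                .filter(item -> item.id != null)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ListItem> list = List.of(
                new ListItem("12345", "Bob"),
                new ListItem(null, "Ned"),       // no id in the sample data
                new ListItem("gkggglg", "Jane"));
        for (ListItem item : withId(list)) {
            System.out.println(item.name);       // prints Bob, then Jane
        }
    }
}

// Same fields as in the question; the constructor is added for convenience.
class ListItem {
    String id;
    String name;
    List<String> namesArray = new ArrayList<>();

    ListItem(String id, String name) {
        this.id = id;
        this.name = name;
    }
}
```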

Changing value of specific key changes the value of other keys as well

This is how I generate dutyList.
val dutyList = ArrayList<Triple<Int, String, ArrayList<Pair<String, String>>>>()
val dateShiftPair = ArrayList<Pair<String, String>>()
dateList.forEach {date ->
dateShiftPair.add(Pair(date, "E"))
}
staffList.forEach {staff ->
dutyList.add(Triple(list.indexOf(staff), staff.name!!, dateShiftPair))
}
And this should change the second value of the pair
override fun onShiftChange(pos: Int, datePos: Int, shift: String) {
val pair = Pair(staffList[pos].third[datePos].first, shift)
staffListUpdated[pos].third[datePos] = pair
}
but instead it changes other values in pos, that is if I change staffListUpdated[0].third[0] = pair it changes staffListUpdated[1].third[0] = pair as well. I tried many ways but nothing helped.
[
{
"first": 0,
"second": "Ralph",
"third": [
{
"first": "3/5",
"second": "G" //change should happen here only
},
{
"first": "4/5",
"second": "E"
},
{
"first": "6/5",
"second": "E"
}
]
},
{
"first": 1,
"second": "Mike",
"third": [
{
"first": "3/5",
"second": "G" //but change happens here as well.
},
{
"first": "4/5",
"second": "E"
},
{
"first": "5/5",
"second": "E"
}
]
}
]
In
staffList.forEach {staff ->
dutyList.add(Triple(list.indexOf(staff), staff.name!!, dateShiftPair))
}
you use the same list instance for every Triple.
This means that if you get the list from one of your Triples and change something in it, you change it for every Triple.
A solution is to copy the list, either on insertion or on modification.
A shallow copy is enough here: Kotlin's Pair is immutable, and onShiftChange replaces the element in the list instead of mutating it. You would only need a deep copy if the list held mutable objects that you modify in place, because a shallow copy still shares the element instances between all Triples.
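The aliasing is easy to reproduce in isolation. Below is a small sketch, written in Java rather than Kotlin and with hypothetical names, that builds the per-staff lists once with a shared instance and once with a copy per staff member. In the Kotlin code from the question, the equivalent change would be to pass ArrayList(dateShiftPair) instead of dateShiftPair when creating each Triple.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of the aliasing problem.
class SharedListDemo {

    // Builds one shift list per staff member; copyPerStaff decides whether
    // every staff member gets its own copy of the template list.
    static List<List<String>> buildDuties(int staffCount, List<String> template, boolean copyPerStaff) {
        List<List<String>> duties = new ArrayList<>();
        for (int i = 0; i < staffCount; i++) {
            duties.add(copyPerStaff ? new ArrayList<>(template) : template);
        }
        return duties;
    }

    public static void main(String[] args) {
        List<String> template = new ArrayList<>(List.of("E", "E", "E"));

        List<List<String>> shared = buildDuties(2, template, false);
        shared.get(0).set(0, "G");
        System.out.println(shared.get(1).get(0)); // G: both entries are the same list

        List<List<String>> copied = buildDuties(2, new ArrayList<>(List.of("E", "E", "E")), true);
        copied.get(0).set(0, "G");
        System.out.println(copied.get(1).get(0)); // E: each entry has its own list
    }
}
```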

How do I check if a MongoDB object exists and create/update respectively?

I am developing a wireless network survey tool built with Java (Swing GUI) and a MongoDB data storage solution. I am new to MongoDB and hardly a Java guru so I need some help. I want to find if a network exists in my database and append heard points to the network document. If the network doesn't exist, I would like to create a document for that network and add the heard points. I have been trying to fix this for days but I just can't seem to wrap my head around the solution. Also, it would be nice if the BSSID was the unique id so I don't get any duplicate networks. My ideal data structure would look something like this:
{ 'bssid' : 'ca:fe:de:ad:be:ef',
'channel' : 6,
'heardpoints' : {
'point' : { 'lat' : 36.12345, 'long' : -75.234564 },
'point' : { 'lat' : 36.34567, 'long' : -75.345678 }
}
This is what I have tried so far. It seems to add the initial point but it does not add additional points after the first one was made.
BasicDBObject query = new BasicDBObject();
query.put("bssid", pkt[1]);
DBCursor cursor = coll.find(query);
if (!cursor.hasNext()) {
// Document doesnt exist so create one
BasicDBObject document = new BasicDBObject();
document.put("bssid", pkt[1]);
BasicDBObject heardpoints = new BasicDBObject();
BasicDBObject point = new BasicDBObject();
point.put("lat", latitude);
point.put("long", longitude);
heardpoints.put("point", point);
document.put("heardpoints", heardpoints);
coll.insert(document);
} else {
// Document exists so we will update here
DBObject network = cursor.next();
BasicDBObject heardpoints = new BasicDBObject();
BasicDBObject point = new BasicDBObject();
point.put("lat", latitude);
point.put("long", longitude);
heardpoints.put("point", point);
network.put("heardpoints", heardpoints);
coll.save(network);
}
I feel like I am way off the reservation on this one. Any support would help, thanks a lot!
UPDATE
I am using the upsert suggestion but still having some issue. No doubt this will work for me, I am just not doing it correctly. I am still not getting any new points past the first one added.
BasicDBObject query = new BasicDBObject("bssid", pkt[1]);
System.out.println(query);
DBCursor cursor = coll.find(query);
System.out.println(cursor);
try {
DBObject network = cursor.next();
System.out.println(network);
network.put("heardpoints", new BasicDBObject("point",
new BasicDBObject("lat", latitude)
.append("long", longitude)));
coll.update(query, network, true, false);
} catch (NoSuchElementException ex) {
System.err.println("mongo error");
} finally {
cursor.close();
}
You've got two ways to address this really, it just depends on how you actually want to use the data. In either case the first thing to address is your "ideal data structure", and mostly because it is invalid. This is the wrong part:
'heardpoints' : {
'point' : { 'lat' : 36.12345, 'long' : -75.234564 },
'point' : { 'lat' : 36.34567, 'long' : -75.345678 }
}
So this "hash/map" is invalid because you have the same "key" named twice. You cannot do that, and you probably want an "array" instead, as well as a form that you can run GeoSpatial queries against later:
Array Approach
"heardpoints": [
{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:09:18.437Z")
},
{
"geometry": {
"type": "Point",
"coordinates": [ -75.345678, 36.34567 ]
},
"time": ISODate("2014-11-04T21:10:28.919Z")
}
]
Note the ordering of the coordinates: longitude first, then latitude, which is what MongoDB and the GeoJSON spec it follows expect.
Now this is the form where you keep all of your "heardpoints" in a "single document" per "bssid" value, with each location kept in an array. Note that this is not necessarily an "upsert" per se, except in the first creation instance. The main intent is to "update" the document with the same "bssid" value. In shell form for now, with a Java translation later:
db.collection.update(
{ "bssid": "ca:fe:de:ad:be:ef" },
{
"$setOnInsert": { "channel": 6 },
"$push": {
"heardpoints": {
"$each": [{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:09:18.437Z")
}],
"$sort": { "time": -1 },
"$slice": 20
}
}
},
{ "upsert": true }
);
Whatever the language and API representation, there are basically two parts to a MongoDB update operation. Essentially this:
[ < Query >, < Update > ]
Depending on the API there are technically "three" parts, where the third is Options. But before considering the "upsert" option, it is important to understand how both the Query and Update portions are handled in an update operation.
The most important thing to know about the Update document is that it has two forms. If you just supply "keys" and "values" in a standard object form, then whatever is supplied will "overwrite" any existing content in a matched document. The other form (which is used in all examples here) is to use "update operators", which allow "parts" of the document to be modified or "augmented". That is an important distinction. But on with the examples.
On a blank collection or at least one where the specified "bssid" value does not exist, then a new document would be created containing that "bssid" field value. Additionally there is some other behavior that is going to happen.
There is a special "update operator" in here called $setOnInsert. Just like the conditions specified in the Query portion of the statement, any fields and values mentioned here are only "created" in the document when a "new" document is inserted. So if a document matching the query condition was found, none of the operations under $setOnInsert are performed on it. This is a good place to set initial values, and it also limits the write activity on the document to just the fields where it is required.
The second section in the Update document is another "update operator" called $push. As the common term in computing languages suggests, this "adds items" to an "array". So on document creation a new array is made, and otherwise the items are appended to the "existing" array content in the found document.
There are some interesting modifiers here, each with its own purpose. $each is a modifier that allows more than one item to be sent to an operator like $push at a time. We are only using it for a single item, but its use is generally required with the other two modifiers we are interested in.
The next is $sort, which is applied to the array elements present in the document in order to "sort" them by the given condition. In this case there is a "time" field on the array elements, so the "sort" makes sure that as new elements are added, the contents of the array are always ordered with the "newest" entries at the front.
The last is $slice, which complements $sort by essentially specifying a "capped amount" for the array. So to make sure our documents never get too large, the $slice modifier, which is applied "after" the $sort modifier has done its work, "removes" any entries beyond the specified "maximum" and keeps the array at that length. So quite a useful feature.
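To make the combined effect of the three modifiers concrete, here is a small in-memory Java sketch that imitates the order of operations described above: append the new items, sort by "time" descending, then cut the array to the maximum length. It does not talk to MongoDB, and the class, the HeardPoint type and the method name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// In-memory illustration only; this does not talk to MongoDB.
class PushSortSliceDemo {

    // Hypothetical stand-in for one "heardpoints" entry.
    static class HeardPoint {
        final double lon;
        final double lat;
        final long time; // epoch milliseconds

        HeardPoint(double lon, double lat, long time) {
            this.lon = lon;
            this.lat = lat;
            this.time = time;
        }
    }

    // Imitates $push with $each, $sort: { time: -1 } and $slice: max.
    static void pushSortSlice(List<HeardPoint> array, List<HeardPoint> each, int max) {
        // $each: append all supplied items
        array.addAll(each);
        // $sort: { time: -1 } puts the newest entries first
        array.sort(Comparator.comparingLong((HeardPoint p) -> p.time).reversed());
        // $slice: max keeps only the first max entries
        if (array.size() > max) {
            array.subList(max, array.size()).clear();
        }
    }

    public static void main(String[] args) {
        List<HeardPoint> heardpoints = new ArrayList<>();
        for (long t = 1; t <= 25; t++) {
            pushSortSlice(heardpoints, List.of(new HeardPoint(-75.234564, 36.12345, t)), 20);
        }
        System.out.println(heardpoints.size());      // 20
        System.out.println(heardpoints.get(0).time); // 25, the newest entry
    }
}
```

With a descending sort and a positive cap, the entries that fall off the end are always the oldest ones.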
Of course if you did not care about a "time" value then there is another way to handle this so that the "coordinate" data is only kept for "unique" combinations. That way is to use the $addToSet operator to manage array or "set" entries by itself:
db.collection.update(
{ "bssid": "ca:fe:de:ad:be:ef" },
{
"$setOnInsert": { "channel": 6 },
"$addToSet": {
"heardpoints": {
"$each": [{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
}
}]
}
}
},
{ "upsert": true }
);
Now that does not actually need the $each modifier, but it's just left there for a future point. $addToSet essentially looks at the existing array content and compares it to the element you have supplied. Where that data does not exactly match something already present in the array then it is added to the "set". Otherwise, nothing happens since the data is already there.
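As a rough illustration of that "add only if not already present" behavior, here is an in-memory Java sketch. It does not talk to MongoDB, and all names in it are hypothetical. One difference to keep in mind: Java's Map equality ignores key order, whereas MongoDB compares embedded documents exactly, including the order of their fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// In-memory illustration only; this does not talk to MongoDB.
class AddToSetDemo {

    // Imitates $addToSet: append the element only if no equal element exists.
    // Returns true when the array was changed.
    static boolean addToSet(List<Map<String, Object>> array, Map<String, Object> element) {
        if (array.contains(element)) {
            return false;
        }
        array.add(element);
        return true;
    }

    // Hypothetical helper building a simplified point entry.
    static Map<String, Object> point(double lon, double lat) {
        return Map.of("type", "Point", "coordinates", List.of(lon, lat));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> heardpoints = new ArrayList<>();
        System.out.println(addToSet(heardpoints, point(-75.234564, 36.12345))); // true
        System.out.println(addToSet(heardpoints, point(-75.234564, 36.12345))); // false, already present
        System.out.println(addToSet(heardpoints, point(-75.345678, 36.34567))); // true
        System.out.println(heardpoints.size());                                 // 2
    }
}
```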
So if you just want the data collected for specific points where they vary then this is a good approach. But there is a "catch", and a couple actually that are worth mentioning.
Suppose you want to keep only 20 entries as was mentioned before. While $addToSet supports the $each modifier, unfortunately the other modifiers such as $slice are not supported. So you cannot "maintain a cap" with a single update statement, and you would in fact have to issue "two" update operations in order to achieve this:
db.collection.update(
{ "bssid": "ca:fe:de:ad:be:ef" },
{
"$setOnInsert": { "channel": 6 },
"$addToSet": {
"heardpoints": {
"$each": [{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
}
}]
}
}
},
{ "upsert": true }
);
db.collection.update(
{ "bssid": "ca:fe:de:ad:be:ef" },
{
"$setOnInsert": { "channel": 6 },
"$push": {
"heardpoints": {
"$each": [],
"$slice": 20
}
}
}
)
But even so we have a new problem here. Aside from now needing "two" operations, keeping this cap has another problem, which is that a "set" is "not ordered" in any way. So you can limit the total number of items in the list with the second update, but there is no way to remove the "oldest" item, for example.
In order to do this you want a "time" field for the "last update", but yes, there is a catch again. Once you supply a "time" value, the entries are no longer "distinct" by their coordinates alone. An $addToSet operation considers the following to be two "different" entries, as all fields and not just the "coordinate" data are compared:
"heardpoints": [
{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:09:18.437Z")
},
{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:10:28.919Z")
}
]
Where the intent is to just "update the time" on the existing point at the given coordinates, you need to take a different approach. But again this takes two updates, and in reverse order: you try to update an existing entry first and then do something else if that does not succeed. This means the "upsert" attempt is the second operation:
var result = db.collection.update(
{
"bssid": "ca:fe:de:ad:be:ef",
"heardpoints.geometry.coordinates": [-75.234564, 36.12345 ]
},
{
"$set": {
"heardpoints.$.time": ISODate("2014-11-04T21:10:28.919Z")
}
}
);
// If result did not match and modify anything existing then perform the upsert
if ( result.nMatched === 0 ) { // assumes the shell returned a WriteResult
db.collection.update(
{ "bssid": "ca:fe:de:ad:be:ef" }, // just this key and not the array
{
"$setOnInsert": { "channel": 6 },
"$push": {
"heardpoints": {
"$each": [{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:09:18.437Z")
}],
"$sort": { "time": -1 },
"$slice": 20
}
}
},
{ "upsert": true }
);
}
So there are two operations, where the first tries to "update" an existing array entry by querying for its position. That first operation cannot be an upsert, since when nothing matched it would create a new document with the same "bssid" and the array entry that was not found. Even if that were what you wanted, it is not allowed with the positional $ operator, which relies on the matched position of the found element so that the element can be altered via the $set operator.
In the Java invocation there is a WriteResult type that is returned which can be used like this:
WriteResult writeResult = collection.update(query1, update1, false, false);
if ( writeResult.getN() == 0 ) {
// Upsert would be tried if the array item was not found
writeResult = collection.update(query2, update2, true, false);
}
If something was not updated then the serialized content looks like this:
{ "serverUsed" : "192.168.2.3:27017" , "ok" : 1 , "n" : 0 , "updatedExisting" : true}
Which means you basically test the n value to see what happened and decide whether to "update" the array item or "push" a new one, depending on whether the query matched that array item or not.
Document Approach
The general conclusion from the above is that where you want to keep distinct data for the "coordinates" and just modify a "time" entry then the above process can get messy. The operations are not ideally atomic, and though there can be some tuning, it is probably not well suited to high volume updates.
This is a case then where the logic is to "remove" the array storage, and then store each distinct "point" in its own document with the related "bssid" field. This simplifies the decision of whether to update or "insert" a new one into a single-operation model. Documents in the collection now look like this:
{
"bssid": "ca:fe:de:ad:be:ef",
"channel": 6,
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:09:18.437Z")
},
{
"bssid": "ca:fe:de:ad:be:ef",
"channel": 6,
"geometry": {
"type": "Point",
"coordinates": [ -75.345678, 36.34567 ]
},
"time": ISODate("2014-11-04T21:10:28.919Z")
}
Distinct in their own collection and not bound in the same document under an array. There is data duplication but the "update" process is now much simplified:
db.collection.update(
{
"bssid": "ca:fe:de:ad:be:ef",
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
}
},
{
"$setOnInsert": { "channel": 6 },
"$set": { "time": ISODate("2014-11-04T21:10:28.919Z") }
},
{ "upsert": true }
)
All that does is match a document based on the supplied "bssid" and "point" values, either "updating" the "time" where it matched or inserting a new document with all values where that "bssid" and "point" data was not found.
The overall picture is that this started off with simple needs, where it was fine to "embed" the array in the document, but with more complex needs that storage form can become a pain to maintain. On the other hand, using separate documents in the collection has its benefits, but then you have to do your own work to "clean up" entries beyond any cap limits you might want. But arguably that does not need to be a "real time" operation.
Different approaches, so work with the one that suits you best. This is just a guide to implement in either way and showing the pitfalls and solutions. What works best for you, only you can tell.
This really is more about the technique than the specific Java coding. That part is not hard, so here is just some of the most difficult structure from above for reference:
DBObject update = new BasicDBObject(
"$setOnInsert", new BasicDBObject(
"channel", 6
)
).append(
"$push", new BasicDBObject(
"heardpoints", new BasicDBObject(
"$each", new DBObject[]{
new BasicDBObject(
"geometry",
new BasicDBObject("type","Point").append(
"coordinates", new double[]{-75.234564, 36.12345}
)
).append(
"time", new DateTime(2014,1,1,0,0,DateTimeZone.UTC).toDate()
)
}
).append(
"$sort", new BasicDBObject(
"time", -1
)
).append("$slice", 20)
)
);
