I'll try to reduce my case to the necessary: I'm building a Webapp (with Spring) with a search interface that lets you search a corpus of annotated/tagged texts. In my DB (MongoDB) one document represents one page of a book collection (totaling ~8000 pages).
Here is an example of the Document structure in JSON (I removed a lot of meta data for brevity. Also, and this is important, the "tokens"-array contains up to 700 objects in most cases.):
{
"_id" : ObjectId("5622c29eef86d3c2f23fd62c"),
"scanId" : "592ea208b6d108ee5ae63f79",
"volume" : "Volume I",
"chapters" : [
"Some Chapter Name"
],
"languages" : [
"English",
"German"
],
"tokens" : [
{
"form" : "The",
"index" : 0,
"tags" : [
"ART"
]
},
{
"form" : "house",
"index" : 1,
"tags" : [
"NN",
"NN_P"
]
},
{
"form" : "is",
"index" : 2,
"tags" : [
"V",
"CONJ_C"
]
}
]
}
So you see, I don't have plain text here. I now want to build an index with Lucene to search this DB quickly. The problem is that I want to be able to search for certain words, their tags AND the context around them, for example: "give me all documents containing the word 'House' tagged as 'NN' followed by a word tagged with 'V'". I couldn't find a way to index these sub-structures with native Lucene functionality.
What I tried, to at least be able to search for words and their tags, is the following: in my Lucene index, a document doesn't represent a whole page, but only a single word/token with its tags. So one index document looks like this (expressed in JSON syntax for readability):
{
"token" : "house",
"tag" : "NN",
"tag" : "NN_P",
"index" : 1,
"pageId" : "5622c29eef86d3c2f23fd62c"
}
... Yes, Lucene allows me to use one field multiple times. So now I can search for a word and its tags and get a reference to the page object in my DB via its ID. But this is pretty ugly for two reasons: I now have two completely different document representations (DB and Lucene index), and to process a complex query like the one I mentioned above I'd have to query for the word and its tag and then manually check the context of the hits in the retrieved documents.
So my question is: Is there a way to index documents in Lucene containing fields/properties whose values are nested objects that in turn have certain properties?
Elasticsearch certainly lets you do this. I think it's possible to do all of it in pure Lucene, but it may take some effort.
Basically, you need to use the 'nested' query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html
PUT /my_index
{
"mappings": {
"type1" : {
"properties" : {
"tokens" : {
"type" : "nested"
}
}
}
}
}
This tells ES to index the contents of this field as a list of separate documents, allowing you to query them individually using the 'nested' query:
GET my_index/_search
{
"query": {
"nested": {
"path": "tokens",
"query": {
"bool": {
"must": [
{ "match": { "tokens.form": "house" }},
{ "match": { "tokens.tags": "NN" }}
]
}
}
}
}
}
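As far as I can tell, the nested query above only covers "form and tag on the same token". The "followed by a word tagged 'V'" part of the question would still need a check over neighboring tokens on the pages that come back. A minimal plain-Java sketch of that check, assuming a hypothetical Token class that mirrors one entry of the "tokens" array (the class and method names are made up for illustration, not part of Lucene or Elasticsearch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory form of one entry of the "tokens" array.
final class Token {
    final String form;
    final int index;
    final List<String> tags;

    Token(String form, int index, String... tags) {
        this.form = form;
        this.index = index;
        this.tags = Arrays.asList(tags);
    }
}

final class ContextFilter {
    // Returns every position i where tokens[i] has the given form
    // (case-insensitive) and tag, and tokens[i + 1] carries nextTag.
    static List<Integer> findFollowedBy(List<Token> tokens, String form,
                                        String tag, String nextTag) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            Token current = tokens.get(i);
            Token next = tokens.get(i + 1);
            if (current.form.equalsIgnoreCase(form)
                    && current.tags.contains(tag)
                    && next.tags.contains(nextTag)) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

With the three tokens from the example document, searching for "House" tagged "NN" followed by "V" reports position 1.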
I've got an XML configuration mapped to a JSON document which has an array of elements, but when there is only one element, the document looks like this:
{
"name" : "test2",
"products" : {
"id" : "prod3",
"value" : "prod_value3"
}
}
With more than one element, "products" is an array:
{
"name" : "test1",
"products" : [
{
"id" : "prod1",
"value" : "prod_value1"
},
{
"id" : "prod2",
"value" : "prod_value2"
}
]
}
So in the first document, instead of an array of elements, "products" is a single object.
The JSON is inserted into a MongoDB database and I'm trying to map "products" to an ArrayList, but for the first document the list comes back empty.
My question is: is there any way to handle this case automatically in Java? Maybe with a custom mapper?
In Java this kind of case is usually handled with method overloading. A single object and an array are different types, and you can't cast one to the other, but you can provide methods that accept each type of value.
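One way to sketch this without overloads is a single helper that inspects the raw value and always returns a list. This assumes the raw "products" value arrives as a java.util.Map (single object) or a java.util.List (array), which I believe is how the legacy driver's BasicDBObject and BasicDBList behave. ProductsNormalizer and toList are hypothetical names, not part of any mapping library:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ProductsNormalizer {
    // Accepts the raw "products" value: null, one object (Map),
    // or a list of objects, and always returns a list of objects.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> toList(Object raw) {
        List<Map<String, Object>> result = new ArrayList<>();
        if (raw == null) {
            return result; // field missing: empty list
        }
        if (raw instanceof Map) {
            // Single element: wrap it so callers always see a list.
            result.add((Map<String, Object>) raw);
        } else if (raw instanceof List) {
            for (Object item : (List<?>) raw) {
                if (item instanceof Map) {
                    result.add((Map<String, Object>) item);
                }
            }
        } else {
            throw new IllegalArgumentException(
                    "Unexpected type for products: " + raw.getClass());
        }
        return result;
    }
}
```

Calling toList(document.get("products")) before mapping would then give a one-element list for "test2" and a two-element list for "test1".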
I am developing a wireless network survey tool built with Java (Swing GUI) and a MongoDB data storage solution. I am new to MongoDB and hardly a Java guru so I need some help. I want to find if a network exists in my database and append heard points to the network document. If the network doesn't exist, I would like to create a document for that network and add the heard points. I have been trying to fix this for days but I just can't seem to wrap my head around the solution. Also, it would be nice if the BSSID was the unique id so I don't get any duplicate networks. My ideal data structure would look something like this:
{ 'bssid' : 'ca:fe:de:ad:be:ef',
'channel' : 6,
'heardpoints' : {
'point' : { 'lat' : 36.12345, 'long' : -75.234564 },
'point' : { 'lat' : 36.34567, 'long' : -75.345678 }
}
}
This is what I have tried so far. It seems to add the initial point but it does not add additional points after the first one was made.
BasicDBObject query = new BasicDBObject();
query.put("bssid", pkt[1]);
DBCursor cursor = coll.find(query);
if (!cursor.hasNext()) {
// Document doesnt exist so create one
BasicDBObject document = new BasicDBObject();
document.put("bssid", pkt[1]);
BasicDBObject heardpoints = new BasicDBObject();
BasicDBObject point = new BasicDBObject();
point.put("lat", latitude);
point.put("long", longitude);
heardpoints.put("point", point);
document.put("heardpoints", heardpoints);
coll.insert(document);
} else {
// Document exists so we will update here
DBObject network = cursor.next();
BasicDBObject heardpoints = new BasicDBObject();
BasicDBObject point = new BasicDBObject();
point.put("lat", latitude);
point.put("long", longitude);
heardpoints.put("point", point);
network.put("heardpoints", heardpoints);
coll.save(network);
}
I feel like I am way off the reservation on this one. Any support would help, thanks a lot!
UPDATE
I am using the upsert suggestion but still having some issue. No doubt this will work for me, I am just not doing it correctly. I am still not getting any new points past the first one added.
BasicDBObject query = new BasicDBObject("bssid", pkt[1]);
System.out.println(query);
DBCursor cursor = coll.find(query);
System.out.println(cursor);
try {
DBObject network = cursor.next();
System.out.println(network);
network.put("heardpoints", new BasicDBObject("point",
new BasicDBObject("lat", latitude)
.append("long", longitude)));
coll.update(query, network, true, false);
} catch (NoSuchElementException ex) {
System.err.println("mongo error");
} finally {
cursor.close();
}
You've really got two ways to address this; which one to pick depends on how you actually want to use the data. In either case the first thing to address is your "ideal data structure", mostly because it is invalid. This is the wrong part:
'heardpoints' : {
'point' : { 'lat' : 36.12345, 'long' : -75.234564 },
'point' : { 'lat' : 36.34567, 'long' : -75.345678 }
}
So this "hash/map" is invalid because you have the same "key" named twice. You cannot do that, and you probably want an "array" instead, as well as something that you have a hope of using GeoSpatial queries on later when you want to:
Array Approach
"heardpoints": [
{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:09:18.437Z")
},
{
"geometry": {
"type": "Point",
"coordinates": [ -75.345678, 36.34567 ]
},
"time": ISODate("2014-11-04T21:10:28.919Z")
}
]
Note also the ordering of longitude before latitude, which is how MongoDB and the GeoJSON spec it follows expect coordinates.
Now this is for the form where you are going to keep all of your "heardpoints" data in a "single document" per "bssid" value, with each location kept in an array. Note that this is not really an "upsert" per se, except in the first creation instance. The main intent is to "update" the same "bssid" value document. Just in shell form now, with a Java syntax translation later:
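The duplicate-key problem is easy to reproduce in plain Java, since any map keeps one value per key. A small sketch using java.util collections instead of the driver classes (HeardPointShape and its methods are hypothetical names for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HeardPointShape {
    // Builds a GeoJSON-style point; longitude comes first, then latitude.
    static Map<String, Object> geoPoint(double longitude, double latitude) {
        Map<String, Object> geometry = new LinkedHashMap<>();
        geometry.put("type", "Point");
        geometry.put("coordinates", Arrays.asList(longitude, latitude));
        return geometry;
    }

    // Mirrors the structure from the question: the second put on "point"
    // replaces the first, so only one point survives.
    static Map<String, Object> duplicateKeyHeardpoints() {
        Map<String, Object> heardpoints = new LinkedHashMap<>();
        heardpoints.put("point", Arrays.asList(-75.234564, 36.12345));
        heardpoints.put("point", Arrays.asList(-75.345678, 36.34567));
        return heardpoints;
    }

    // The array form keeps every point as its own element.
    static List<Map<String, Object>> heardpointsArray() {
        List<Map<String, Object>> heardpoints = new ArrayList<>();
        heardpoints.add(geoPoint(-75.234564, 36.12345));
        heardpoints.add(geoPoint(-75.345678, 36.34567));
        return heardpoints;
    }
}
```

This is also why the code in the question never seems to get past one point: every put on the same "point" key replaces the previous value.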
db.collection.update(
{ "bssid": "ca:fe:de:ad:be:ef" },
{
"$setOnInsert": { "channel": 6 },
"$push": {
"heardpoints": {
"$each": [{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:09:18.437Z")
}],
"$sort": { "time": -1 },
"$slice": 20
}
}
},
{ "upsert": true }
);
Whatever the language and API representation, there are basically two parts to a MongoDB update operation. Essentially this:
[ < Query >, < Update > ]
Depending on the API presentation there are technically "three" parts, where the third is Options. But to understand the "upsert" option, it is important to understand how both the Query and Update document portions are handled in an update operation.
The most important thing to know about the Update document is that it has two forms. If you just supply "keys" and "values" in a standard object form, then whatever is supplied will "overwrite" any existing content in a matched document. The other form (which is used in all examples here) is to use "update operators", which allow "parts" of the document to be modified or "augmented". That is an important distinction. But on with the examples.
On a blank collection or at least one where the specified "bssid" value does not exist, then a new document would be created containing that "bssid" field value. Additionally there is some other behavior that is going to happen.
There is a special "update operator" in here called $setOnInsert. Just like the conditions specified in the Query portion of the statement, any fields and values mentioned here are only "created" in the document when a "new" document is inserted. So if the document matching the query condition was found then none of the operations here are actually performed to change the found document. This is a good place to set initial values and also limit the write activity on the document to just the fields where it is required.
The second section in the Update document is another "update operator" called $push. As the common term in computing languages suggests, this "adds items" to an "array". So on document creation a new array is made, and otherwise the items are appended to the "existing" array content in the found document.
There are some interesting modifiers here which have their own purpose. $each is a modifier that allows more than one item to be sent to an operator like $push at a time. We are only using it for a single item, but its use is generally required with the other two modifiers we are interested in.
The next is $sort, which is applied to the array elements present in the document in order to "sort" them by the condition. In this case there is a "time" field on the array elements, so the "sort" makes sure that as new elements are added, the contents of the array are always ordered so that the "newest" entries are at the front of the array.
The final one is $slice, which complements $sort by essentially specifying a "capped amount" for the array. So just to make sure our documents never get too large, the $slice modifier, which is applied "after" the $sort modifier has done its work, "removes" any entries beyond the specified "maximum" number and keeps the array at that length. So quite a useful feature.
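To make the combined effect of the three modifiers concrete, here is a plain-Java imitation of what happens to the array: append, sort newest first, then cut to the cap. This only illustrates the semantics described above on an in-memory list; it is not driver code, and TimedPoint and CappedPush are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory stand-in for one "heardpoints" element.
final class TimedPoint {
    final double lon;
    final double lat;
    final long timeMillis;

    TimedPoint(double lon, double lat, long timeMillis) {
        this.lon = lon;
        this.lat = lat;
        this.timeMillis = timeMillis;
    }
}

final class CappedPush {
    // Imitates $push with $each, $sort: { time: -1 } and a positive $slice.
    static List<TimedPoint> pushSortSlice(List<TimedPoint> existing,
                                          List<TimedPoint> each, int slice) {
        List<TimedPoint> merged = new ArrayList<>(existing);
        merged.addAll(each);                                    // $each
        merged.sort(Comparator
                .comparingLong((TimedPoint p) -> p.timeMillis)
                .reversed());                                   // $sort: newest first
        int keep = Math.min(slice, merged.size());
        return new ArrayList<>(merged.subList(0, keep));        // $slice: keep the first N
    }
}
```

Pushing a point onto an array that is already at the cap therefore drops whichever element ends up last after sorting, which here is the oldest one.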
Of course if you did not care about a "time" value then there is another way to handle this so that the "coordinate" data is only kept for "unique" combinations. That way is to use the $addToSet operator to manage array or "set" entries by itself:
db.collection.update(
{ "bssid": "ca:fe:de:ad:be:ef" },
{
"$setOnInsert": { "channel": 6 },
"$addToSet": {
"heardpoints": {
"$each": [{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
}
}]
}
}
},
{ "upsert": true }
);
Now that does not actually need the $each modifier, but it's just left there for a future point. $addToSet essentially looks at the existing array content and compares it to the element you have supplied. Where that data does not exactly match something already present in the array, it is added to the "set". Otherwise, nothing happens since the data is already there.
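The "exact match" comparison can be imitated in plain Java with a list of maps. This is only an approximation under stated assumptions: Map.equals ignores key order, whereas I believe MongoDB also treats documents with the same fields in a different order as different. AddToSetSketch and its methods are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AddToSetSketch {
    // Builds one array element; timeMillis may be null to omit the "time" field.
    static Map<String, Object> entry(double lon, double lat, Long timeMillis) {
        Map<String, Object> geometry = new LinkedHashMap<>();
        geometry.put("type", "Point");
        geometry.put("coordinates", Arrays.asList(lon, lat));
        Map<String, Object> element = new LinkedHashMap<>();
        element.put("geometry", geometry);
        if (timeMillis != null) {
            element.put("time", timeMillis);
        }
        return element;
    }

    // Adds the element only when no equal element is already present.
    // Returns true when the array was modified.
    static boolean addToSet(List<Map<String, Object>> array,
                            Map<String, Object> element) {
        if (array.contains(element)) {
            return false;
        }
        array.add(element);
        return true;
    }
}
```

Adding the same coordinates twice without a "time" leaves one element. Adding them with two different "time" values leaves two, because the whole element is compared and not just the coordinates.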
So if you just want the data collected for specific points where they vary then this is a good approach. But there is a "catch", and a couple actually that are worth mentioning.
Suppose you want to keep only 20 entries as was mentioned before. While $addToSet supports the $each modifier, unfortunately the other modifiers such as $slice are not supported. So you can't "maintain a cap" with a single update statement, and you would in fact have to issue "two" update operations in order to achieve this:
db.collection.update(
{ "bssid": "ca:fe:de:ad:be:ef" },
{
"$setOnInsert": { "channel": 6 },
"$addToSet": {
"heardpoints": {
"$each": [{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
}
}]
}
}
},
{ "upsert": true }
);
db.collection.update(
{ "bssid": "ca:fe:de:ad:be:ef" },
{
"$setOnInsert": { "channel": 6 },
"$push": {
"heardpoints": {
"$each": [],
"$slice": 20
}
}
}
)
But even so we have a new problem here. Aside from now needing "two" operations, keeping this cap has another problem, which is that a "set" is "not ordered" in any way. So you can limit the total number of items in the list with the second update, but there is no way to remove the "oldest" item, for example.
In order to do this you want a "time" field for the "last update", but yes, there is a catch again. Once you supply a "time" value, the entries are no longer "distinct" on the coordinates alone, which is what made them a "set". An $addToSet operation considers the following to be two "different" entries, as all fields and not just the "coordinate" data are compared:
"heardpoints": [
{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:09:18.437Z")
},
{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:10:28.919Z")
}
]
Where the intent is to just "update the time" on the existing point at the given coordinates, you need to take a different approach. But again this is two updates, and in reverse order: you try to update an existing array entry first and then do something else if that does not succeed. Meaning the "upsert" attempt is the second operation:
var result = db.collection.update(
{
"bssid": "ca:fe:de:ad:be:ef",
"heardpoints.geometry.coordinates": [-75.234564, 36.12345 ]
},
{
"$set": {
"heardpoints.$.time": ISODate("2014-11-04T21:10:28.919Z")
}
}
);
// If the first update did not match anything existing then perform the upsert
if ( result.nMatched === 0 ) {
db.collection.update(
{ "bssid": "ca:fe:de:ad:be:ef" }, // just this key and not the array
{
"$setOnInsert": { "channel": 6 },
"$push": {
"heardpoints": {
"$each": [{
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:09:18.437Z")
}],
"$sort": { "time": -1 },
"$slice": 20
}
}
},
{ "upsert": true }
);
}
So there are two operations, where the first tries to "update" an existing array entry by querying for its position. That first operation cannot be an upsert, since it would create a new document with the same "bssid" and the array entry that was not found. It would be convenient if this could be a single operation, but that is not allowed with the positional $ operator, which uses the matched position of the found element so that the element can be altered via the $set operator.
In the Java invocation there is a WriteResult type that is returned which can be used like this:
WriteResult writeResult = collection.update(query1, update1, false, false);
if ( writeResult.getN() == 0 ) {
// Upsert would be tried if the array item was not found
writeResult = collection.update(query2, update2, true, false);
}
If something was not updated then the serialized content looks like this:
{ "serverUsed" : "192.168.2.3:27017" , "ok" : 1 , "n" : 0 , "updatedExisting" : false}
Which means you basically test the n value to see what happened, and decide whether to "update" the array item or "push" a new one depending on whether the query matched that array item or not.
Document Approach
The general conclusion from the above is that where you want to keep distinct data for the "coordinates" and just modify a "time" entry then the above process can get messy. The operations are not ideally atomic, and though there can be some tuning, it is probably not well suited to high volume updates.
This is a case then where the logical step is to "remove" the array storage and store each distinct "point" in its own document with the related "bssid" field. This reduces the question of whether to update or "insert" a new one to a single operation. Documents in the collection now look like this:
{
"bssid": "ca:fe:de:ad:be:ef",
"channel": 6,
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
},
"time": ISODate("2014-11-04T21:09:18.437Z")
},
{
"bssid": "ca:fe:de:ad:be:ef",
"channel": 6,
"geometry": {
"type": "Point",
"coordinates": [ -75.345678, 36.34567 ]
},
"time": ISODate("2014-11-04T21:10:28.919Z")
}
Distinct in their own collection and not bound in the same document under an array. There is data duplication but the "update" process is now much simplified:
db.collection.update(
{
"bssid": "ca:fe:de:ad:be:ef",
"geometry": {
"type": "Point",
"coordinates": [-75.234564, 36.12345 ]
}
},
{
"$setOnInsert": { "channel": 6 },
"$set": { "time": ISODate("2014-11-04T21:10:28.919Z") }
},
{ "upsert": true }
)
All that does is match a document based on the supplied "bssid" and "point" values, either "updating" the "time" where it matched or inserting a new document with all values where that "bssid" and "point" data was not found.
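The decision that single upsert makes can be imitated in plain Java with a map keyed on the "bssid" plus the coordinates. This is only an in-memory illustration of the match-or-insert logic, not driver code, and PointStore is a hypothetical name:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PointStore {
    // Key: [bssid, longitude, latitude]; value: the stored document.
    private final Map<List<Object>, Map<String, Object>> docs = new LinkedHashMap<>();

    // Returns true when a new document was inserted,
    // false when an existing one only had its "time" updated.
    boolean upsert(String bssid, double lon, double lat,
                   int channelOnInsert, long timeMillis) {
        List<Object> key = Arrays.<Object>asList(bssid, lon, lat);
        Map<String, Object> doc = docs.get(key);
        boolean inserted = (doc == null);
        if (inserted) {
            doc = new LinkedHashMap<>();
            doc.put("bssid", bssid);
            doc.put("coordinates", Arrays.asList(lon, lat));
            doc.put("channel", channelOnInsert); // like $setOnInsert: new documents only
            docs.put(key, doc);
        }
        doc.put("time", timeMillis);             // like $set: applied in both cases
        return inserted;
    }

    Map<String, Object> find(String bssid, double lon, double lat) {
        return docs.get(Arrays.<Object>asList(bssid, lon, lat));
    }

    int size() {
        return docs.size();
    }
}
```

Calling upsert twice for the same network and coordinates leaves one document with the later "time" and the original "channel", while a different coordinate pair creates a second document.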
The overall point is that while this started off with simple needs, where it was fine to "embed" the array in the document, more complex needs can make that storage form painful to maintain. On the other hand, using separate documents in the collection has its benefits, but then you do have to do your own work to "clean up" entries beyond any cap limits you might want. It is arguable that this does not need to be a "real time" operation.
Different approaches, so work with the one that suits you best. This is just a guide to implement in either way and showing the pitfalls and solutions. What works best for you, only you can tell.
This really is more about the technique than the specific Java coding. That part is not hard, so here is just some of the most difficult structure from above for reference:
DBObject update = new BasicDBObject(
"$setOnInsert", new BasicDBObject(
"channel", 6
)
).append(
"$push", new BasicDBObject(
"heardpoints", new BasicDBObject(
"$each", new DBObject[]{
new BasicDBObject(
"geometry",
new BasicDBObject("type","Point").append(
"coordinates", new double[]{-75.234564, 36.12345}
)
).append(
"time", new DateTime(2014,1,1,0,0,DateTimeZone.UTC).toDate()
)
}
).append(
"$sort", new BasicDBObject(
"time", -1
)
).append("$slice", 20)
)
);
I have an Android game in which I want to store levels as a static Java class.
What is the equivalent of the following Javascript object in Java?
var Levels = {
Level1:{
shapes:[
{
bodytype : "dynamic",
h : "50.0000",
nameid : "hofN7-1",
props : {
id : "properties"
},
rotation : "0.0000",
type : "square",
uid : "Av2EZQh",
w : "50.0000",
x : "20.0000",
y : "20.0000"
},
{
bodytype : "dynamic",
h : "50.0000",
nameid : "hofN7-2",
props : {
gravMassScale : "2",
id : "properties",
inertia : "2",
isBullet : true,
torque : "2",
velocity : {
x : "2",
y : "2"
}
}
}
...
If you are looking for something close to how JS stores objects, you can try JSONObject class:
http://json.org/javadoc/org/json/JSONObject.html
Keeping Syntax-to-Syntax mapping aside, from an engineering perspective, I would store levels in a map of Levels like so:
HashMap<String, Level> myLevels = new HashMap<String, Level>();
As the commenters wrote, you can use a Map to create an equivalent object in Java. However, the values in this map are of different types. Some would be Strings. Some would be arrays of some type of Object. Some would be Maps themselves.
The only collection object you could use then is Map<String, Object>, and then you would need to make instanceof checks each time you used it. This would be clumsy.
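To show what that looks like in practice, here is a sketch of reading one field of one shape out of an untyped Map<String, Object>. The nesting (level name, then "shapes", then a list of shape maps) follows the JavaScript object in the question; UntypedLevels and shapeField are hypothetical names:

```java
import java.util.List;
import java.util.Map;

final class UntypedLevels {
    // Walks levels -> levelName -> "shapes" -> [shapeIndex] -> field,
    // checking the runtime type at every step. Returns null when the
    // path does not exist or the value is not a String.
    static String shapeField(Map<String, Object> levels, String levelName,
                             int shapeIndex, String field) {
        Object level = levels.get(levelName);
        if (!(level instanceof Map)) {
            return null;
        }
        Object shapes = ((Map<?, ?>) level).get("shapes");
        if (!(shapes instanceof List)) {
            return null;
        }
        List<?> list = (List<?>) shapes;
        if (shapeIndex < 0 || shapeIndex >= list.size()) {
            return null;
        }
        Object shape = list.get(shapeIndex);
        if (!(shape instanceof Map)) {
            return null;
        }
        Object value = ((Map<?, ?>) shape).get(field);
        return (value instanceof String) ? (String) value : null;
    }
}
```

Four type checks to read a single string is the clumsiness mentioned above, and every other accessor would need the same ceremony.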
You could use a JSON binding library, like Jackson. See http://wiki.fasterxml.com/JacksonHome. This has a mode where you create a class, Levels, and annotate its instance variables so Jackson can turn instances into JSON and back. But you need to know your class structure for that. Jackson also has a mode where it can use untyped maps.
Finally, you can model Levels by treating it as a tree and using the Composite pattern. You'd have a Node abstract class that holds a name, and concrete subclasses like IntegerNode, StringNode, DoubleNode, MapNode, and ArrayNode.
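A minimal sketch of that Composite idea, with only three of the node types and a hypothetical render() method added so there is something to call on the tree (the method and its JSON-like output format are my assumption, not part of the pattern itself):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Common base: every node has a name and can render itself.
abstract class Node {
    final String name;

    Node(String name) {
        this.name = name;
    }

    abstract String render();
}

// Leaf holding a string value.
final class StringNode extends Node {
    final String value;

    StringNode(String name, String value) {
        super(name);
        this.value = value;
    }

    @Override
    String render() {
        return "\"" + value + "\"";
    }
}

// Composite holding an ordered list of nodes; item names are not rendered.
final class ArrayNode extends Node {
    final List<Node> items = new ArrayList<>();

    ArrayNode(String name) {
        super(name);
    }

    ArrayNode add(Node item) {
        items.add(item);
        return this;
    }

    @Override
    String render() {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (Node item : items) {
            joiner.add(item.render());
        }
        return joiner.toString();
    }
}

// Composite holding named children, rendered as name:value pairs.
final class MapNode extends Node {
    final List<Node> children = new ArrayList<>();

    MapNode(String name) {
        super(name);
    }

    MapNode add(Node child) {
        children.add(child);
        return this;
    }

    @Override
    String render() {
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        for (Node child : children) {
            joiner.add(child.name + ":" + child.render());
        }
        return joiner.toString();
    }
}
```

A level is then built by nesting nodes, for example a MapNode "Level1" containing an ArrayNode "shapes" of MapNode shapes, and client code only ever deals with Node.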
Take this adjusted version of your JSON and create the classes using jsonschema2pojo:
{
"Levels":{
"shapes":[
{
"bodytype" : "dynamic",
"h" : "50.0000",
"nameid" : "hofN7-1",
"props" : {
"id" : "properties"
},
"rotation" : "0.0000",
"type" : "square",
"uid" : "Av2EZQh",
"w" : "50.0000",
"x" : "20.0000",
"y" : "20.0000"
},
{
"bodytype" : "dynamic",
"h" : "50.0000",
"nameid" : "hofN7-2",
"props" : {
"gravMassScale" : "2",
"id" : "properties",
"inertia" : "2",
"isBullet" : "true",
"torque" : "2",
"velocity" : {
"x" : "2",
"y" : "2"
}
}
}]
}
}
Select JSON as the source type for the conversion and choose other options as you need them. You can download the converted classes and use them in your Java project.
Hope this helps ... Cheers!