I'm trying to get a list of mongo "_ids" from a database using Java. I don't need any other part of the objects in the database, just the "_id".
This is what I'm doing right now:
// Another method queries for all objects of a certain type within the database.
Collection<MyObject> thingies = this.getMyObjects();
Collection<String> ids = new LinkedList<String>();
for (MyObject thingy : thingies) {
ids.add(thingy.getGuid());
}
This seems horribly inefficient though... is there a way just to query mongo for objects of a certain type and return only their "_ids" without having to reassemble the entire object and extract it?
Thanks!
The find() method has an overload where you can pass the keys that you want to retrieve back from the query or those that you don't want.
So you could try this:
BasicDBObject query = new BasicDBObject("someKey", "someValue");
BasicDBObject keys = new BasicDBObject("_id", 1);
DBCursor cursor = collection.find(query, keys);
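Iterating that cursor then hands back documents containing only the _id, which you can collect straight into your list. A minimal sketch against the same legacy-driver objects used above:
Collection<String> ids = new LinkedList<String>();
DBCursor cursor = collection.find(query, keys);
try {
    while (cursor.hasNext()) {
        // each result carries only _id; toString() gives the ObjectId's hex form
        ids.add(cursor.next().get("_id").toString());
    }
} finally {
    cursor.close();
}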
I'd like to imagine there's existing API functionality for this. Suppose there was Java code that looks something like this:
JavaRDD<Integer> queryKeys = ...; //values not particularly important
List<Document> allMatches = db.getCollection("someDB").find(queryKeys); //doesn't work, I'm aware
JavaPairRDD<Integer, Iterator<ObjectContainingKey>> dbQueryResults = ...;
Goal of this: After a bunch of data transformations, I end up with an RDD of integer keys that I'd like to make a single db query with (rather than a bunch of queries) based on this collection of keys.
From there, I'd like to turn the query results into a pair RDD of the key and all of its results in an iterator (making it easy to hit the ground running for the next steps I'm intending to take). To clarify, I mean a pair of the key and its results as an iterator.
I know there's functionality in MongoDB capable of coordinating with Spark, but I haven't found anything that'll work with this yet (it seems to lean towards writing to a database rather than querying it).
I managed to figure this out in an efficient enough manner.
JavaRDD<Integer> queryKeys = ...;
// Build one query object per key, then combine them into a single $or query
JavaRDD<BasicDBObject> queries = queryKeys.map(value -> new BasicDBObject("keyName", value));
BasicDBObject orQuery = SomeHelperClass.buildOrQuery(queries.collect());
// Run the single query, pull the results into a list, and hand them back to Spark
List<Document> queryResults = db.getCollection("docs").find(orQuery).into(new ArrayList<>());
JavaRDD<Document> parallelResults = sparkContext.parallelize(queryResults);
// Map each document back to a domain object and group the objects by key
JavaRDD<ObjectContainingKey> results = parallelResults.map(doc -> SomeHelperClass.fromJSONtoObj(doc));
JavaPairRDD<Integer, Iterable<ObjectContainingKey>> keyResults = results.groupBy(obj -> obj.getKey());
And the method buildOrQuery here:
public static BasicDBObject buildOrQuery(List<BasicDBObject> queries) {
    BasicDBList or = new BasicDBList();
    for (BasicDBObject query : queries) {
        or.add(query);
    }
    return new BasicDBObject("$or", or);
}
Note that there's a fromJSONtoObj method that converts the document back from JSON into the required field variables. Also note that obj.getKey() is simply a getter method associated with whatever the "key" is.
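For illustration only, such a conversion method might look roughly like this; the field names and the shape of ObjectContainingKey are assumptions, not taken from the original code:
// Hypothetical mapping helper; field names are placeholders.
public static ObjectContainingKey fromJSONtoObj(Document doc) {
    ObjectContainingKey obj = new ObjectContainingKey();
    obj.setKey(doc.getInteger("keyName"));    // the same field the $or query was built on
    obj.setValue(doc.getString("someField")); // plus whatever other fields the class carries
    return obj;
}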
I have just started using MongoDB. Below is my data structure.
It has an array of skillIDs, each of which has an array of activeCampaigns, and each activeCampaign has an array of callsByTimeZone.
What I am looking for in SQL terms is :
Select activeCampaigns.callsByTimeZone.label,
       activeCampaigns.callsByTimeZone.loaded
from X
where skillID = 50296 and activeCampaigns.campaignId = 11371940
  and activeCampaigns.callsByTimeZone.label = 'PT'
The output I am expecting to get is
{"label":"PT", "loaded":1 }
The Command I used is
db.cd.find({ "skillID" : 50296 , "activeCampaigns.campaignId" : 11371940,
"activeCampaigns.callsByTimeZone.label" :"PT" },
{ "activeCampaigns.callsByTimeZone.label" : 1 ,
"activeCampaigns.callsByTimeZone.loaded" : 1 ,"_id" : 0})
The output I am getting is everything under activeCampaigns.callsByTimeZone, while I am expecting just the PT entry.
DataStructure :
{
"skillID":50296,
"clientID":7419,
"voiceID":1,
"otherResults":7,
"activeCampaigns":
[{
"campaignId":11371940,
"campaignFileName":"Aaron.name.121.csv",
"loaded":259,
"callsByTimeZone":
[{
"label":"CT",
"loaded":6
},
{
"label":"ET",
"loaded":241
},
{
"label":"PT",
"loaded":1
}]
}]
}
I tried the same in Java.
QueryBuilder query = QueryBuilder.start().and("skillID").is(50296)
.and("activeCampaigns.campaignId").is(11371940)
.and("activeCampaigns.callsByTimeZone.label").is("PT");
BasicDBObject fields = new BasicDBObject("activeCampaigns.callsByTimeZone.label",1)
.append("activeCampaigns.callsByTimeZone.loaded",1).append("_id", 0);
DBCursor cursor = coll.find(query.get(), fields);
String campaignJson = null;
while(cursor.hasNext()) {
DBObject campaignDBO = cursor.next();
campaignJson = campaignDBO.toString();
System.out.println(campaignJson);
}
The value obtained is everything under the callsByTimeZone array. I am currently parsing the JSON obtained and extracting only the PT values. Is there a way to query just the PT fields inside activeCampaigns.callsByTimeZone?
Sorry if this question has already been raised in the forum; I have searched a lot and failed to find a proper solution. Thanks in advance.
There are several ways of doing it, but you should not be using String manipulation (i.e. indexOf); the performance could be horrible.
The results in the cursor are nested Maps, representing the document in the database - a Map is a good Java-representation of key-value pairs. So you can navigate to the place you need in the document, instead of having to parse it as a String. I've tested the following and it works on your test data, but you might need to tweak it if your data is not all exactly like the example:
while (cursor.hasNext()) {
    DBObject campaignDBO = cursor.next();
    List callsByTimezone = (List) ((DBObject) ((List) campaignDBO.get("activeCampaigns")).get(0)).get("callsByTimeZone");
    DBObject valuesThatIWant;
    for (Object o : callsByTimezone) {
        DBObject call = (DBObject) o;
        if (call.get("label").equals("PT")) {
            valuesThatIWant = call;
        }
    }
}
Depending upon your data, you might want to add protection against null values as well.
The thing you were looking for ({"label":"PT", "loaded":1 }) is in the variable valuesThatIWant. Note that this, too, is a DBObject, i.e. a Map, so if you want to see what's inside it you need to use get:
valuesThatIWant.get("label"); // will return "PT"
valuesThatIWant.get("loaded"); // will return 1
Because DBObject is effectively a Map of String to Object (i.e. Map<String, Object>) you need to cast the values that come out of it (hence the ugliness in the first bit of code in my answer). With numbers, it will depend on how the data was loaded into the database; it might come out as an int or as a double:
String theValueOfLabel = (String) valuesThatIWant.get("label"); // will return "PT"
double theValueOfLoaded = (Double) valuesThatIWant.get("loaded"); // will return 1.0
I'd also like to point out the following from my answer:
((List) campaignDBO.get("activeCampaigns")).get(0)
This assumes that "activeCampaigns" is a) a list and in this case b) only has one entry (I'm doing get(0)).
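If activeCampaigns can hold more than one campaign, you would walk that list as well instead of hard-coding get(0). A hedged variation on the loop above:
// Variation that walks every campaign instead of assuming a single entry.
List campaigns = (List) campaignDBO.get("activeCampaigns");
for (Object c : campaigns) {
    List callsByTimezone = (List) ((DBObject) c).get("callsByTimeZone");
    for (Object o : callsByTimezone) {
        DBObject call = (DBObject) o;
        if ("PT".equals(call.get("label"))) {
            valuesThatIWant = call;
        }
    }
}
Writing the comparison as "PT".equals(...) also sidesteps the null-label case mentioned earlier.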
You will also have noticed that the field values you've set are almost entirely being ignored, and the result is most of the document, not just the fields you asked for. I'm pretty sure you can only define the top-level fields you want the query to return, so your code:
BasicDBObject fields = new BasicDBObject("activeCampaigns.callsByTimeZone.label",1)
.append("activeCampaigns.callsByTimeZone.loaded",1)
.append("_id", 0);
is actually exactly the same as:
BasicDBObject fields = new BasicDBObject("activeCampaigns", 1).append("_id", 0);
I think some of the points that will help you to work with Java & MongoDB are:
- When you query the database, it will return you the whole document of the thing that matches your query, i.e. everything from "skillID" downwards. If you want to select the fields to return, I think those can only be top-level fields. See the documentation for more detail.
- To navigate the results, you need to know that DBObjects are returned, and that these are effectively a Map<String, Object> in Java - you can use get to navigate to the correct node, but you will need to cast the values into the correct shape.
Replacing the while loop in your Java code with the one below seems to give "PT" as output:
while (cursor.hasNext()) {
    DBObject campaignDBO = cursor.next();
    campaignJson = campaignDBO.get("activeCampaigns").toString();
    int labelInt = campaignJson.indexOf("PT");
    String label = campaignJson.substring(labelInt, labelInt + 2);
    System.out.println(label);
}
In MySQL the DESCRIBE statement can be used to retrieve the schema of a given table; unfortunately I could not locate similar functionality in the MongoDB Java driver :(
Let's say I have the following BSON documents:
{
_id: {$oid:49},
values: { a:10, b:20}
}
,
{
_id: {$oid:50},
values: { b:21, c:31}
}
Now let's suppose I do:
DBObject obj = cursor.next();
DBObject values_1 = (DBObject) obj.get("values");
and the schema should be something like this:
>a : int
>b : int
and for the next document:
DBObject obj = cursor.next();
DBObject values_2 = (DBObject) obj.get("values");
the schema should be:
>b : int
>c : int
Now that I have explained what the schema retrieval is, can someone tell me how to do it?
If it helps, in my case I only need to know the field names (because the datatype is always the same), but it would be nice to also know how to retrieve the datatypes.
A workaround is to convert the DBObject to a Map, then the Map to a Set, and the Set to an Iterator, and extract the attribute names/values... I still have no idea how to extract the data types.
That is:
DBObject values_1 = (DBObject) obj.get("values");
Map _map = values_1.toMap();
// Set set = _map.entrySet(); // if you want the <key, value> pairs
Set _set_keys = _map.keySet();
Iterator _iterator = _set_keys.iterator();
while (_iterator.hasNext())
    System.out.println("-> " + _iterator.next());
A DBObject has a method called keySet (documentation). You shouldn't need to convert to a Map first.
There's no exposed method to determine the underlying BSON data type at this point, so you'd need to investigate using instanceof or getClass to determine the underlying data type of the Object that is returned from get.
If you look at the source code for BasicBSONObject for example, you'll see how the helper functions that cast do some basic checks and then force the cast.
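A small sketch combining both points, assuming values_1 is the nested "values" sub-document retrieved above:
// Walk the keys directly and inspect each value's runtime class
for (String key : values_1.keySet()) {
    Object value = values_1.get(key);
    String type = (value == null) ? "null" : value.getClass().getSimpleName();
    System.out.println(key + " : " + type);   // e.g. "a : Integer"
}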
I have List of complex object and the complex object contains more than two fields. Something like - {car1, car2, car3} and car has name and type field
Is there a simpler way of inserting a list of car? Something like
DBObject updateObject = new BasicDBObject().append("$push", new BasicDBObject().append("cars", cars));
I tried with $pushAll and it does not seem to work. I did a bit more research and found that it needs mapping information, which is one of the reasons why this insertion is failing.
What's the best way to do this insertion into MongoDB? Some sample code or direction would be helpful. Please note this has to be done through Java.
Well, if you don't want to translate your car objects to DBObjects manually, there are mapping frameworks out there, like Morphia.
Personally, I would just wire up a mapping method manually though. The code could look like this (untested, typos to be expected):
BasicDBObject updateObject = new BasicDBObject();
BasicDBList dbCarList = mapCars(cars);
updateObject.append("$push", new BasicDBObject("cars", dbCarList));
...
private BasicDBList mapCars(List<Car> cars) {
    BasicDBList result = new BasicDBList();
    for (Car car : cars) {
        BasicDBObject dbCar = new BasicDBObject();
        dbCar.append("name", car.getName());
        dbCar.append("type", car.getType()); // assuming a getType() getter for the "type" field mentioned in the question
        result.add(dbCar);
    }
    return result;
}
Update: as Sammaye pointed out in the comments, replace $push with $set for replacing the list. $push will append elements to the array without removing what is there from before.
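A sketch of that $set variant; collection and queryObject here are placeholders for your DBCollection and the filter that selects the document to update:
// Replace the whole "cars" array instead of appending to it
BasicDBObject replaceObject = new BasicDBObject();
replaceObject.append("$set", new BasicDBObject("cars", mapCars(cars)));
collection.update(queryObject, replaceObject);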
Friends!
I am using MongoDB in a Java project via spring-data. I use Repository interfaces to access data in collections. For some processing I need to iterate over all elements of a collection. I can use the fetchAll method of the repository, but it always returns an ArrayList.
However, one of the collections is expected to be large: up to 1 million records of several kilobytes each, at least. I suppose I should not use fetchAll in such cases, but I could find neither a convenient method returning an iterator (which would allow the collection to be fetched partially) nor a convenient method with callbacks.
I've only seen support for retrieving such collections in pages. I wonder whether that is the only way of working with such collections?
Late response, but maybe it will help someone in the future. Spring Data doesn't provide any API to wrap the MongoDB cursor capabilities. It uses the cursor internally within find methods, but always returns a completed list of objects. Options are to use the Mongo API directly or to use the Spring Data Paging API, something like this:
final int pageLimit = 300;
int pageNumber = 0;
Page<T> page = repository.findAll(new PageRequest(pageNumber, pageLimit));
while (page.hasNextPage()) {
processPageContent(page.getContent());
page = repository.findAll(new PageRequest(++pageNumber, pageLimit));
}
// process last page
processPageContent(page.getContent());
UPD (!) This method is not sufficient for large sets of data (see Shawn Bush's comments). Please use the Mongo API directly for such cases.
Since this question got bumped recently, this answer needs some more love!
If you use Spring Data Repository interfaces, you can declare a custom method that returns a Stream, and it will be implemented by Spring Data using cursors:
import java.util.stream.Stream;
public interface AlarmRepository extends CrudRepository<Alarm, String> {
Stream<Alarm> findAllBy();
}
So for large amounts of data you can stream the results and process them one by one without memory limitations.
See https://docs.spring.io/spring-data/mongodb/docs/current/reference/html/#mongodb.repositories.queries
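Note that a cursor-backed Stream should be closed once consumed; try-with-resources is the usual pattern. A short usage sketch (alarmRepository and process() are placeholders for your own bean and logic):
try (Stream<Alarm> alarms = alarmRepository.findAllBy()) {
    // documents are fetched through a cursor and handled one at a time
    alarms.forEach(alarm -> process(alarm));
}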
you can still use mongoTemplate to access the Collection and simply use DBCursor:
DBCollection collection = mongoTemplate.getCollection("boundary");
DBCursor cursor = collection.find();
while(cursor.hasNext()){
DBObject obj = cursor.next();
Object object = obj.get("polygons");
..
...
}
Use MongoTemplate::stream() as probably the most appropriate Java wrapper to DBCursor
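A minimal sketch, assuming a recent Spring Data MongoDB version where stream() returns a java.util.stream.Stream (older versions return a CloseableIterator instead); the Boundary entity, the criteria, and process() are placeholders:
Query query = new Query(Criteria.where("someField").is("someValue"));
try (Stream<Boundary> boundaries = mongoTemplate.stream(query, Boundary.class)) {
    // documents are mapped and handed over one by one, backed by a server-side cursor
    boundaries.forEach(boundary -> process(boundary));
}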
Another way:
do{
page = repository.findAll(new PageRequest(pageNumber, pageLimit));
pageNumber++;
}while (!page.isLastPage());
Check the new method to handle results on a per-document basis:
http://docs.spring.io/spring-data/mongodb/docs/current/api/org/springframework/data/mongodb/core/MongoTemplate.html#executeQuery-org.springframework.data.mongodb.core.query.Query-java.lang.String-org.springframework.data.mongodb.core.DocumentCallbackHandler-
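In recent Spring Data versions the callback receives each org.bson.Document as it is read from the cursor, so nothing is accumulated in memory. A minimal sketch; the collection name and query are placeholders:
Query query = new Query(Criteria.where("someField").is("someValue"));
mongoTemplate.executeQuery(query, "collectionName", document -> {
    // called once per document while the underlying cursor is iterated
    System.out.println(document.toJson());
});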
You may want to try the DBCursor way like this:
DBObject query = new BasicDBObject(); //setup the query criteria
query.put("method", method);
query.put("ctime", (new BasicDBObject("$gte", bTime)).append("$lt", eTime));
logger.debug("query: {}", query);
DBObject fields = new BasicDBObject(); //only get the needed fields.
fields.put("_id", 0);
fields.put("uId", 1);
fields.put("ctime", 1);
DBCursor dbCursor = mongoTemplate.getCollection("collectionName").find(query, fields);
while (dbCursor.hasNext()){
DBObject object = dbCursor.next();
logger.debug("object: {}", object);
//do something.
}
The best way to iterate over a large collection is to use the Mongo API directly. I used the code below and it worked like a charm for my use-case.
I had to iterate over more than 15M records and the document size was huge for some of those.
The following code is in Kotlin Spring Boot App (Spring Boot Version: 2.4.5)
fun getAbcCursor(batchSize: Int, from: Long?, to: Long?): MongoCursor<Document> {
val collection = xyzMongoTemplate.getCollection("abc")
val query = Document("field1", "value1")
if (from != null) {
val fromDate = Date(from)
val toDate = if (to != null) { Date(to) } else { Date() }
query.append(
"createTime",
Document(
"\$gte", fromDate
).append(
"\$lte", toDate
)
)
}
return collection.find(query).batchSize(batchSize).iterator()
}
Then, from a service layer method, you can keep calling MongoCursor.next() on the returned cursor as long as MongoCursor.hasNext() returns true.
An important observation: please do not miss adding batchSize on the 'FindIterable' (the return type of MongoCollection.find()). If you don't provide the batch size, the cursor will fetch the initial 101 records and then hang (it tries to fetch all the remaining records at once).
For my scenario, I used a batch size of 2000, as it gave the best results during testing. This optimized batch size will be impacted by the average size of your records.
Here is the equivalent code in Java (removing createTime from query as it is specific to my data model).
MongoCursor<Document> getAbcCursor(int batchSize) {
    MongoCollection<Document> collection = xyzMongoTemplate.getCollection("your_collection_name");
    Document query = new Document("field1", "value1"); // query --> {"field1": "value1"}
    return collection.find(query).batchSize(batchSize).iterator();
}
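And a hedged sketch of how the service layer might consume the returned cursor; MongoCursor is closeable, so try-with-resources keeps the server-side cursor from leaking (process() stands in for your own handling):
try (MongoCursor<Document> cursor = getAbcCursor(2000)) {
    while (cursor.hasNext()) {
        Document doc = cursor.next();
        // process one document at a time; only batchSize documents are held client-side
        process(doc);
    }
}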
This answer is based on: https://stackoverflow.com/a/22711715/5622596
That answer needs a bit of an update as PageRequest has changed how it is being constructed.
With that said here is my modified response:
//PageRequest.of is zero-based, so start at page 0
int pageNumber = 0;
//Change value to whatever size you want the page to have
int pageLimit = 100;
Page<SomeClass> page;
List<SomeClass> compoundList = new LinkedList<>();
do {
    PageRequest pageRequest = PageRequest.of(pageNumber, pageLimit);
    page = repository.findAll(pageRequest);
    List<SomeClass> listFromPage = page.getContent();
    //Do something with this list, example below
    compoundList.addAll(listFromPage);
    pageNumber++;
} while (!page.isLast());
//Do something with the compoundList: example below
return compoundList;