MongoDB search big data collection - java

I am trying to search through a large collection of objects (1 000 000 000 elements).
A sample element looks like this:
Document{{_id=588e6f317367651f34a06c2c, busId=34, time=1262305558050, createdDate=Sun Jan 29 23:39:42 CET 2017}}
There are busIds from 0 to 300, and the time increases by about 30 milliseconds with each record, starting from:
SimpleDateFormat sdf = new SimpleDateFormat("yyyy.MM.dd HH:mm:ss");
long startDate = sdf.parse("2010.01.01 00:00:00").getTime();
Now I am looking for all data with this query:
BasicDBObject gtQuery = new BasicDBObject();
List<BasicDBObject> obj = new ArrayList<BasicDBObject>();
obj.add(new BasicDBObject("busId", vehicleId));
obj.add(new BasicDBObject("time", new BasicDBObject("$gt", startDate.getTime()).append("$lt", endDate.getTime())));
gtQuery.put("$and", obj);
System.out.println(gtQuery.toString());
FindIterable<Document> curs = collection.find(gtQuery);
gtQuery output:
{ "$and" : [ { "busId" : "34"} , { "time" : { "$gt" : 1262304705000 , "$lt" : 1262308305000}}]}
The query works, but this way it iterates over all 1 000 000 000 elements in the collection.
Is there any way to do it faster?

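Try creating a compound index on busId and time, as suggested by @ares. With such an index, the query can go straight to the entries for one busId and time range instead of scanning the whole collection.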
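To see why a compound index on busId and time helps, here is a minimal in-memory sketch. It is not MongoDB itself: a sorted map keyed by busId and then by time simply lets a range lookup jump to the matching entries instead of visiting every record. The class and method names (CompoundIndexSketch, add, range) are made up for this illustration, and entries with the same busId and time overwrite each other, which a real index would not do. In the mongo shell, the real index would typically be created with db.collection.createIndex({ busId: 1, time: 1 }).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for an index on (busId, time).
class CompoundIndexSketch {
    // Outer key: busId. Inner key: time in epoch milliseconds, kept sorted.
    private final TreeMap<Integer, TreeMap<Long, String>> index = new TreeMap<>();

    public void add(int busId, long time, String docId) {
        index.computeIfAbsent(busId, k -> new TreeMap<>()).put(time, docId);
    }

    // Mirrors { busId: busId, time: { $gt: start, $lt: end } } (both bounds exclusive).
    public List<String> range(int busId, long start, long end) {
        TreeMap<Long, String> byTime = index.get(busId);
        if (byTime == null || start >= end) {
            return Collections.emptyList();
        }
        // subMap is a view over the sorted keys, so only matching entries are visited.
        NavigableMap<Long, String> hits = byTime.subMap(start, false, end, false);
        return new ArrayList<>(hits.values());
    }

    public static void main(String[] args) {
        CompoundIndexSketch idx = new CompoundIndexSketch();
        idx.add(34, 1262304705000L, "a"); // equal to the lower bound, excluded by $gt
        idx.add(34, 1262305558050L, "b"); // inside the range
        idx.add(35, 1262305558050L, "c"); // different bus
        System.out.println(idx.range(34, 1262304705000L, 1262308305000L));
    }
}
```

The field order matters in this picture: the equality field (busId) comes first and the range field (time) second, so all entries for one bus sit next to each other in time order.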

Related

How find all MongoDB documents inserted between two LocalDateTime

I'm trying to develop an app that retrieves all documents inserted in a certain period.
This is my current sample code:
MongoClient mongoClient = MongoClients.create("mongodb://localhost:27017");
MongoDatabase database = mongoClient.getDatabase("eam");
MongoCollection<Document> collection = database.getCollection("coll");
List<Document> docsList = new ArrayList<>();
LocalDateTime initDate = LocalDateTime.now();
LocalDateTime endDate = initDate.plusSeconds(5);
int i = 0;
while (LocalDateTime.now().isBefore(endDate)) {
    Document doc = new Document("id", i)
            .append("name objy", "Obj " + i)
            .append("timeStamp", LocalDateTime.now());
    docsList.add(doc);
    i++;
}
collection.insertMany(docsList);
MongoCursor<Document> cursor = collection.find(new Document("timestamp", new Document("$gte", endDate.minusSeconds(3)).append("$lte", endDate.minusSeconds(2)))).iterator();
try {
    while (cursor.hasNext()) {
        System.out.println(cursor.next().toJson());
    }
} finally {
    cursor.close();
}
As @Valijo suggested, I modified my code to filter by $gte and $lte, but now it doesn't return anything!
Why?
Take a look at https://docs.mongodb.com/manual/reference/method/Date/ if you don't want to work with timestamps.
$in checks whether the timestamp is equal to one of the values inside the $in array: https://docs.mongodb.com/manual/reference/operator/query/in/
But you need a "between" query, so this code should work for you:
db.yourcollection.find({timeStamp: {$gte: min, $lte: max}})
Please note: the code above is for the mongo shell, but you should be able to translate it to the syntax you need. Field names are case-sensitive, so query timeStamp exactly as it was written when inserting.
EDIT: Also keep in mind that MongoDB stores dates in UTC.
$in selects only documents whose value exactly equals one of the values in the given array.
So you need to keep the exact timestamp reference (with 1 ms precision).
The problem is here:
LocalDateTime initDate = LocalDateTime.now();
LocalDateTime endDate = initDate.plusSeconds(5);
int i = 0;
while (LocalDateTime.now().isBefore(endDate)) {
    Document doc = new Document("id", i)
            .append("name objy", "Obj " + i)
            .append("timeStamp", LocalDateTime.now()); //<-- The timestamp ms may differ from initDate ms
    docsList.add(doc);
    i++;
}
Solution 1: While inserting documents, use:
initDate.plusSeconds(i)
Then your query will return what you expect.
Solution 2 (you may translate this to your programming language):
Keep the exact timeStamp references and then search for them.
var date1 = new Date(1537457334015); //Thursday, 20 September 2018 15:28:54.015
var date2 = new Date(1537457335014); //Thursday, 20 September 2018 15:28:55.014
var date3 = new Date(1537457336015); //Thursday, 20 September 2018 15:28:56.015 (date2 + 1 sec 1 ms)
var date4 = new Date(1537457336025); //Thursday, 20 September 2018 15:28:56.025 (date2 + 1 sec 11 ms)
var date2Plus1Sec = new Date( date2.getTime() + 1000 );
//db.coll.remove({})
db.coll.insert([
    { "timeStamp" : date1 },
    { "timeStamp" : date2 },
    { "timeStamp" : date3 },
    { "timeStamp" : date4 }
])
db.coll.find({"timeStamp" : {$in: [date1, date2, date2Plus1Sec]}}).pretty();
Result:
/* 1 */
{
"_id" : ObjectId("5ba3bea3ba135b198e17ec2d"),
"timeStamp" : ISODate("2018-09-20T15:28:54.015Z")
}
/* 2 */
{
"_id" : ObjectId("5ba3bea3ba135b198e17ec2e"),
"timeStamp" : ISODate("2018-09-20T15:28:55.014Z")
}
So date2Plus1Sec (Thursday, 20 September 2018 15:28:56.014) does not exist in the database, and $in finds no match for it.
Solution 3: Don't match exact values; use the $gte and $lte operators to search a timeStamp range.
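The difference between Solution 2 and Solution 3 can be sketched in plain Java, without a database. The example reuses the four epoch-millisecond values from above. The class and helper names (ExactVersusRange, matchIn, matchRange) are made up for illustration and only imitate what $in and $gte/$lte do.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ExactVersusRange {
    // Exact matching, like $in: a stored value matches only if it equals a wanted value.
    public static List<Instant> matchIn(List<Instant> stored, List<Instant> wanted) {
        return stored.stream().filter(wanted::contains).collect(Collectors.toList());
    }

    // Range matching, like { $gte: min, $lte: max }: both bounds inclusive.
    public static List<Instant> matchRange(List<Instant> stored, Instant min, Instant max) {
        return stored.stream()
                .filter(t -> !t.isBefore(min) && !t.isAfter(max))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Instant> stored = Arrays.asList(
                Instant.ofEpochMilli(1537457334015L),  // date1
                Instant.ofEpochMilli(1537457335014L),  // date2
                Instant.ofEpochMilli(1537457336015L),  // date3
                Instant.ofEpochMilli(1537457336025L)); // date4
        Instant date2Plus1Sec = stored.get(1).plusSeconds(1); // 15:28:56.014, never stored

        // Exact matching finds nothing for a value that is off by a single millisecond.
        System.out.println(matchIn(stored, Arrays.asList(date2Plus1Sec)).size());
        // A one-second window starting at the same instant tolerates the millisecond drift.
        System.out.println(matchRange(stored, date2Plus1Sec, date2Plus1Sec.plusSeconds(1)).size());
    }
}
```

Because LocalDateTime.now() is called separately for every inserted document, the stored milliseconds are unpredictable, which is why the range form is the more robust choice here.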

Find all mongo documents containing a field with a specific value using java

I am trying to use MongoDB's Java driver to read through a collection and pull back only documents in which a field falls within a range of values. An example of this would be if I had data like
{ "name" : "foo", "Color" : "white", "Date" : 20171116 }
{ "name" : "bar", "Color" : "black", "Date" : 20171115 }
{ "name" : "Jeff", "Color" : "purple", "Date" : 20171114 }
{ "name" : "John", "Color" : "blue", "Date" : 20171015 }
I would want to begin on 20171114 and end on 20171116, so I would do something like
DateFormat df = new SimpleDateFormat("yyyyMMdd");
String begin = "20171114";
String end = "20171116";
Date startDate;
Date endDate;
Then I would need to convert the strings to dates and use a cursor like
try {
    startDate = df.parse(begin);
    endDate = df.parse(end);
    BasicDBObject data = new BasicDBObject();
    data.put("Date", new BasicDBObject("$gte", startDate).append("$lte", endDate));
} catch (ParseException e) {
    e.printStackTrace();
}
However, when I do this it returns nothing.
Answer:
I was trying to compare a number to a date, which doesn't work, so I converted my begin and end strings to integers by doing
Integer beginDate = Integer.valueOf(begin);
Integer endDate = Integer.valueOf(end);
and it worked.
I think you have saved the values of Date as numbers. Please check how your code saves them and let us know.
Assuming that is the case, please try the code below, which passes long values:
// No try/catch is needed here: nothing in this block throws ParseException.
Long startDate = 20171114L;
Long endDate = 20171116L;
BasicDBObject data = new BasicDBObject();
data.put("Date", new BasicDBObject("$gte", startDate).append("$lte", endDate));
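Since the Date field holds numbers such as 20171116, the query bounds have to be numbers too. As a small sketch that is independent of the driver, a yyyyMMdd string can first be validated as a real calendar date and then converted to the numeric form used in the sample documents. The names DateKeySketch, toDateKey and inRange are made up for this example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DateKeySketch {
    // Parses a yyyyMMdd string (rejecting impossible dates such as month 13) and
    // returns it as a number, so "20171114" becomes 20171114.
    public static int toDateKey(String yyyyMMdd) {
        LocalDate date = LocalDate.parse(yyyyMMdd, DateTimeFormatter.BASIC_ISO_DATE);
        return date.getYear() * 10000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }

    // Inclusive numeric range check, the in-memory counterpart of $gte / $lte.
    public static boolean inRange(int value, int begin, int end) {
        return value >= begin && value <= end;
    }

    public static void main(String[] args) {
        int begin = toDateKey("20171114");
        int end = toDateKey("20171116");
        System.out.println(inRange(20171115, begin, end)); // the "bar" document
        System.out.println(inRange(20171015, begin, end)); // the "John" document
    }
}
```

Numbers in yyyyMMdd form sort in the same order as the dates they represent, which is why a plain numeric range works for this layout.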

How to use mongodb $group in java?

I have a collection processedClickLog in MongoDB.
{
"_id" : ObjectId("58ffb4cefbe21fa7896e2d73"),
"ID" : "81a5d7f48e5df09c9bc006e7cc89d6e6",
"USERID" : "206337611536",
"DATETIME" : "Fri Mar 31 17:29:34 -0400 2017",
"QUERYTEXT" : "Tom",
"DOCID" : "www.demo.com",
"TITLE" : "Harry Potter",
"TAB" : "People-Tab",
"TOTALRESULTS" : "1",
"DOCRANK" : 1
}
{ "id":
....
}
I am trying to execute a complex query in Java. My query should fetch documents from the processedClickLog collection where:
1. TAB is not equal to People-Tab
2. DOCRANK is not equal to 0
3. only the "USERID", "DOCID", "DOCRANK", "QUERYTEXT" fields are returned
4. results are grouped by USERID
Below is my Java code. I am able to satisfy the first three conditions, but I am stuck on the fourth, which is the group by USERID.
String jsonResult = "";
MongoClient mongoClient = new MongoClient("localhost", 27017);
MongoDatabase database = mongoClient.getDatabase("test1");
MongoCollection<Document> collection = database.getCollection("processedClickLog");
//add condition where TAB is not equal to "People-Tab" and DOCRANK is not equal to 0
List<DBObject> criteria = new ArrayList<DBObject>();
criteria.add(new BasicDBObject("DOCRANK", new BasicDBObject("$ne", 0)));
criteria.add(new BasicDBObject("TAB", new BasicDBObject("$ne", "People-Tab")));
//combine the above two conditions
BasicDBObject query = new BasicDBObject("$and", criteria);
//to retrieve all the documents with specific fields
MongoCursor<Document> cursor = collection.find(query)
.projection(Projections.include("USERID", "DOCID", "DOCRANK", "QUERYTEXT")).iterator();
try {
    while (cursor.hasNext()) {
        System.out.println(cursor.next().toJson());
    }
} finally {
    cursor.close();
}
System.out.println(hashMap);
mongoClient.close();
}
How should I define my whole query to add the "group by USERID" condition in Java? Any help is appreciated.
You have to use the aggregation framework. Statically import all the methods of the helper classes and use the code below.
BasicDBObject is not required with the newer 3.x driver API. You should use the new Document class for similar needs.
import static com.mongodb.client.model.Accumulators.*;
import static com.mongodb.client.model.Aggregates.*;
import static java.util.Arrays.asList;
import static com.mongodb.client.model.Filters.*;
import static com.mongodb.client.model.Projections.*;
Bson match = match(and(ne("DOCRANK", 0), ne("TAB", "People-Tab")));
Bson group = group("$USERID", first("USERID", "$USERID"), first("DOCID", "$DOCID"), first("DOCRANK", "$DOCRANK"), first("QUERYTEXT", "$QUERYTEXT"));
Bson projection = project(fields(include("USERID", "DOCID", "DOCRANK", "QUERYTEXT"), excludeId()));
MongoCursor<Document> cursor = collection.aggregate(asList(match, group, projection)).iterator();
The projection stage is optional; it is only added to give a complete example.
More about aggregation here: https://docs.mongodb.com/manual/reference/operator/aggregation/
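As a rough in-memory analogy of what the match and group stages compute (not a replacement for the aggregation pipeline), the following plain-Java sketch filters a list and keeps the first entry per user. The Click class, its fields and the sample values are hypothetical stand-ins for the processedClickLog documents.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupFirstSketch {
    // Hypothetical stand-in for a processedClickLog document; only the fields used here.
    static final class Click {
        final String userId;
        final String docId;
        final int docRank;
        final String tab;

        Click(String userId, String docId, int docRank, String tab) {
            this.userId = userId;
            this.docId = docId;
            this.docRank = docRank;
            this.tab = tab;
        }
    }

    public static Map<String, Click> firstPerUser(List<Click> clicks) {
        Map<String, Click> byUser = new LinkedHashMap<>();
        for (Click c : clicks) {
            // match stage: skip documents with DOCRANK == 0 or TAB == "People-Tab"
            if (c.docRank == 0 || "People-Tab".equals(c.tab)) {
                continue;
            }
            // group stage with first(...): keep the first document seen per USERID
            byUser.putIfAbsent(c.userId, c);
        }
        return byUser;
    }

    public static void main(String[] args) {
        List<Click> clicks = Arrays.asList(
                new Click("u1", "www.demo.com", 1, "People-Tab"), // filtered out by TAB
                new Click("u1", "www.a.example", 2, "Docs-Tab"),
                new Click("u1", "www.b.example", 3, "Docs-Tab"),
                new Click("u2", "www.c.example", 0, "Docs-Tab")); // filtered out by DOCRANK
        firstPerUser(clicks).forEach((user, c) ->
                System.out.println(user + " -> " + c.docId + " (rank " + c.docRank + ")"));
    }
}
```

Note that "first" only means the first document encountered in the current order; if a particular document per user is wanted, the input has to be sorted before grouping.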

MongoDB java driver get records where date between and user is

From a collection named "persons", I want to retrieve all records where date is between
"14-11-2014" and "20-11-2014" <-- These are both in string format (dd-mm-yyyy)
AND
user: "Erik"
My MongoDB data:
{
    "_id" : "546c9f26dbeaa7186ab042c4",   <-- this one should NOT be retrieved, because of the user
    "Task": "Sometask",
    "date" : "20-11-2014",
    "user" : "Dean"
},
{
    "_id" : "546caef6dbeaa7186ab042c5",   <-- this one should be retrieved
    "task": "A task",
    "date" : "20-11-2014",
    "user" : "Erik"
},
{
    "_id" : "546caef6dbeaa7186ab042c5",   <-- this one should NOT be retrieved, because of the date
    "task": "A task",
    "date" : "13-11-2014",
    "user" : "Erik"
}
I am using the MongoDB Java driver 2.11.3.
Maybe there is some solution using BasicDBObject?
I'm very curious. Thanks.
EDIT
I'm using:
public static String findTimelines(String begin, String end, String username) throws UnknownHostException, JSONException {
    DBCollection dbCollection = checkConnection("timelines");
    BasicDBObject query = new BasicDBObject();
    query.put("date", BasicDBObjectBuilder.start("$gte", begin).add("$lte", end).get());
    query.put("user", username);
    dbCollection.find(query).sort(new BasicDBObject("date", -1));
    DBCursor cursor = dbCollection.find(query);
    return JSON.serialize(cursor);
}
This works until you query something like "28-11-2014" to "01-12-2014": it doesn't return anything, even though there is an object with date: "30-11-2014". I think this is because of the month change.
That same object is retrieved when you query "28-11-2014" to "30-11-2014", because the month stays the same.
Please help!
Try something like this
BasicDBObject query = new BasicDBObject();
query.put("date", BasicDBObjectBuilder.start("$gte", fromDate).add("$lte", toDate).get());
collection.find(query).sort(new BasicDBObject("date", -1));
This is the query you would use:
db.posts.find({date: {$gte: start, $lt: end}, user: 'Erik'});
You should first parse your dates using SimpleDateFormat or similar to get Date objects. The date field must also be stored as a date rather than a string for the range comparison to work.
Then put together your query using BasicDBObject:
BasicDBObject q = new BasicDBObject();
q.append("date", new BasicDBObject().append("$gte", start).append("$lt", end));
q.append("user", "Erik");
collection.find(q);
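The month-change failure described in the question comes from comparing dd-MM-yyyy values as strings: strings are ordered character by character, so "30-11-2014" sorts after "01-12-2014". The following driver-free sketch contrasts the two comparisons; the class and method names are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DateStringPitfall {
    // Note the upper-case MM for month; lower-case mm would mean minutes.
    private static final DateTimeFormatter DD_MM_YYYY = DateTimeFormatter.ofPattern("dd-MM-yyyy");

    // What a $gte/$lte query on string values effectively does: lexicographic comparison.
    public static boolean betweenAsStrings(String value, String begin, String end) {
        return value.compareTo(begin) >= 0 && value.compareTo(end) <= 0;
    }

    // The same inclusive check after parsing the strings into real dates.
    public static boolean betweenAsDates(String value, String begin, String end) {
        LocalDate v = LocalDate.parse(value, DD_MM_YYYY);
        LocalDate b = LocalDate.parse(begin, DD_MM_YYYY);
        LocalDate e = LocalDate.parse(end, DD_MM_YYYY);
        return !v.isBefore(b) && !v.isAfter(e);
    }

    public static void main(String[] args) {
        // Same month: both comparisons agree.
        System.out.println(betweenAsStrings("30-11-2014", "28-11-2014", "30-11-2014"));
        // Across the month change: the string comparison rejects a date that is in range.
        System.out.println(betweenAsStrings("30-11-2014", "28-11-2014", "01-12-2014"));
        System.out.println(betweenAsDates("30-11-2014", "28-11-2014", "01-12-2014"));
    }
}
```

If the values must stay strings, a year-first layout such as yyyy-MM-dd sorts in calendar order; otherwise storing real dates avoids the problem entirely.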

Spring Data mongodb date range query

I am using Spring Data with MongoDB to create an application.
I have this object:
public class Room {
    private String name;
    private List<Date> occupied;
}
Preferably using MongoTemplate, I want to get the list of rooms that are not occupied during a date range.
So, for example, if I have a start date of 10/10/2014 and an end date of 15/10/2014, I want the list of rooms whose occupied list does not contain the dates 10, 11, 12, 13, 14, 15 October 2014.
Does anyone have any idea how to do this?
Update:
I have found a way to do this by using this query:
query.addCriteria(Criteria.where("occupiedDates")
.ne(from).andOperator(
Criteria.where("occupiedDates").ne(to))
);
The problem is that I cannot add the andOperator dynamically.
I would prefer to add a list of dates inside the criteria, if possible.
An example document is (this is the only record that exists in MongoDB):
Room(bedcount=1, bedtype1=1, bedtype2=0, bedtype3=0, bedtype4=0,
filetype=null, noofrooms=0, occupancy=0, photo=null, rateid=1,
roomname=null, roomno=888, status=null,
roomTypeID=26060747427845848211948325568, occupiedDates=[Sun Aug 10
00:00:00 EEST 2014, Mon Aug 11 00:00:00 EEST 2014, Tue Aug 12 00:00:00
EEST 2014, Wed Aug 13 00:00:00 EEST 2014], attributes={})
And this is the code showing how the query is built:
SimpleDateFormat dateFormat = new SimpleDateFormat("dd-M-yyyy hh:mm:ss");
Date to = null;
Date from = null;
try {
    to = dateFormat.parse("12-08-2014 00:00:00");
    from = dateFormat.parse("10-08-2014 00:00:00");
} catch (ParseException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
DBObject c1 = new BasicDBObject("occupied", null);
DBObject c2 = BasicDBObjectBuilder.start()
.push("occupied").push("$not")
.push("$elemMatch").add("$gte", from).add("$lte", to).get();
Criteria c = Criteria.where("$or").is(Arrays.asList(c1, c2));
Query query = new Query().addCriteria(c);
List<Room> rooms = mongoTemplate.find(query, Room.class);
This query is sent to mongodb
{ "$or" : [
{ "occupied" : null } ,
{ "occupied" :
{ "$not" :
{ "$elemMatch" :
{ "$gte" : { "$date" : "2014-08-09T21:00:00.000Z"} ,
"$lte" : { "$date" : "2014-08-11T21:00:00.000Z"}
}
}
}
}
]}
From this we understand that the query should return nothing, but it returns one row.
As I eventually understood the requirements, you want to fetch all documents in which
none of the elements of occupied falls into the specified date range.
In the mongo shell:
db.b.find({
    $or : [{
        occupied : null
    }, {
        occupied : {
            $not : {
                $elemMatch : {
                    $gte : start,
                    $lte : end
                }
            }
        }
    }]
}).pretty();
Then translate it to Java code as below:
// Criteria has a bug when its elemMatch method is invoked,
// so the criteria are built almost directly with the driver.
DBObject c1 = new BasicDBObject("occupied", null);
DBObject c2 = BasicDBObjectBuilder.start().push("occupied").push("$not").
push("$elemMatch").add("$gte", start).add("$lte", end).get();
Criteria c = where("$or").is(Arrays.asList(c1, c2));
Query query = new Query().addCriteria(c);
List<Room> rooms = mongoTemplate.find(query, Room.class);
Analysis
According to your question, suppose we have the following:
occupied = [d1, d2, d3, d4, d5]; // for easier representation, suppose the elements are in ascending order
range = [start, end];
So you want to return a document if its data satisfies one of the following orderings after sorting in ascending order:
start, end, d1, d2, d3, d4, d5 // equivalent to: min(occupied) > end
d1, d2, d3, d4, d5, start, end // equivalent to: max(occupied) < start
If the elements were stored in order in occupied, it would be easy to fetch the minimum and maximum values.
But you don't mention that, so I assume they are not necessarily in order.
Unfortunately, a standard query has no operator for getting the minimum or maximum value of an array.
However, when a standard query compares an array field,
the match is true if at least one element (or the array itself) satisfies the criterion,
and false only if all of them fail it.
Fortunately, min(occupied) > end is equivalent to NOT (at least one element <= end), and likewise max(occupied) < start is equivalent to NOT (at least one element >= start), which is the key to the following method.
In the mongo shell:
db.b.find({
    $or: [{
        occupied: {
            $not: {
                $lte: end
            }
        }
    }, {
        occupied: {
            $not: {
                $gte: start
            }
        }
    }]
}).pretty();
Then translate to Java code like this:
Criteria c = new Criteria().orOperator(where("occupied").not().lte(end),
where("occupied").not().gte(start));
Query query = new Query().addCriteria(c);
List<Room> rooms = mongoTemplate.find(query, Room.class);
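The two queries in this answer do not select exactly the same documents, which a small in-memory sketch can make visible. It assumes the array-matching rule described above (a comparison on an array field is true if at least one element satisfies it) and uses plain day numbers instead of dates; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class OccupancySketch {
    // Mirrors { occupied: { $not: { $elemMatch: { $gte: start, $lte: end } } } }:
    // true when no single element lies inside [start, end].
    public static boolean noneInRange(List<Long> occupied, long start, long end) {
        return occupied.stream().noneMatch(d -> d >= start && d <= end);
    }

    // Mirrors the $or of { $not: { $lte: end } } and { $not: { $gte: start } }:
    // true when every element is after end, or every element is before start.
    public static boolean entirelyOutside(List<Long> occupied, long start, long end) {
        boolean allAfterEnd = occupied.stream().noneMatch(d -> d <= end);
        boolean allBeforeStart = occupied.stream().noneMatch(d -> d >= start);
        return allAfterEnd || allBeforeStart;
    }

    public static void main(String[] args) {
        long start = 10, end = 15;
        // Occupied only before the range: both forms accept the room.
        System.out.println(noneInRange(Arrays.asList(5L, 6L), start, end));
        System.out.println(entirelyOutside(Arrays.asList(5L, 6L), start, end));
        // Occupied before AND after the range: only the $elemMatch form accepts it.
        System.out.println(noneInRange(Arrays.asList(5L, 20L), start, end));
        System.out.println(entirelyOutside(Arrays.asList(5L, 20L), start, end));
    }
}
```

So for a room that is booked both before and after the requested range but free during it, the $elemMatch form appears to be the one that matches the stated requirement.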
