Setting TTL on a MongoDB collection - in application or shell? - Java

I would like to set the TTL for a collection once. What is the idiomatic way of achieving this when building a Java application that uses MongoDB? Do people simply apply settings like these in the shell? Or is it normal, in the application code, to check whether the collection already exists in the DB and, if it does not, create it with the desired options?
Thanks!

I never do index building in my application code anymore.
I confess that I used to. Every time my application started up I would ensure all my indexes, until one day a beginner developer got hold of my code and accidentally deleted a character within one of my index definitions.
Consequently, the entire cluster froze and went down because it was building that index in the foreground. Fortunately I had a number of delayed and non-index-building slaves to repair from, but I still lost about 12 hours all in all, and in turn 12 hours of business.
I would recommend that you don't do your index building in the application code but instead do it carefully from your mongo console. That goes for any operation like this, even TTL indexing.

You can set a TTL on a collection as documented here.
Using the Java driver, I would try:
theTTLCollection.ensureIndex(new BasicDBObject("status", 1), new BasicDBObject("expireAfterSeconds", 3600));
hth.
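With the newer 3.x+ Java driver the same idea looks roughly like the sketch below; the database, collection and field names are made up for illustration, and the indexed field must hold BSON dates for the TTL monitor to expire documents.
import java.util.concurrent.TimeUnit;
import org.bson.Document;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;

public class TtlIndexExample {
    public static void main(String[] args) {
        // Hypothetical database and collection names.
        MongoCollection<Document> events = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("mydb")
                .getCollection("events");
        // Expire each document one hour after the date stored in "createdAt".
        // createIndex is idempotent: re-running it with identical options is a no-op.
        events.createIndex(Indexes.ascending("createdAt"),
                new IndexOptions().expireAfter(3600L, TimeUnit.SECONDS));
    }
}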

Setting a TTL is an index operation, so I guess it would not be wise, performance-wise, to do it every time your code runs.

Related

neo4j "empty" database takes up a lot of disk space

I've inserted ~2M nodes (via the Java API) and deleted them after a day or two of usage (through Java too). Now my DB has about 16k nodes, yet weighs 6 GB.
Why wasn't this space freed?
What may be the cause?
The data/graph.db directory contains multiple items:
Store itself, split into multiple files
Indexes
Transaction log files
Log files (messages.log)
All your operations are stored in the transaction logs and then expire according to the keep_logical_logs setting. I'm not sure what the default value is, but I presume you might have quite some space in use there.
I'd suggest checking what is taking up the space.
Also, we have sometimes seen that the space in use (reported with du, for example) differs between when Neo4j is running and when it is stopped.
In addition to Alberto's answer, the store is not compacted. It leaves the empty records for reuse, and they will stay there forever. As far as I know, there is no available tool to compact the store (I've considered writing one myself, but usually convince myself that there aren't that many use cases affected by this).
If you do have a lot of churn where you are inserting and deleting records often, it's a good idea to restart your database often so it will reuse the records that it has marked as deleted.
As Alberto mentions, one of the first things I set (the other being the heap size) when I install a new Neo4j is keep_logical_logs, to something like 1-7 days. If you let them grow forever (the default), they will get quite large.
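If you run Neo4j embedded from Java, the retention policy can also be set in code. A minimal sketch against the 2.x embedded API (in 3.x the builder takes a File and the setting was renamed dbms.tx_log.rotation.retention_policy), assuming the default data/graph.db store directory:
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.factory.GraphDatabaseSettings;

public class EmbeddedWithLogRetention {
    public static void main(String[] args) {
        // Keep roughly two days of transaction logs;
        // equivalent to keep_logical_logs=2 days in neo4j.properties.
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabaseBuilder("data/graph.db")
                .setConfig(GraphDatabaseSettings.keep_logical_logs, "2 days")
                .newGraphDatabase();
        // ... insert and delete nodes as usual ...
        db.shutdown();
    }
}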

Why is my App Engine IdGeneratorStrategy generating huge numbers?

I just moved my code from one machine to another and released it, and suddenly it created an entry with a key of "576728208506880". So I re-released the exact same code from the original machine and created another entry, and this time the key created was "21134006".
Can anyone shed any light on why this might be?!
Thanks,
J
It's perfectly normal. App Engine generates numeric IDs between 0 and 2^53 and scatters them throughout the entire range:
http://googlecloudplatform.blogspot.ca/2013/05/update-on-datastore-auto-ids.html
You can hack around it a bit by using the legacy auto id policy in your settings.
Appengine datastore IDs are not generated sequentially.
(Imagine that you had a burst of 1,000 new entities created in the same second - the short answer is that AppEngine needs a strategy to generate IDs that won't collide).
See this answer for more details and a potential solution.
See "Assigning Identifiers" of the AppEngine docs for more information.
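If you do need a block of IDs you control (for display purposes, say), one hedged option with the low-level datastore API is to reserve them explicitly; the "Order" kind below is a placeholder:
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.KeyRange;

public class AllocateIdsExample {
    public static void createWithReservedId() {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        // Reserve a block of 10 IDs for the hypothetical "Order" kind.
        // IDs inside the returned range are consecutive, and the auto
        // allocator will never hand them out again.
        KeyRange range = datastore.allocateIds("Order", 10);
        Entity order = new Entity(range.getStart());
        datastore.put(order);
    }
}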

Best way to store small, alternating, public data that updates every couple of hours?

The essence of my problem is that there are too many solutions, and I would like to find which one wins out in pros and cons before I build an infrastructure around it.
(Simplified for the purpose of this forum) This is an auction site where five auctions are stored in ranks #1-5, #1 being the currently featured auction. The other four are simply "on deck." After either a couple of hours or the completion of that auction, #2-5 move up to #1-4 and a new one is chosen to be #5.
I'm using a dedicated server, and I've been considering just storing the data in the servlet, or maybe adding a boolean column in the database for each auction, like isFeatured = 1.
Suffice it to say the data is read about 5+ times more often than it is written, which is why I'm leaning towards good old SQL.
When you can retrieve the relevant auctions from the DB with a simple query, using ORDER BY and TOP or something similar, then try that first. If no performance issues occur, then KISS and you're done.
Otherwise, since these 5 auctions stay valid for a while, cache them in memory. Have a singleton holding these auctions and provide methods for updating them, for example, or use a caching lib instead. Update the top 5 whenever necessary, but serve them directly out of memory without hitting the DB or anything similarly expensive.
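A minimal sketch of that singleton idea, assuming a hypothetical featured_auction table and whatever DataSource your servlet container provides:
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

public class FeaturedAuctions {
    private static final FeaturedAuctions INSTANCE = new FeaturedAuctions();
    private volatile List<Long> topFive = new ArrayList<Long>();

    public static FeaturedAuctions get() { return INSTANCE; }

    // Reads are served straight from memory.
    public List<Long> topFive() { return topFive; }

    // Call this when an auction completes or on a scheduled refresh.
    public void refresh(DataSource ds) throws Exception {
        List<Long> fresh = new ArrayList<Long>();
        try (Connection c = ds.getConnection();
             Statement st = c.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT auction_id FROM featured_auction ORDER BY rank LIMIT 5")) {
            while (rs.next()) {
                fresh.add(rs.getLong("auction_id"));
            }
        }
        topFive = fresh; // volatile write: readers pick up the new list atomically
    }
}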
What kind of scale are you looking for? How many application servers need access to the data?
I think you're probably making this more complicated than it is. Just use a database, take a hit of ACID, and move onto whatever else you need to work on. :P
Have you taken a look at SQLite? It allows for "good old SQL" without all of the hassles of setting up a separate database server. As long as the data isn't too huge (to be fair, I haven't tested the size limits, but I've skimmed blog entries mentioning the use of SQLite to process files of several dozen MB in size quickly and with no problems), you should be fine.
It isn't a perfect solution for all needs (frankly, I sometimes find the dynamic typing to be a pain), but since it relies on locally stored files, reads will be much faster than firing up a network connection to talk to a more "traditional" RDBMS.
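If you go the SQLite route from Java, the Xerial sqlite-jdbc driver exposes it over plain JDBC; a small sketch with placeholder file and table names:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqliteAuctions {
    public static void main(String[] args) throws Exception {
        // Placeholder file name; sqlite-jdbc creates the file if it does not exist.
        try (Connection con = DriverManager.getConnection("jdbc:sqlite:auctions.db");
             Statement st = con.createStatement()) {
            st.executeUpdate("CREATE TABLE IF NOT EXISTS auction (id INTEGER PRIMARY KEY, rank INTEGER)");
            try (ResultSet rs = st.executeQuery("SELECT id FROM auction ORDER BY rank LIMIT 5")) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id"));
                }
            }
        }
    }
}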

User matching with current data

I have a database full of two different types of users (Mentors and Mentees), whereby I want the second group (Mentees) to be able to "search" for people in the first group (Mentors) who match their profile. Mentors and Mentees can both go in and change items in their profile at any point in time.
Currently, I am using Apache Mahout for the user matching (recommender.mostSimilarIDs()). The problem I'm running into is that I have to reload the user data every single time anyone searches. By itself, this doesn't take that long, but when Mahout processes the data it seems to take a very long time (14 minutes for 3000 Mentors and 3000 Mentees). After processing, the matching itself takes mere seconds. I also get the same INFO message over and over again while it's processing ("Processed 2248 users"), even though the code suggests that message should only be output every 10000 users.
I'm using the GenericUserBasedRecommender and the GenericDataModel, along with the NearestNUserNeighborhood, AveragingPreferenceInferrer and PearsonCorrelationSimilarity. I load mentors from the database, add the mentee to the list of POJOs and convert them to a FastByIDMap to give to the DataModel.
Is there a better way to be doing this? The product owner needs the data to be current for every search.
(I'm the author.)
You shouldn't need to ask it to reload the data every time; why are you doing that?
14 minutes also sounds way, way too long to load such a small amount of data; something's wrong. You might follow up with more info at user@mahout.apache.org.
You are seeing log messages from a DataModel, which you can disable in your logging system of choice. It prints one final count. This is nothing to worry about.
I would advise you against using a PreferenceInferrer unless you absolutely know you want it. Do you actually have ratings here? I might suggest LogLikelihoodSimilarity if not.
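For reference, a rough sketch of wiring this up without a PreferenceInferrer, using LogLikelihoodSimilarity (which ignores rating values) against the Taste API; the FastByIDMap of preferences is assumed to come from your existing POJO conversion, and the neighborhood size of 25 is arbitrary:
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class MentorMatcher {
    public static long[] similarMentors(FastByIDMap<PreferenceArray> prefs,
                                        long menteeId, int howMany) throws Exception {
        DataModel model = new GenericDataModel(prefs);
        // LogLikelihoodSimilarity only looks at which items users have in common,
        // not at rating values, so no PreferenceInferrer is required.
        UserSimilarity similarity = new LogLikelihoodSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(25, similarity, model);
        UserBasedRecommender recommender =
                new GenericUserBasedRecommender(model, neighborhood, similarity);
        return recommender.mostSimilarUserIDs(menteeId, howMany);
    }
}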

MS Access - Can't Open Any More Tables

At work we have to deal with several MS Access .mdb files, so we use the default JdbcOdbcBridge driver that comes with the Sun JVM, and for most cases it works great.
The problem is that when we have to deal with some larger files, we repeatedly hit exceptions with the message "Can't open any more tables". How can we avoid that?
We already close all our PreparedStatements and ResultSets, and even set their variables to null, but even so this exception keeps happening. What should we do? How can we avoid these nasty exceptions?
Is there any additional configuration of the ODBC drivers on Windows that we could change to avoid this problem?
"Can't open any more tables" is a better error message than the "Can't open any more databases," which is more commonly encountered in my experience. In fact, that latter message is almost always masking the former.
The Jet 4 database engine has a limit of 2048 table handles. It's not entirely clear to me whether this is simultaneous or cumulative within the life of a connection. I've always assumed it is cumulative, since opening fewer recordsets at a time in practice seems to make it possible to avoid the problem.
The issue is that "table handles" doesn't refer to just tables, but to something much broader.
Consider a saved QueryDef with this SQL:
SELECT tblInventory.* From tblInventory;
Running that QueryDef uses TWO table handles.
What, you might ask? It only uses one table! But Jet uses a table handle for the table and a table handle for the saved QueryDef.
Thus, if you have a QueryDef like this:
SELECT qryInventory.InventoryID, qryAuthor.AuthorName
FROM qryInventory JOIN qryAuthor ON qryInventory.AuthorID = qryAuthor.AuthorID
...if each of your source queries has two tables in it, you're using these table handles, one for each:
Table 1 in qryInventory
Table 2 in qryInventory
qryInventory
Table 1 in qryAuthor
Table 2 in qryAuthor
qryAuthor
the top-level QueryDef
So, you might think you have only four tables involved (because there are only four base tables), but you'll actually be using 7 table handles in order to use those 4 base tables.
If in a recordset, you then use the saved QueryDef that uses 7 table handles, you've used up yet another table handle, for a total of 8.
Back in the Jet 3.5 days, the original table handles limitation was 1024, and I bumped up against it on a deadline when I replicated the data file after designing a working app. The problem was that some of the replication tables are open at all times (perhaps for each recordset?), and that used up just enough more table handles to put the app over the top.
In the original design of that app, I was opening a bunch of heavyweight forms with lots of subforms and combo boxes and listboxes, and at that time I used a lot of saved QueryDefs to preassemble standard recordsets that I'd use in many places (just like you would with views on any server database). What fixed the problem was:
loading the subforms only when they were displayed.
loading the rowsources of the combo boxes and listboxes only when they were onscreen.
getting rid of all the saved QueryDefs and using SQL statements that joined the raw tables, wherever possible.
This allowed me to deploy that app in the London office only one week later than planned. When Jet SP2 came out, it doubled the number of table handles, which is what we still have in Jet 4 (and, I presume, the ACE).
In terms of using Jet from Java via ODBC, the key points would be, I think (a short JDBC sketch follows this list):
use a single connection throughout your app, rather than opening and closing them as needed (which leaves you in danger of failing to close them).
open recordsets only when you need them, and clean up and release their resources when you are done.
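A rough JDBC-side sketch of those two points; the DSN in the connection URL is a placeholder, and tblInventory is borrowed from the example above:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class AccessDao {
    // One connection for the life of the application, not one per query.
    private final Connection connection;

    public AccessDao() throws Exception {
        // "MyAccessDsn" is a placeholder ODBC data source name.
        this.connection = DriverManager.getConnection("jdbc:odbc:MyAccessDsn");
    }

    public int countInventory() throws Exception {
        // try-with-resources guarantees the statement and result set are closed
        // (and their Jet table handles released) even when an exception is thrown.
        try (PreparedStatement ps = connection.prepareStatement(
                "SELECT COUNT(*) FROM tblInventory");
             ResultSet rs = ps.executeQuery()) {
            rs.next();
            return rs.getInt(1);
        }
    }

    public void close() throws Exception {
        connection.close();
    }
}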
Now, it could be that there are memory leaks somewhere in the JDBC=>ODBC=>Jet chain where you think you are releasing resources and they aren't getting released at all. I don't have any advice specific to JDBC (as I don't use it -- I'm an Access programmer, after all), but in VBA we have to be careful about explicitly closing our objects and releasing their memory structures because VBA uses reference counting, and sometimes it doesn't know that a reference to an object has been released, so it doesn't release the memory for that object when it goes out of scope.
So, in VBA code, any time you do this:
Dim db As DAO.Database
Dim rs As DAO.Recordset
Set db = DBEngine(0).OpenDatabase("[database path/name]")
Set rs = db.OpenRecordset("[SQL String]")
...after you've done what you need to do, you have to finish with this:
rs.Close ' closes the recordset
Set rs = Nothing ' clears the pointer to the memory formerly used by it
db.Close ' closes the database
Set db = Nothing ' clears the pointer to the memory formerly used by it
...and that's even if your declared variables go out of scope immediately after that code (which should release all the memory used by them, but doesn't do so 100% reliably).
Now, I'm not saying this is what you do in Java, but I'm simply suggesting that if you're having problems and you think you're releasing all your resources, perhaps you need to determine if you're depending on garbage collection to do so and instead need to do so explicitly.
Forgive me if I've said anything stupid in regard to Java and JDBC -- I'm just reporting some of the problems that Access developers have had in interacting with Jet (via DAO, not ODBC) that produce the same error message you're getting, in the hope that our experience and practice might suggest a solution for your particular programming environment.
Recently I tried UCanAccess - a pure Java JDBC driver for MS Access. Check out http://sourceforge.net/projects/ucanaccess/ - it works on Linux too ;-) Loading the required libraries takes some time, and I have not tested it for more than read-only purposes yet.
Anyway, I experienced the problems described above with the sun.jdbc.odbc.JdbcOdbcDriver. After adding close() statements following the creation of statement objects (and the calls to executeUpdate on those), as well as System.gc() calls, the error messages stopped ;-)
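Connecting through UCanAccess is plain JDBC; a minimal sketch with a placeholder file path (recent versions of the driver register themselves, so no Class.forName is needed):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class UcanaccessExample {
    public static void main(String[] args) throws Exception {
        // Placeholder path to the .mdb/.accdb file.
        String url = "jdbc:ucanaccess://C:/data/mydb.mdb";
        try (Connection con = DriverManager.getConnection(url);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM tblInventory")) {
            rs.next();
            System.out.println("Rows: " + rs.getInt(1));
        }
    }
}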
There's an outside chance that you're simply running out of free network connections. We had this problem on a busy system at work.
Something to note is that network connections, though closed, may not release the socket until garbage collection time. You could check this with NETSTAT /A /N /P TCP. If you have a lot of connections in the TIME_WAIT state, you could try forcing a garbage collection on connection closes or perhaps regular intervals.
You should also close your Connection object.
Looking into an alternative for the jdbc odbc driver would also be a good idea. Don't have any experience with an alternative myself but this would be a good place to start:
Is there an alternative to using sun.jdbc.odbc.JdbcOdbcDriver?
I had the same problem, but none of the above was working. I eventually located the issue.
I was using this to read a value from a form to put back into a lookup list's record source.
LocationCode = [Forms]![Support].[LocationCode].Column(2)
ContactCode = Forms("Support")("TakenFrom")
Changed it to the below and it works.
LocationCode = Forms("Support")("LocationCode")
ContactCode = Forms("Support")("TakenFrom")
I know I should have written it better but I hope this helps someone else in the same situation.
Thanks
Greg
