Mongodb indexing should be redefined every time

I am really new to NoSQL and MongoDB and have a lot of questions.
After searching I found Morphia, an ODM framework for Java. In the Morphia documentation we can see annotations like @Indexed that create an index for that specific field.
But the confusing part for me is datastore.ensureIndexes(). The documentation says:
If you are using the @Indexed annotation you should call datastore.ensureIndexes() after registering entities, after application start.
So the question that came to mind after reading that sentence is: should we redefine all indexes every time?
I expected we could define indexes once, somewhere like mongeez (mongeez is similar to Liquibase), so that they run only once.

Calling ensureIndexes() is virtually free if the indexes are already in place. You can (and arguably should) put this call directly after your mapping calls with no tangible impact. Any new indexes would get created but the existing indexes would essentially be no-ops.
For what it's worth, the Morphia documentation can be found here.

So what you are referring to is the documentation here, and it may help to clarify what it means, along with the opinions offered in that document.
As the document says, per your quote, the index definitions you provide for your "Entity" classes are picked up by the .ensureIndexes() method on the datastore when it is called, and it then goes and "re-creates" all of those indexes.
The second point in the documentation defines this "as an example" along with the class mapping definitions like so:
Morphia m = ...
Datastore ds = ...
m.map(Product.class);
ds.ensureIndexes(); // creates all indexes defined with @Indexed
And indeed that will be invoked every time your application is started, and some consider this best practice to ensure that all the definitions are "up to date", as it were. But note that this is only an opinion.
As you seem to be suggesting, it would probably be better practice to have some "post deploy" hook that can be called when you actually deploy your application, or "as needed" when you determine that re-defining your indexes is actually required.
One technique I generally agree with is to expose such a method as a callable API for your application, so that upon deployment you can script a call to that API function and re-define all of your indexes (or even a subset of them) as you decide.
So the actual interpretation is that using Morphia does not mean your indexes are automatically re-defined every time the application is started. But if you put the call to the .ensureIndexes() method somewhere it will be called on every start, then that is what it will do.
Don't call this in the same place as the class mappings. Put it somewhere else where you can control this, and the problem is solved.
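To make the "call it only when you decide to" idea concrete, here is a minimal sketch. IndexMaintenance, IndexedStore, onStartup and rebuildNow are hypothetical names invented for this illustration; IndexedStore only stands in for the single ensureIndexes() call so the example compiles without Morphia on the classpath.

```java
// Hypothetical sketch: none of these names come from Morphia itself.
final class IndexMaintenance {

    // Stand-in for the one datastore call discussed above.
    interface IndexedStore {
        void ensureIndexes();
    }

    private final IndexedStore store;

    IndexMaintenance(IndexedStore store) {
        this.store = store;
    }

    // Called at application start; touches indexes only if the deployer asked for it.
    boolean onStartup(boolean ensureRequested) {
        if (ensureRequested) {
            store.ensureIndexes();
        }
        return ensureRequested;
    }

    // Called from a post-deploy script or an admin-only API endpoint.
    void rebuildNow() {
        store.ensureIndexes();
    }
}
```

Assuming ds is the Datastore from the snippet above, something like new IndexMaintenance(ds::ensureIndexes) should wire it up, with the flag coming from whatever your deployment tooling provides (a system property, an environment variable, and so on).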

Related

Object builder that requires parameters based on other parameters

I am working with several external APIs on business code that will be used by several developers who do not have the deep knowledge required to build meaningful queries to those APIs.
Those APIs retrieve data. For example, you can retrieve entities either based on their Key (direct access) or based on their Group (lists available entities). But if you choose to query by Key you have to provide an id, and if you query by Group you have to provide a groupName.
The APIs are bigger than that and more complex, with many possible use-cases. The main constraints are that:
Some parameters require the presence of other parameters
Some parameters put with other parameters produce no data at best, wrong data at worst.
I would love to fix the underlying APIs but they are outside our scope.
I think it might be good to wrap those APIs a bit and produce an APIService that can request(APIQuery query).
The basic thing I could do is to put conditions in the code to check that no developer instantiates the APIQuery with missing/incoherent parameters, however that would only be a runtime error. I would love for the developer to know when building their request that they can/cannot do something.
My two questions are:
Is there an extensible builder-like way to defer the responsibility of building itself to the object? Having 1 constructor per valid query is not a good solution, as there are many variables and "unspoken rules" here.
Is this even a good idea? Am I trying to over-engineer?

I'll answer your second question first:
Is this even a good idea? Am I trying to over-engineer?
The answer is an uncomfortable "it depends". It depends how bad the pain is, it depends how crucial it is to get this right. It depends on so many factors that we can't really tell.
And to your: is this possible?
Yes, a builder pattern can be extended to return specific builders when certain methods are called, but this can become complicated and misuse is still possible.
For your specific example I'd make the QueryBuilder simply have two methods:
a byGroup method that takes a group value to filter on and returns a GroupQueryBuilder
a byKey method that takes a key value to filter on and returns a KeyQueryBuilder.
Those two classes can then have methods that are distinct to their respective queries and possibly extend a shared base class that provides common properties.
And their respective build methods could either return an APIQuery or distinct APIQueryByGroup/APIQueryByKey classes, whichever is more useful for you.
This can become way more complicated if you have multiple axes along which queries can differ, and at a certain point it'll become very hard to map that onto types.
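As a rough sketch of the two-builder idea described above: every name here (StagedQueries, APIQuery's fields, the limit option) is hypothetical and only mirrors the wording of this answer, not any real API. The point is that the entry method decides which builder type you get, so options that belong to one kind of query are simply not available on the other.

```java
import java.util.Objects;

// Hypothetical sketch; the class, field and method names are assumptions.
final class StagedQueries {

    private StagedQueries() {}

    /** Immutable result; only the builders below can create it. */
    static final class APIQuery {
        private final String kind;   // "KEY" or "GROUP"
        private final String value;  // the id or the groupName
        private final int limit;     // 0 means "not set"; group queries only

        private APIQuery(String kind, String value, int limit) {
            this.kind = kind;
            this.value = value;
            this.limit = limit;
        }

        @Override
        public String toString() {
            return kind + ":" + value + (limit > 0 ? "?limit=" + limit : "");
        }
    }

    // Entry points: the mandatory parameter is required up front.
    static KeyQueryBuilder byKey(String id) {
        return new KeyQueryBuilder(Objects.requireNonNull(id, "id"));
    }

    static GroupQueryBuilder byGroup(String groupName) {
        return new GroupQueryBuilder(Objects.requireNonNull(groupName, "groupName"));
    }

    /** Key queries have no extra options in this sketch. */
    static final class KeyQueryBuilder {
        private final String id;

        private KeyQueryBuilder(String id) {
            this.id = id;
        }

        APIQuery build() {
            return new APIQuery("KEY", id, 0);
        }
    }

    /** Group queries may carry an optional (made-up) limit. */
    static final class GroupQueryBuilder {
        private final String groupName;
        private int limit;

        private GroupQueryBuilder(String groupName) {
            this.groupName = groupName;
        }

        GroupQueryBuilder limit(int limit) {
            if (limit <= 0) {
                throw new IllegalArgumentException("limit must be positive");
            }
            this.limit = limit;
            return this;
        }

        APIQuery build() {
            return new APIQuery("GROUP", groupName, limit);
        }
    }
}
```

With this shape, StagedQueries.byGroup("admins").limit(10).build() compiles, while StagedQueries.byKey("42").limit(10) does not, so the developer finds out at build time rather than at runtime.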

reflection or runtime class declaring?

Well, I have a simple question.
We are working on a simple application-server-like system, and this server accepts clients' business modules at runtime without restarting the server.
When a user implements their business logic and deploys it to the server, the server just tries to find the archive descriptor and load the modules, and it works well.
Some operations need too many reflection calls, and they are made again and again on each call. For example, there is a method which accepts any object, then searches for a certain field which is marked with an annotation and does some business with it, so if we call this method 1000 times with the same object, it is going to reflect 1000 times.
My question is: is this efficient? I mean, doesn't it eat up the CPU? The only possible solution I can think of is to create and compile a class for each object (maybe a wrapper), and the method would just look up the wrapper class. But I know this may make the system complex and hard to debug.
The current solution is working, but I think doing the same work 1000 times is not logical, even if it's simple and easy.
Thanks in advance.

The use of reflection to dynamically load classes at runtime is not a bad choice per-se. Based on your description, you should provide an extensible framework that allows your client to make an implementation, and run their business logic based on that instead of some implicit run-time annotated magic.
A good real-world example for this off the top of my head is The Servlet API.

for example there is a method which accepts any object, then searches for a certain field which is marked with an annotation and does some business with it, so if we call this method 1000 times with the same object, it is going to reflect 1000 times.
In this case I suggest you use caching. After the reflection is finished you'll know the class and the field. Store the result in a HashMap keyed by Class, with the reflected member (a Field here, or a Method) as the value.
The next time you invoke "the method", check the cache first.
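A minimal sketch of that cache, under a few assumptions: the @BusinessKey annotation, the Order class and the SCANS counter are all made up for illustration, and a ConcurrentHashMap is used instead of a plain HashMap because a server will probably call this from several threads.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; annotation and class names are assumptions.
final class AnnotatedFieldCache {

    private AnnotatedFieldCache() {}

    // Made-up marker annotation standing in for the one in the question.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface BusinessKey {}

    // One lookup result per class; Optional.empty() records "no annotated field".
    private static final ConcurrentMap<Class<?>, Optional<Field>> CACHE =
            new ConcurrentHashMap<>();

    // Counts how often the reflective scan really runs (demonstration only).
    static final AtomicInteger SCANS = new AtomicInteger();

    static Optional<Field> annotatedField(Class<?> type) {
        // The scan runs only when the class is not in the cache yet.
        return CACHE.computeIfAbsent(type, AnnotatedFieldCache::scan);
    }

    private static Optional<Field> scan(Class<?> type) {
        SCANS.incrementAndGet();
        // Walk up the hierarchy so inherited private fields are found too.
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.isAnnotationPresent(BusinessKey.class)) {
                    f.setAccessible(true);
                    return Optional.of(f);
                }
            }
        }
        return Optional.empty();
    }

    /** Returns the annotated field's value, or null if the class has none. */
    static Object readKey(Object target) {
        Optional<Field> field = annotatedField(target.getClass());
        if (!field.isPresent()) {
            return null;
        }
        try {
            return field.get().get(target);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("cannot read " + field.get(), e);
        }
    }

    // Made-up client class for the demonstration.
    static final class Order {
        @BusinessKey
        private final String orderNo;

        Order(String orderNo) {
            this.orderNo = orderNo;
        }
    }
}
```

Calling readKey 1000 times on Order instances scans the class once and reuses the cached Field afterwards. One caveat for a server that redeploys modules at runtime: a map keyed by Class keeps those classes (and their class loader) reachable, so you may want to clear entries on undeploy or look at java.lang.ClassValue, which exists for this kind of per-class cache.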

URIs for composite objects in RESTful web service

I have created the paths:
POST /books/4/chapters
The Chapter entity is part of the Book composition. It cannot exist in the system without a book. Now, after creating a chapter by posting to the URI above, should I create another set of URIs for the chapter resource for updating and getting a particular chapter?
GET /books/4/chapters/6 or GET /chapters/6?
Remember that once you have the primary key for one level, you usually don't need to include the levels above because you've already got your specific object. In other words, you shouldn't need too many cases where a URL is deeper than what we have above /resource/identifier/resource.
From apigee Web API Design
GET /chapters/6 would be more true to what the article says, however that also means that your object is no longer in the scope of its parent (even though it is part of a composite object of class Book). However, I feel that this is better, since chapters could in turn be a composition of other objects, meaning that you get long nested URIs such as
GET /books/4/chapters/5/paragraphs/5 if everything should be in the scope of the parent.
What would be the preferred way of doing this?
Edit
After more thinking, it will probably be best to have URIs like /books/4/chapters/9 etc., since you don't have repositories etc. in the code for retrieving a particular chapter without its parent, because it is a composite?

What I would do is definitely the way you mentioned. For example :
/books/4/chapters -- GET : Retrieve full list of chapters of the book
/books/4/chapters/9 -- GET : Retrieve the 9th chapter of book 4.
The important keyword is "of". It's a chapter OF a book, and it makes little sense to present it without its book. Just doing /chapters/9 is very unclear. You see it as a full entity, while it really is a subset of a book.
Using the approach illustrated above, you will have very clear URIs. You are retrieving a specific resource (chapter 9) which is a sub-resource of the other (and thus you have to mention the "super" resource).
I'd really advise you to take a look at this great presentation by David Zülke, a member of the Symfony team. It's a language-agnostic presentation about REST. More precisely it talks about URIs from 16min~ to 30min~, but the whole presentation is great and worth watching.
A note about the apigee presentation
I watched it today, and while I agree with them in most cases, there is one thing I see here. While it might be great to be able to retrieve a chapter by
/chapter/{its id} -- GET
The problem is that, in some cases, you want the 9th chapter of a book, and not necessarily retrieve chapter 238723 (which is unclear that it is the 9th chapter). And in that case, it makes more sense to retrieve it by doing :
/books/4/chapters/9 -- GET

how can I get the History of an object or trace an Object

I have a requirement. Suppose a lot of processing is happening in my application, and at some point an exception occurs because of an object. Now I would like to know the whole history of that object, meaning whatever happened to that object since the application started.
Is peeking into the history of an object possible in any way, using JMX or anything else?
Thanks

In one word: No
With a few more words:
The JVM does not keep any history on any object past its current state, except for very little information related to garbage collection and perhaps some method call metrics needed for the HotSpot optimizer. Doing otherwise would imply a huge processing and memory overhead. There is also the question of granularity; do you log field changes only? Every method call? Every CPU instruction during a method call? The JVM simply takes the easy way out and does none of the above.
You have to isolate the class and/or specific instance of that object and log any operation that you need on your own. You will probably have to do that manually - I have yet to find a bytecode instrumentation library that would allow me to insert logging code at runtime...
Alternatively, you might be able to use an instrumenting profiler, but be prepared for a huge performance drop when doing that.

That's not possible with standard Java (or any other programming language I'm aware of). You should add sufficient logging to your application, which will allow you to get some idea of what's happened. Also, learn to use your IDE's debugger if you don't already know how.

I generally agree with @thkala and @artbristol (+1 for both).
But you have a requirement and have no choice: you need a solution.
I'd recommend trying to wrap your objects with dynamic proxies that perform auditing, i.e. record all changes that happen to the object.
You can probably use AspectJ for this. The aspect will note which method was called and which parameters were passed. You can also use other, lower-level tools, e.g. Javassist or cglib.
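For the dynamic proxy variant, a minimal sketch using the JDK's own java.lang.reflect.Proxy could look like the following. It is not AspectJ, Javassist or cglib, and JDK proxies only work when the object is used through an interface. Account, SimpleAccount and audited are hypothetical names made up for this example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the interface and class names are assumptions.
final class AuditProxyDemo {

    private AuditProxyDemo() {}

    // Made-up business interface; JDK proxies can only wrap interfaces.
    interface Account {
        void deposit(long amount);
        long balance();
    }

    static final class SimpleAccount implements Account {
        private long balance;

        @Override
        public void deposit(long amount) {
            balance += amount;
        }

        @Override
        public long balance() {
            return balance;
        }
    }

    /** Wraps target so that every call through iface is appended to log first. */
    static <T> T audited(T target, Class<T> iface, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Record the method name and arguments before delegating.
            log.add(method.getName() + Arrays.toString(args == null ? new Object[0] : args));
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                // Rethrow the real exception thrown by the target method.
                throw e.getCause();
            }
        };
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }
}
```

Usage would be along the lines of Account a = AuditProxyDemo.audited(new SimpleAccount(), Account.class, log); after which every call on a leaves an entry in log that you can dump when the exception occurs. If several threads use the object, the list should be a thread-safe one (for example Collections.synchronizedList), and in a long-running application you would want to bound its size.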

The answer is no. The JVM doesn't maintain the history of an object's state. The most you can do is keep track of your object's states somewhere in memory; when you get the exception, you can serialize that in-memory record and then, I think, analyze it.

Multiple threads modifying a collection in Java?

The project I am working on requires a whole bunch of queries towards a database. In principle there are two types of queries I am using:
(I) Read from an Excel file, check a couple of parameters and query the database for hits. These hits are then registered as a series of custom classes. Any hit may (and most likely will) occur more than once, so this part of the code checks and updates the occurrence in a custom list implementation that extends ArrayList.
(II) For each hit found, do a detail query and parse the output, so that the classes created in (I) get detailed info.
I figured I would use multiple threads to save time. However, I can't really come up with a good way to solve the problem that occurs with the collection these items are stored in. To elaborate a little: throughout the execution, objects are supposed to be modified by both (I) and (II).
I deliberately didn't copy and paste any code, as it would take big chunks of code to make any sense. I hope the description above makes some sense.
Thanks,

In Java 5 and above, you may either use CopyOnWriteArrayList or a synchronized wrapper around your list. In earlier Java versions, only the latter choice is available. The same is true if you absolutely want to stick to the custom ArrayList implementation you mention.
CopyOnWriteArrayList is feasible if the container is read much more often than written (changed), which seems to be true based on your explanation. Its atomic addIfAbsent() method may even help simplify your code.
[Update] On second thought, a map sounds more fitting to the use case you describe. So if changing from a list to e.g. a map is an option, you should consider ConcurrentHashMap. [/Update]
Changing the objects within the container does not affect the container itself, however you need to ensure that the objects themselves are thread-safe.
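To illustrate the map suggestion for the "check and update the occurrence" step, here is a small sketch. HitRegistry, register, count and registerConcurrently are hypothetical names, the hit is reduced to a String id, and ConcurrentHashMap.merge needs Java 8 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; class and method names are assumptions.
final class HitRegistry {

    private final ConcurrentMap<String, Integer> occurrences = new ConcurrentHashMap<>();

    /** Atomically adds one occurrence; safe to call from many threads. */
    void register(String hitId) {
        occurrences.merge(hitId, 1, Integer::sum);
    }

    int count(String hitId) {
        return occurrences.getOrDefault(hitId, 0);
    }

    /** Demo helper: several threads register the same hit concurrently. */
    static int registerConcurrently(String hitId, int threads, int perThread) {
        HitRegistry registry = new HitRegistry();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    registry.register(hitId);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return registry.count(hitId);
    }
}
```

The check-then-update happens inside merge as one atomic step, so no update is lost even when several threads register the same hit. The values here are immutable Integers; if you store your own mutable hit objects instead, they still need to be thread-safe themselves, as noted above.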

Just use the java.util.concurrent package.
Classes like ConcurrentLinkedQueue and ConcurrentHashMap are already there for you to use and are all thread-safe.
