Firestore generated key versus custom key in a collection? - java

I am using the Cloud Firestore database in my Android app, and my collections use different kinds of document IDs: uids for users, pushed keys for restaurants and numbers for my recipes.
My db:
users
uid1
uid2
...
resturants
pushedId1
pushedId2
...
recipes
0001
0002
...
For the users I understand why to use the uids, but is it better to use Firestore pushed ids for my restaurants? Is this just a convention, or is there a reason to use them?
I also tried generating unique keys with the UUID class, but it is easier for me to use plain numbers for my recipes. Is this a bad approach?
Any help will be appreciated, thank you!

By using predictable (e.g. sequential) IDs for documents, you increase the chance you'll hit hotspots in the backend infrastructure. This decreases the scalability of the write operations.
Cloud Firestore has a built-in generator for unique IDs, which is used when you call CollectionReference.add(...) or CollectionReference.document() (without parameters). The ID that it generates is random and highly unpredictable, which prevents hitting certain hotspots in the backend infrastructure.
Using UIDs for the documents of users is a fine substitute for Firestore's built-in generator, since the UIDs already have a high level of entropy: you can't predict the UID of the next user based on knowing the current user. In such a case, using the UID (or otherwise the natural key of the entity) is a better approach, since you can perform direct lookups of the documents instead of having to query.
See this discussion on the firebase-talk mailing list where some of the engineers working on Firestore explain in more detail.
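To make the idea of a random, unpredictable ID concrete, here is a minimal sketch in plain Java. It only mimics the shape that Firestore's client-side auto-IDs have as far as I know (20 random alphanumeric characters); the class name AutoId and its method are hypothetical and are not part of the Firestore SDK. In a real app you would simply call CollectionReference.document() or CollectionReference.add(...) as described above.

```java
import java.security.SecureRandom;

// Hypothetical helper, NOT the Firestore SDK: it only mimics the assumed
// shape of an auto-generated ID (20 random alphanumeric characters).
class AutoId {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final int LENGTH = 20;
    private static final SecureRandom RANDOM = new SecureRandom();

    // Builds an ID with no time component and no sequence, so the next ID
    // cannot be predicted from the previous one.
    static String newId() {
        StringBuilder sb = new StringBuilder(LENGTH);
        for (int i = 0; i < LENGTH; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }
}
```

Calling AutoId.newId() twice gives two unrelated strings, which is the property that keeps writes from clustering.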

First of all, there are no pushed IDs in Firestore. The push() method belongs to the Firebase Realtime Database. In Cloud Firestore, you call the document() method with no argument to generate a unique ID for a document.
For users, the best unique identifier is the uid. For other collections, such as restaurants or recipes, you should consider using the IDs that are generated by Firestore.
Unlike in the Firebase Realtime Database, where there is an astronomically small chance that two users generate a push ID at exactly the same time and with exactly the same randomness, the IDs in Cloud Firestore are purely random (there is no time component included).
So, to answer your question: you should definitely use the random keys that are generated by Firestore. Don't use simple numbers as keys for your documents.
Edit: Using sequential IDs is an anti-pattern in Firebase. This technique is not recommended in Cloud Firestore or in the Firebase Realtime Database, since it causes scalability problems. Scalability is one of Firestore's key features, and it comes from how Firestore spreads documents out over its storage layer; to benefit from it, you should avoid sequential IDs.
Using techniques other than the ones Firestore offers increases hashing collisions, which means you hit write limits sooner. Truly random IDs ensure that the writes are spread out evenly across the storage layer.
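The effect of sequential versus random IDs can be illustrated with a toy model. The sketch below assumes, purely for illustration, that the storage layer assigns a document to a key range by the first character of its ID; this is not how Firestore actually shards data, and the class HotspotDemo and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Toy model (an assumption, not Firestore's real sharding): a document lands
// in a storage range chosen by the first character of its ID.
class HotspotDemo {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Counts how many distinct ranges a batch of writes touches.
    static int rangesTouched(List<String> ids) {
        Set<Character> ranges = new TreeSet<>();
        for (String id : ids) {
            ranges.add(id.charAt(0));
        }
        return ranges.size();
    }

    // Zero-padded sequential IDs such as "0001", "0002", ...
    static List<String> sequentialIds(int count) {
        List<String> ids = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            ids.add(String.format("%04d", i));
        }
        return ids;
    }

    // Random 20-character alphanumeric IDs; seeded so runs are repeatable.
    static List<String> randomIds(int count, long seed) {
        Random random = new Random(seed);
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            StringBuilder sb = new StringBuilder(20);
            for (int j = 0; j < 20; j++) {
                sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
            }
            ids.add(sb.toString());
        }
        return ids;
    }
}
```

In this model, 999 sequential IDs all start with "0" and hit a single range, while the same number of random IDs is spread over many ranges.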

Related

Is it possible to compare two collections in Firestore?

I'm developing an Android app in Java with Firestore. It's a social network, and I have a collection with all the posts. I'm trying to show only the posts that belong to the followed users. Right now I query all the posts ordered by timestamp, but I don't know whether I can filter them by comparing against the collection "followed" inside "User".
The main collection "Users" contains one document per user. Every user has a subcollection "followed" that contains the followed users; each document there is a user, and its document ID is the same as that user's ID.
The posts are stored in another main collection called "Posts", so I need to compare the user ID inside the "Posts" documents with the IDs of the documents in the subcollection "followed". I hope somebody can help me; I have spent a lot of time on this and can't find anything. Thank you.
Firestore does not have the ability to "join" documents in collections as you're describing here. It's relatively straightforward in SQL (if your server has enough memory), but Firestore (like other NoSQL databases) isn't built for this, due to its distributed nature and the way it needs to scale.
The only way to do what you want is to write code to read every document in every collection that would need a comparison, and also perform that comparison with the documents in memory.
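As a minimal sketch of that in-memory comparison, assume the posts and the IDs of the followed users have already been read from Firestore into plain Java collections. The class FeedFilter, the Post fields and the newest-first ordering are assumptions made for illustration; they are not taken from the question's actual data model.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

class FeedFilter {
    // Hypothetical in-memory shape of a document from the "Posts" collection.
    static class Post {
        final String userId;
        final long timestamp;
        final String text;

        Post(String userId, long timestamp, String text) {
            this.userId = userId;
            this.timestamp = timestamp;
            this.text = text;
        }
    }

    // Keeps only posts whose author ID appears among the document IDs of the
    // "followed" subcollection, then orders them newest first (an assumption).
    static List<Post> postsFromFollowed(List<Post> allPosts, Set<String> followedIds) {
        List<Post> feed = new ArrayList<>();
        for (Post post : allPosts) {
            if (followedIds.contains(post.userId)) {
                feed.add(post);
            }
        }
        feed.sort(Comparator.comparingLong((Post p) -> p.timestamp).reversed());
        return feed;
    }
}
```

Using a Set for the followed IDs keeps each membership check cheap, but every post still has to be read first, which is the cost the answer warns about.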

Incrementing Number as Firestore Document Name?

As the title says, I would like each new document created in a particular collection to get an incrementing serial number. This is for properly tracking the new orders that are written to the database. AutoID is random and causes sorting issues, and I would like the data to be easily manageable. Is this possible to achieve via Cloud Functions? Are there any sample code snippets I can look at? Thank you!
Use firebase.firestore.FieldValue.serverTimestamp(). The value is set by the server when it processes the write, and stored timestamps are precise to microseconds.
firebase.firestore().collection('stuff').add({
  sort: firebase.firestore.FieldValue.serverTimestamp(),
});
As per @Frank van Puffelen's comment, "Using sequential IDs for that is an anti-pattern". Your use case is also mentioned in the Firestore documentation here:
Important: Unlike "push IDs" in the Firebase Realtime Database, Cloud
Firestore auto-generated IDs do not provide any automatic ordering. If
you want to be able to order your documents by creation date, you
should store a timestamp as a field in the documents.

A hybrid of cache based and query based paging in hibernate/JPA

If the result set is large, keeping the entire result set in memory (in a server cache, e.g. Hazelcast) is not feasible. In that case you have to fetch a chunk of data at a time (query based paging). The downside of query based paging is that multiple page requests mean multiple calls to the database.
Can anyone suggest how to implement a hybrid of the two approaches?
I haven't put any sample code here since I think the question is more about the logic than about specific code. If you still need sample code, I can add it.
Thanks in advance.
The most effective solution is to use the primary key as a paging criterion. This lets us rely on first-class constructs like a between range query, which is simple for the RDBMS to optimize; the primary key of the queried entity will most likely be indexed already.
Retrieving data using a range query on the primary key is a two-step process. First, you retrieve the collection of primary keys and generate the intervals that identify each subset of the data; then you run the actual queries against the data.
This approach is almost as fast as the brute-force version, and its memory consumption is about one tenth. By selecting an appropriate page size for this implementation, you can tune the ratio between execution time and memory consumption. This version is also stateless: it does not keep references to resources like the ScrollableResults version does, nor does it strain the database like the version using setFirstResult/setMaxResults.
Effective pagination using Hibernate
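The interval-generation step can be sketched in plain Java as follows. This is only an illustration under assumptions: the class KeyRangePager is hypothetical, the primary keys are taken to be numeric, and they are assumed to have been fetched in ascending order already.

```java
import java.util.ArrayList;
import java.util.List;

class KeyRangePager {
    // Splits an ascending list of primary keys into {first, last} intervals,
    // one interval per page of at most pageSize keys.
    static List<long[]> intervals(List<Long> sortedIds, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<long[]> result = new ArrayList<>();
        for (int start = 0; start < sortedIds.size(); start += pageSize) {
            // Index of the last key on this page (the final page may be shorter).
            int end = Math.min(start + pageSize, sortedIds.size()) - 1;
            result.add(new long[] { sortedIds.get(start), sortedIds.get(end) });
        }
        return result;
    }
}
```

Each interval can then drive one range query, for example an HQL query along the lines of "from Item i where i.id between :lo and :hi", where Item is a placeholder entity name.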

Appengine Search API vs Datastore

I am trying to decide whether I should use App-engine Search API or Datastore for an App-engine Connected Android Project. The only distinction that the google documentation makes is
... an index search can find no more than 10,000 matching documents.
The App Engine Datastore may be more appropriate for applications that
need to retrieve very large result sets.
Given that I am already very familiar with the Datastore: Will someone please help me, assuming I don't need 10,000 results?
Are there any advantages to using the Search API versus using Datastore for my queries (per the quote above, it seems sensible to use one or the other)? In my case the end user must be able to search, update existing entries, and create new entities. For example if my app is a bookstore, the user must be able to add new books, add reviews to existing books, search for a specific book.
My data structure is such that the content will be supplied by the end user. Document vs Datastore entity: which is cheaper to update? $$, etc.
Can they supplement each other: Datastore and Search API? What's the advantage? Why would someone consider pairing the two? What's the catch/cost?
Some other info:
The datastore is a transactional system, which is important in many use cases. The search API is not. For example, you can't put and delete a document in a search index in a single transaction.
The datastore has a lot in common with a NoSQL DB like Cassandra, while the search API is really a textual search engine, very similar to something like Lucene. If you understand how an inverted (reverse) index works, you'll get a better understanding of how the search API works.
A very good reason to combine usage of the datastore API and the search API is that the datastore makes it very difficult to do some types of queries (e.g. free text queries, geospatial queries) that the search API handles very easily. Thus, you could store your main entities in the datastore, but then use the search API if you need to search in ways the datastore doesn't allow. Down the road, I think it would be great if the datastore and search API were more tightly integrated, for example by letting you do free text search against indexed Text fields, where app engine would automatically create a search Document Index behind the scenes for you.
The key difference is that with the Datastore you cannot search inside entities. If you have a book called "War and peace", you cannot find it if a user types "war peace" in a search box. The same with reviews, etc. Therefore, it's not really an option for you.
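To show why a text search engine can match "war peace" against "War and peace", here is a minimal sketch of an inverted index in plain Java. It is not the App Engine Search API; the class InvertedIndex, its methods and the simple tokenizer are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative inverted index: maps each token to the IDs of the documents
// that contain it. Not the App Engine Search API.
class InvertedIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    void add(String docId, String text) {
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the IDs of documents that contain every token of the query.
    Set<String> search(String query) {
        Set<String> result = null;
        for (String token : tokenize(query)) {
            Set<String> docs = postings.getOrDefault(token, Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Collections.<String>emptySet() : result;
    }

    // Lower-cases the text and splits it on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

Because the lookup works per token, word order and the missing "and" in the query do not matter, which is exactly what an equality filter on a title property cannot offer.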
The most serious con of Search API is Eventual Consistency as stated here:
https://developers.google.com/appengine/docs/java/search/#Java_Consistency
It means that when you add or update a record with the Search API, the change may not be visible immediately. Imagine a case where a user uploads a book or updates their account settings, and nothing changes because the change hasn't reached all servers yet.
I think Search API is only good for one thing: Search. It basically acts as a search engine for your data in Datastore.
So my advice is to keep in the Datastore the data for which the user expects an immediate result, and to use the Search API for searching data where the user won't expect an immediate result.
The Datastore only provides a few query operators (=, !=, <, >). Nested filters and multiple inequalities are either costly or impossible (timeouts), and search results may contain a lot of false positives. You can do partial string search by tokenizing, but this will bloat your entity. The best way around these limitations is to use Structured Properties and/or Ancestor Queries.
Search API on the other hand runs a Full Text search on Search Documents, which is faster and more accurate than NDB queries without relying on tokenized data. Downside is it relies on data staying up to date.
Use Datastore to process your data (create, update, delete), then run a function to put these data as documents and cluster using indexes, then run the searches using the Search API.

Designing Unique Keys(Primary Keys) for a heavily denormalized NoSQL database

I am working on a web application related to Discussion forums using Java and Cassandra database.
I need to construct 'keys' for the rows storing the user's details and for another set of rows storing the content posted by the user.
One option is to use the randomly generated UUID provided by the Java language, but these are 16 bytes long. Since a NoSQL database involves heavy denormalization, I am concerned that I would be wasting a lot of disk space, RAM and other resources if the keys could be generated in smaller sizes.
I need to generate two types of keys, one for the Users and the other for the Content Posted by Users.
For the content posted by users, would timestamp+userId be a good key, where timestamp is the server time at which the content was posted and userId refers to the key of the user row?
Any suggestions, comments appreciated ..
Thanks
Marcos
Is this a distributed application?
If it is not, you could use a simple synchronized counter and initialize it on startup with the next available id.
On the other hand a database should be able to handle the UUID hashes as created by java.
This is a standard for creating things like sessionIds, that need to be unique.
Your problem is somewhat similar since a session in your context would represent a set of user input.
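A minimal sketch of the counter idea, assuming a single JVM process: it uses AtomicLong instead of a synchronized block, which is a substitution on my part, and the class IdCounter is hypothetical. How the next available id is looked up at startup is left out.

```java
import java.util.concurrent.atomic.AtomicLong;

// Single-process ID generator; it does not coordinate across several servers.
class IdCounter {
    private final AtomicLong next;

    // nextAvailableId would be read from the database at startup (not shown).
    IdCounter(long nextAvailableId) {
        this.next = new AtomicLong(nextAvailableId);
    }

    // Returns the current value and advances the counter atomically.
    long nextId() {
        return next.getAndIncrement();
    }
}
```

Such keys are only 8 bytes, but they are sequential and only unique within one process, which is the trade-off against random 16-byte UUIDs.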
