Is it possible to compare two collections in Firestore? - java

I'm developing an Android app in Java with Firestore. It's a social network, and I have a collection containing all the posts. I want to show only the posts that belong to followed users. I currently query all the posts ordered by timestamp, but I don't know whether I can filter them by comparing against the "followed" subcollection inside each user.
The main collection "Users" contains one document per user. Inside every user document there is a subcollection "followed" that contains the followed users; each document there is a user, and the document ID is the same as the user ID.
The posts are stored in another main collection called "Posts", so I need to compare the user ID inside the "Posts" documents with the IDs of the documents in the "followed" subcollection. I hope somebody can help me. I've spent a lot of time on this and can't find anything. Thank you.

Firestore does not have the ability to "join" documents across collections as you're describing here. It's relatively straightforward in SQL (if your server has enough memory), but Firestore, like other NoSQL databases, isn't built for this because of its distributed nature and the way it needs to scale.
The only way to do what you want is to write code to read every document in every collection that would need a comparison, and also perform that comparison with the documents in memory.
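A minimal sketch of that in-memory comparison, assuming the "followed" document IDs and the posts have already been read from Firestore into plain Java objects. The `Post` class, its field names, and the newest-first ordering are assumptions for illustration, not part of the Firestore SDK or the asker's schema:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory copy of a document from the "Posts" collection.
final class Post {
    final String id;
    final String userId;   // ID of the user who wrote the post
    final long timestamp;  // e.g. epoch milliseconds

    Post(String id, String userId, long timestamp) {
        this.id = id;
        this.userId = userId;
        this.timestamp = timestamp;
    }
}

final class FeedFilter {
    // Keeps only posts written by a followed user, newest first.
    static List<Post> postsFromFollowed(List<Post> allPosts, Set<String> followedIds) {
        List<Post> feed = new ArrayList<>();
        for (Post post : allPosts) {
            if (followedIds.contains(post.userId)) {
                feed.add(post);
            }
        }
        feed.sort(Comparator.comparingLong((Post p) -> p.timestamp).reversed());
        return feed;
    }

    public static void main(String[] args) {
        // In a real app these would come from Firestore reads.
        Set<String> followed = new HashSet<>(Arrays.asList("alice", "carol"));
        List<Post> posts = Arrays.asList(
                new Post("p1", "alice", 100L),
                new Post("p2", "bob", 200L),
                new Post("p3", "carol", 300L));
        for (Post post : postsFromFollowed(posts, followed)) {
            System.out.println(post.id + " by " + post.userId);
        }
    }
}
```

Putting the followed IDs in a `Set` keeps each membership check cheap, so the cost is dominated by reading the documents, which is exactly the part this answer warns about.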

Related

Comparing two accounts, by the answers of them, with Firebase

I want to save to Firebase the data that a user provides in an Intent. The saved data should be used to find one or more matching users with similar information.
Thanks in advance.
It looks like you are building an app that clusters users with similar interests in a particular thing. I would suggest using Firebase Firestore to store the data; you can then retrieve similar data using Firestore's simple and compound queries, depending on your requirements.

Obtain image via an API call to then save and serve up from local for repeated viewings

I'm working on a webapp at the moment that will display a list of items. The list is dynamic and can change between users. A great analogy is to think of the objects as books, with the db backing it as the library.
My database for Book will contain a list of all books in the library.
-A user can add a book to their collection.
-If a user wants to add a new book to their collection they will also add it to the library.
-If a user wants to add a new book to their collection and it exists in the library, nothing will be added to the Book database.
Currently my table is incredibly simple: Book(id, name). I am able to access a plethora of information about these books via an API call, such as a front cover, number of pages etc etc. I would like to store a subset of this information, especially the image url.
I think a sensible approach would be to alter my Book table so that it looks like: Book(id, name, imageUrl, otherValue, idOfThisBookInApiCallTable). The idOfThisBookInApiCallTable value will allow me to get other attributes as I need them. However, I have two issues with this that I'm not sure how to handle.
Firstly, this table can easily get out of date with the API table. I don't expect there to be much change, if any, but the risk is there.
Secondly, and this is my main concern, the image is stored as a URL. On a page where there might be 50 books, I'll be making a call to each image URL every time. I think a sensible solution would be to download the image locally and then serve it from there on repeated visits, but I'm not sure if this is the correct approach.
Might I ask if anyone can see any issues with this approach and/or suggest a better one please? I have limited experience with db/web/app design so a little out of my depth here.
If saving the image locally is the correct approach, is there a 'best' way of doing this?
Thanks in advance for any help/suggestions/advice.
I can share my 2 cents of a plausible design but I think the question is too broad and is mostly opinion-based. Let's address it by taking one thing at a time.
First, regarding your Book table: why not a Library table where you maintain the current state of your library, with all the books that the library has at the moment?
Each user can hold a collection (a table with a one-to-many relation, such as user_id to a list of book_ids), so each user effectively owns a subset of book IDs.
When adding a new book via a user or via the library (the library can also add more books even if no particular user brought them in), always add it to the library, and if the user_id of the book's 'owner' is known, add a relation for this user in the collection table as well.
More details of a book can be stored separately in a BookDetails table.
Storing the images on your side is always a nice option, and you don't want to get blocked by the API for over-usage when requesting them over and over again. You can use cloud storage such as S3 to keep the images and then not bother the external API. S3 supports compression and caching, so you can save lots of time and not have speed problems.
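The "fetch once, then serve your own copy" idea can be sketched as a small cache in front of the external API. This is only an illustration under stated assumptions: the `ImageCache` class is hypothetical, the `remoteFetcher` function stands in for whatever HTTP call the real API needs, and the in-memory map stands in for local disk or a bucket such as S3:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache: the first request for a URL goes to the external API,
// later requests are served from the local copy.
final class ImageCache {
    private final Map<String, byte[]> store = new HashMap<>(); // stand-in for disk or S3
    private final Function<String, byte[]> remoteFetcher;      // stand-in for the API call
    private int remoteCalls = 0;

    ImageCache(Function<String, byte[]> remoteFetcher) {
        this.remoteFetcher = remoteFetcher;
    }

    byte[] get(String imageUrl) {
        byte[] cached = store.get(imageUrl);
        if (cached == null) {
            // Cache miss: call the external API once and keep the result.
            cached = remoteFetcher.apply(imageUrl);
            remoteCalls++;
            store.put(imageUrl, cached);
        }
        return cached;
    }

    // Number of times the external API was actually contacted.
    int remoteCalls() {
        return remoteCalls;
    }
}
```

With this shape, a page listing 50 books contacts the external API at most once per distinct image, no matter how often the page is viewed. It does not address staleness; if the upstream image can change, some expiry policy would be needed on top.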
All the above points are just my opinion based on the information you gave on the question. The situation can of course be different for your use-case.

Firestore generated key versus custom key in a collection?

I am using Cloud Firestore database in my Android app and I have different documents within collections like: uid for users, pushed keys for restaurants and numbers for my recipes.
My db:
users
  uid1
  uid2
  ...
resturants
  pushedId1
  pushedId2
  ...
recipes
  0001
  0002
  ...
For the users I understand why to use the UIDs, but is it better to use Firestore pushed IDs for my restaurants? Is this a convention, or is there another reason to use them?
I also tried to generate unique keys using the UUID class, but it is easier for me to use plain numbers for my recipes. Is this a bad approach?
Any help will be appreciated, thank you!
By using predictable (e.g. sequential) IDs for documents, you increase the chance you'll hit hotspots in the backend infrastructure. This decreases the scalability of the write operations.
Cloud Firestore has a built-in generator for unique IDs, that is used when you call CollectionReference.add(...) or CollectionReference.document() (without parameters). The ID that it generates is random and highly unpredictable, which prevents hitting certain hotspots in the backend infrastructure.
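To make the contrast with sequential keys such as 0001, 0002 concrete, here is a sketch of a random ID generator in the same spirit. It is not Firestore's code: the `RandomDocumentId` class is hypothetical, and the 20-character alphanumeric format is an assumption chosen for illustration. In an app you would normally just call `document()` without arguments and let the SDK generate the ID.

```java
import java.security.SecureRandom;

// Hypothetical helper that produces unpredictable, non-sequential IDs.
final class RandomDocumentId {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final int ID_LENGTH = 20; // assumed length, for illustration
    private static final SecureRandom RANDOM = new SecureRandom();

    static String newId() {
        StringBuilder id = new StringBuilder(ID_LENGTH);
        for (int i = 0; i < ID_LENGTH; i++) {
            // Each character is drawn independently, so consecutive IDs share no prefix pattern.
            id.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return id.toString();
    }

    public static void main(String[] args) {
        System.out.println(newId());
        System.out.println(newId());
    }
}
```

Because nothing in one ID predicts the next, documents created back to back do not cluster next to each other in key order the way sequential numbers do.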
Using UIDs for the documents of users is a fine substitute for Firestore's built-in generator, since the UIDs already have a high level of entropy: you can't predict the UID of the next user based on knowing the current user. In such a case, using the UID (or otherwise the natural key of the entity) is a better approach, since you can perform direct lookups of the documents instead of having to query.
See this discussion on the firebase-talk mailing list where some of the engineers working on Firestore explain in more detail.
First of all there are no pushed id's in Firestore. We use the push() method in Firebase Realtime database. In Cloud Firestore we pass no argument to the document() method in order to generate a unique id for a document.
In case of users, the best unique identifier is the uid. In case of other collections like resturants, recipes or any other collection, you should consider using the id's that are generated by Firestore.
Unlike in Firebase Realtime database where there is an astronomically small chance that two users can generate a push ID at the same exact period of time and with the same exact randomness, in Cloud Firestore the IDs are actually purely random (there's no time component included).
And as an answer, you should definitely use the random keys that are generated by Firestore. Don't use simple numbers as keys for your documents.
Edit: Using sequential IDs is an anti-pattern when it comes to Firebase. It is not recommended to use this technique in Cloud Firestore or in the Firebase Realtime Database, since it will cause scalability problems. Scalability is one of Firestore's key features, and it comes from how Firestore spreads documents out over its storage layer, so to benefit from it you should avoid sequential IDs.
Using techniques other than what Firestore offers increases hashing collisions, which means you hit write limits sooner. Having truly random IDs ensures that writes are spread out evenly across the storage layer.

How to query a graph of documents of different types at once in Marklogic?

Background
I'm using a NoSQL database that supports graphs for the first time. It is a huge medical application handling thousands of patients. It is a greenfield project, and we as a team are struggling with our persistence layer. We don't know how relationships should be represented or whether we should use triples to handle queries involving huge amounts of data. We are using the Java API.
Data structure
Imagine that there are 3 types of JSON documents in our Marklogic database: Patient, Event, File Evidence.
There are thousands of patients in the application.
One patient can have multiple events associated with them (admitted, discharged, transferred, prescribed medications, added note, changed internal status, etc.).
Each event can have multiple files attached to it as evidence.
Assume there are hundreds of thousands of patients, events and files.
Question
Is it possible to query patients with events and files at once? Is using semantics (possible triples: 'patient has event', 'event has file') recommended in our case?
Our approach
We try to use triples to provide relationships between our documents, add them to one graph, use combination query to fetch IRI first and then in the second call fetch documents by IRI. We tried self-paced trainings and exploring https://github.com/marklogic/marklogic-samplestack but with no luck. Help of someone who has done that in the past and would like to share his experience would be great.
In your situation, keep in mind that you can also store the triples in each of the documents themselves (with the inferred subject being the document itself). Then, in your example, you could combine cts:triple-range-query with a standard cts:search.
Example:
If I had events and embedded a triple such as [this event] -> ownedByPatient -> [iri/for/patiens#12345]
Then I could query:
search for events filtered by fragments where the cts:triple-range-query states that the events are owned by patient 12345
This approach is a combination of semantics and MarkLogic search - using triples to link the appropriate types.
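The linking idea can be illustrated without MarkLogic at all. The sketch below is plain Java, not the MarkLogic Java API and not cts:triple-range-query; the `Triple` and `TripleLookup` classes and the example IRIs are hypothetical. It only shows what "find the events owned by patient 12345" means as a match on predicate and object:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical subject-predicate-object triple, e.g. event -> ownedByPatient -> patient.
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

final class TripleLookup {
    // Returns the subjects of all triples matching the given predicate and object.
    static List<String> subjectsWith(List<Triple> triples, String predicate, String object) {
        List<String> subjects = new ArrayList<>();
        for (Triple triple : triples) {
            if (triple.predicate.equals(predicate) && triple.object.equals(object)) {
                subjects.add(triple.subject);
            }
        }
        return subjects;
    }

    public static void main(String[] args) {
        List<Triple> triples = Arrays.asList(
                new Triple("/events/1", "ownedByPatient", "/patients/12345"),
                new Triple("/events/2", "ownedByPatient", "/patients/67890"),
                new Triple("/files/9", "evidenceFor", "/events/1"));
        // Events owned by patient 12345; the same call with "evidenceFor" walks event -> file.
        System.out.println(subjectsWith(triples, "ownedByPatient", "/patients/12345"));
    }
}
```

In MarkLogic the database evaluates this match against its triple index and combines it with the document search, so the second hop (event to file) does not have to be done in application code.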
As for different types of documents, triples do not care what they are pointing at: an IRI of a person, an event, etc. It's just about how you model your data and the ontology used to describe the relationships. So you can also approach this as managed triples (not embedded) and treat it all as a graph database pointing at your content (like the approach you are describing).
Once you get further along, you may also decide to force restrictions on the types of relationships using RDF rules.
You've given us very little information to work with to answer such broad questions. Nevertheless, I'll do my best with what you gave.
One option is to organize the data however is most intuitive to you, and use server-side JavaScript (SJS) to combine the documents at query time into whatever you need for a particular query. That SJS could be in the form of a resource extension or a search response transform. A resource extension has the advantage that it could do multiple queries across different document types and piece them together to form an answer. A search response transform, on the other hand, will be given the results of only one query but could do additional queries as needed to bring in more data. Since you only have hundreds of thousands of records, you may not need to stress too much about raw speed.
If you plan to scale to millions of documents and want raw speed, you could keep everything you want to query about one patient in the patient record. That would allow you to find a patient by full-text search through all their records plus field-match on patient-specific data.
That assumes the only search results you ever want are patients. If you want something else, you'll need to let us know what other search results you might want.
When you say "attachment" I think of binary documents with scanned images, no metadata, and no full-text to search. Those would obviously be stored as separate binary documents. If they have metadata or full-text, you'll have to decide whether any of that should be in the big patient record for fast queries or in separate documents. All "attachment" documents that are separate JSON files could have a field that points to the patient by id.
I'd avoid triples at first. As David Ennis pointed out, you can combine triples and search, but it's a bit of a ninja move. One big JSON document per patient is much easier for most developers to understand.

Appengine Search API vs Datastore

I am trying to decide whether I should use App-engine Search API or Datastore for an App-engine Connected Android Project. The only distinction that the google documentation makes is
... an index search can find no more than 10,000 matching documents. The App Engine Datastore may be more appropriate for applications that need to retrieve very large result sets.
Given that I am already very familiar with the Datastore: Will someone please help me, assuming I don't need 10,000 results?
Are there any advantages to using the Search API versus using Datastore for my queries (per the quote above, it seems sensible to use one or the other)? In my case the end user must be able to search, update existing entries, and create new entities. For example if my app is a bookstore, the user must be able to add new books, add reviews to existing books, search for a specific book.
My data structure is such that the content will be supplied by the end user. Document vs Datastore entity: which is cheaper to update? $$, etc.
Can they supplement each other: Datastore and Search API? What's the advantage? Why would someone consider pairing the two? What's the catch/cost?
Some other info:
The datastore is a transactional system, which is important in many use cases. The search API is not. For example, you can't put and delete a document in a search index in a single transaction.
The datastore has a lot in common with a NoSql DB like Cassandra, while the search API is really a textual search engine, very similar to something like Lucene. If you understand how a reverse index works, you'll get a better understanding of how the search API works.
A very good reason to combine usage of the datastore API and the search API is that the datastore makes it very difficult to do some types of queries (e.g. free text queries, geospatial queries) that the search API handles very easily. Thus, you could store your main entities in the datastore, but then use the search API if you need to search in ways the datastore doesn't allow. Down the road, I think it would be great if the datastore and search API were more tightly integrated, for example by letting you do free text search against indexed Text fields, where app engine would automatically create a search Document Index behind the scenes for you.
The key difference is that with the Datastore you cannot search inside entities. If you have a book called "War and peace", you cannot find it if a user types "war peace" in a search box. The same with reviews, etc. Therefore, it's not really an option for you.
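The "war peace" example can be illustrated with a tiny inverted index, which is the general idea behind full-text engines. This is a toy sketch in plain Java, not the App Engine Search API; the `TinyInvertedIndex` class, its tokenizer, and its AND-matching rule are all assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each lowercase word to the IDs of the documents containing it.
final class TinyInvertedIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    void add(String docId, String text) {
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, key -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the IDs of documents that contain every word of the query.
    Set<String> search(String query) {
        Set<String> result = null;
        for (String token : tokenize(query)) {
            Set<String> docs = postings.getOrDefault(token, Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs); // intersect with the documents seen so far
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }

    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }
}
```

After `add("b1", "War and peace")`, a call to `search("war peace")` returns the set containing `b1`, because each query word is looked up separately and the results are intersected. An equality filter on a title property only matches the exact stored string, which is why the same query fails against the Datastore.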
The most serious con of Search API is Eventual Consistency as stated here:
https://developers.google.com/appengine/docs/java/search/#Java_Consistency
It means that when you add or update a record with the Search API, the change may not be reflected immediately. Imagine a case where a user uploads a book or updates their account settings, and nothing changes because the change hasn't reached all servers yet.
I think Search API is only good for one thing: Search. It basically acts as a search engine for your data in Datastore.
So my advice is to keep in the Datastore the data for which the user expects an immediate result, and use the Search API to search data for which the user won't expect an immediate result.
The Datastore only provides a few query operators (=, !=, <, >). Nested filters and multiple inequalities are either costly or impossible (timeouts), and search results may include a lot of false positives. You can do partial string search by tokenizing, but this will bloat your entities. The best way around these limitations is to use Structured Properties and/or Ancestor Queries.
The Search API, on the other hand, runs a full-text search on search documents, which is faster and more accurate than NDB queries and does not rely on tokenized data. The downside is that it relies on the indexed data staying up to date.
Use the Datastore to process your data (create, update, delete), then run a function to put that data into search documents grouped into indexes, and run your searches with the Search API.
