Get IDs of deleted documents in firestore - java

I have a big collection with several thousand documents. These documents have subcollections with documents inside. Now I deleted a lot of the documents on the highest level.
Structure:
MyCollection => MyDocument => MySubcollection => MySubdocument
Now I realized that the documents are deleted (they don't show up in any query), but their subcollections and the documents inside them still exist. I am not sure how I can delete them as well, because I don't know the IDs of my deleted documents.
If I try to find out the IDs by sending a query to my collection to read all documents, the deleted ones are (by design) no longer included. So how can I figure out their IDs now in order to delete their subcollections?
Thanks for any advice!

It all depends on your exact goal.
If you want to delete ALL the docs in the MyCollection collection, including ALL the documents in ALL the sub-collections, you can use the Firebase CLI with the following command:
firebase firestore:delete MyCollection -r
Run firebase firestore:delete --help for more options.
Of course, this can only be done by an Owner of your Firebase project.
If you want to allow other users to do the same thing from the front-end (i.e. ALL the docs, including ALL the sub-collections), you can use the technique detailed in the "Delete data with a Callable Cloud Function" section in the doc.
As explained in this doc:
You can take advantage of the firestore:delete command in the Firebase Command Line Interface (CLI). You can import any function of the Firebase CLI into your Node.js application using the firebase-tools package.
The Firebase CLI uses the Cloud Firestore REST API to find all documents under the specified path and delete them individually. This implementation requires no knowledge of your app's specific data hierarchy and will even find and delete "orphaned" documents that no longer have a parent.
If you want to delete ONLY a subset of the documents in the MyCollection collection together with the documents in their sub-collections, you can use the same methods as above with the path to the document, e.g.:
firebase firestore:delete MyCollection/MyDocument -r
Finally, if your problem is that you have already deleted "parent" documents and you don't know how to delete the documents in the (orphan) sub-collections (since you don't know the IDs of the parents), you can use a Collection Group query to query all the MySubcollection subcollections and check whether each parent document still exists. The following code, in JavaScript, would do the trick:
const db = firebase.firestore();
const parentDocReferences = [];
const deletedParentDocIds = [];

db.collectionGroup('MySubcollection')
  .get()
  .then((querySnapshot) => {
    querySnapshot.forEach((doc) => {
      console.log(doc.id);
      console.log(doc.ref.parent.parent.path);
      parentDocReferences.push(db.doc(doc.ref.parent.parent.path).get());
    });
    return Promise.all(parentDocReferences);
  })
  .then((docSnapshots) => {
    docSnapshots.forEach((doc) => {
      console.log(doc.id);
      console.log(doc.exists);
      if (!doc.exists && deletedParentDocIds.indexOf(doc.id) === -1) {
        deletedParentDocIds.push(doc.id);
      }
    });

    // Use the deletedParentDocIds array
    // For example, get a reference to each orphan subcollection in order to delete
    // all the documents in those collections
    // (see https://firebase.google.com/docs/firestore/manage-data/delete-data#collections)
    deletedParentDocIds.forEach((docId) => {
      const orphanSubCollectionRef = db.collection(`MyCollection/${docId}/MySubcollection`);
      // ...
    });
  });
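Since the question is tagged java, here is a rough equivalent using the Firestore Java (Admin) SDK. Treat it as an untested sketch: it assumes a client version that supports collection group queries and that db is an already-initialized Firestore instance.
import com.google.cloud.firestore.*;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutionException;

public static Set<String> findDeletedParentIds(Firestore db)
        throws ExecutionException, InterruptedException {
    Set<String> deletedParentDocIds = new HashSet<>();

    // Query every MySubcollection across the whole database
    QuerySnapshot subDocs = db.collectionGroup("MySubcollection").get().get();

    for (QueryDocumentSnapshot subDoc : subDocs.getDocuments()) {
        // The parent of the subcollection is the (possibly deleted) MyDocument reference
        DocumentReference parentRef = subDoc.getReference().getParent().getParent();
        DocumentSnapshot parent = parentRef.get().get();
        if (!parent.exists()) {
            deletedParentDocIds.add(parentRef.getId());
        }
    }
    return deletedParentDocIds;
}
From there you can delete the documents under MyCollection/{docId}/MySubcollection in small batches, as described in the delete-data documentation linked above.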

Related

Couchbase Java SDK: N1QL queries that include document id

I'm looking to perform a query on my Couchbase database using the Java client SDK, which will return a list of results that include the document id for each result. Currently I'm using:
Statement stat = select("*").from(i("myBucket"))
.where(x(fieldIwantToGet).eq(s(valueIwantToGet)));
N1qlQueryResult result = bucket.query(stat);
However, N1qlQueryResult seems to only return a list of JsonObjects without any of the associated metadata. Looking at the documentation, it seems like I want a method that returns a list of Document objects, but I can't see any bucket methods I could call that do the job.
Anyone know a way of doing this?
You need to use a query like the one below to get the document id:
Statement stat = select("meta(myBucket).id").from(i("myBucket"))
.where(x(fieldIwantToGet).eq(s(valueIwantToGet)));
The above will return an array of document ids.
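If you need the document body together with its id, a variation like the following should work with the 2.x Java SDK's query DSL (a sketch, not tested; the field and value names are the placeholders from the question):
// Select the document body and its id in one statement
Statement stat = select("meta(myBucket).id", "myBucket.*")
    .from(i("myBucket"))
    .where(x(fieldIwantToGet).eq(s(valueIwantToGet)));

N1qlQueryResult result = bucket.query(stat);
for (N1qlQueryRow row : result) {
    String docId = row.value().getString("id"); // the meta() id
    // the remaining entries in row.value() are the document's own fields
}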

Using MongoDB 3.4 to load and save userdata

How can I find a document and retrieve it if found, but insert and retrieve it if not found in one command?
I have an outline for the formats I wish my documents to look like for a user's data. Here is what it looks like
{
    "username": "HeyAwesomePeople",
    "uuid": "0f91ede5-54ed-495c-aa8c-d87bf405d2bb",
    "global": {},
    "servers": {}
}
When a user first logs in, I want to store the first two values (username and uuid) and create the empty values (global and servers; both will have more information filled into them later on, but for now they can be blank). But I also don't want to override any data that already exists for the user.
I would normally use the insertOne or updateOne calls on the collection with the upsert option (new UpdateOptions().upsert(true)) to insert if it isn't found, but in this case I also need to retrieve the user's document as well.
So in a case in which the user isn't found in the database, I need to insert the outlined data into the database and return the document saved. In a case where the user is found in the database, I need to just return the document from the database.
How would I go about doing this? I am using the latest version of Mongo, which has deprecated the old BasicDBObject types, so I can't find many places online that use the new Document type. Also, I am using the async driver for Java and would like to keep the calls to a minimum.
How can I find a document and retrieve it if found, but insert and retrieve it if not found in one command?
You can use findOneAndUpdate() method to find and update/upsert.
The MongoDB Java driver exposes the same method name findOneAndUpdate(). For example:
// Example callback method for Async
SingleResultCallback<Document> printDocument = new SingleResultCallback<Document>() {
    @Override
    public void onResult(final Document document, final Throwable t) {
        System.out.println(document.toJson());
    }
};

Document userdata = new Document("username", "HeyAwesomePeople")
    .append("uuid", "0f91ede5")
    .append("global", new Document())
    .append("servers", new Document());

collection.findOneAndUpdate(userdata,
    new Document("$set", userdata),
    new FindOneAndUpdateOptions()
        .upsert(true)
        .returnDocument(ReturnDocument.AFTER),
    printDocument);
The query above will try to find a document matching userdata; if one is found, it is set to the same value as userdata. If it is not found, the upsert boolean flag will insert it into the collection. The returnDocument option returns the document after the action is performed.
The upsert and returnDocument flags are part of FindOneAndUpdateOptions.
See also the MongoDB Async Java Driver v3.4 documentation for tutorials/examples. The above snippet was tested with the current version of MongoDB v3.4.x.
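If you want to be certain that a returning user's existing global/servers data is never overwritten, one possible variant (an untested sketch; it assumes the uuid uniquely identifies the user) is to filter on the uuid only and use $setOnInsert, so the outline document is only written when it is actually inserted:
Document outline = new Document("username", "HeyAwesomePeople")
    .append("uuid", "0f91ede5-54ed-495c-aa8c-d87bf405d2bb")
    .append("global", new Document())
    .append("servers", new Document());

collection.findOneAndUpdate(
    new Document("uuid", "0f91ede5-54ed-495c-aa8c-d87bf405d2bb"), // match on the uuid only
    new Document("$setOnInsert", outline),                        // applied only when inserting
    new FindOneAndUpdateOptions()
        .upsert(true)
        .returnDocument(ReturnDocument.AFTER),
    printDocument);
On a match nothing is modified and the existing document is returned; on a miss the outline is inserted and returned.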

Spark read() works but sql() throws Database not found

I'm using Spark 2.1 to read data from Cassandra in Java.
I tried the code posted in https://stackoverflow.com/a/39890996/1151472 (with SparkSession) and it worked. However, when I replaced the spark.read() method with the spark.sql() one, the following exception was thrown:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: `wiki`.`treated_article`; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation `wiki`.`treated_article`
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
I'm using the same Spark configuration for both the read and sql methods.
read() code:
Dataset dataset =
    spark.read().format("org.apache.spark.sql.cassandra")
        .options(new HashMap<String, String>() {
            {
                put("keyspace", "wiki");
                put("table", "treated_article");
            }
        }).load();
sql() code:
spark.sql("SELECT * FROM WIKI.TREATED_ARTICLE");
Spark SQL uses a catalogue to look up database and table references. When you reference a table identifier that isn't in the catalogue, it will throw errors like the one you posted. The read command doesn't require a catalogue, since you are required to specify all of the relevant information in the invocation.
You can add entries to the catalogue either by
Registering DataSets as Views
First create your DataSet
spark.read().format("org.apache.spark.sql.cassandra")
    .options(new HashMap<String, String>() {
        {
            put("keyspace", "wiki");
            put("table", "treated_article");
        }
    }).load();
Then use one of the catalogue registry functions
void createGlobalTempView(String viewName)
Creates a global temporary view using the given name.
void createOrReplaceTempView(String viewName)
Creates a local temporary view using the given name.
void createTempView(String viewName)
Creates a local temporary view using the given name.
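For example, putting the two steps together for the table from the question could look like this (a sketch; spark is the SparkSession from the question and the view name is arbitrary):
Dataset<Row> articles = spark.read()
    .format("org.apache.spark.sql.cassandra")
    .option("keyspace", "wiki")
    .option("table", "treated_article")
    .load();

// Register the DataSet in the catalogue under a name of our choosing
articles.createOrReplaceTempView("treated_article");

// Now the identifier resolves in sql()
spark.sql("SELECT * FROM treated_article").show();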
OR Using a SQL Create Statement
CREATE TEMPORARY VIEW words
USING org.apache.spark.sql.cassandra
OPTIONS (
table "words",
keyspace "test",
cluster "Test Cluster",
pushdown "true"
)
Once added to the catalogue by either of these methods you can reference the table in all sql calls issued by that context.
Example
CREATE TEMPORARY VIEW words
USING org.apache.spark.sql.cassandra
OPTIONS (
table "words",
keyspace "test"
);
SELECT * FROM words;
// Hello 1
// World 2
The DataStax (my employer) Enterprise software automatically registers all Cassandra tables by placing entries in the Hive metastore used by Spark as a catalogue. This makes all tables accessible without manual registration.
This method allows select statements to be used without an accompanying CREATE VIEW.
I cannot think of a way to make that work off the top of my head. The problem is that Spark doesn't know which format to try, and the place where that would be specified is taken up by the keyspace. The closest documentation I can find for something like this is the DataFrames section of the Cassandra connector documentation. You can try to specify a USING clause, but I don't think that will work inside a select. So, beyond that, your best bet is to create a PR to handle this case, or stick with the read DSL.

Couchbase 2.0 Java SDK 1.1 - Synchronous Add and Views

I am trying to create a JUnit test. Scenario:
setUp: I'm adding two JSON documents to the database
Test: I'm getting those documents using a view
tearDown: I'm removing both objects
My view:
function (doc, meta) {
    if (doc.type && doc.type == "UserConnection") {
        emit([doc.providerId, doc.providerUserId], doc.userId);
    }
}
This is how I add those documents to the database and make sure that "add" is synchronous:
public boolean add(String key, Object element) {
    String json = gson.toJson(element);
    OperationFuture<Boolean> result = couchbaseClient.add(key, 0, json);
    return result.get();
}
JSON Documents that I'm adding are:
{"userId":"1","providerId":"test_pId","providerUserId":"test_pUId","type":"UserConnection"}
{"userId":"2","providerId":"test_pId","providerUserId":"test_pUId","type":"UserConnection"}
This is how I call the view:
View view = couchbaseClient.getView(DESIGN_DOCUMENT_NAME, VIEW_NAME);
Query query = new Query();
query.setKey(ComplexKey.of("test_pId", "test_pUId"));
ViewResponse viewResponse = couchbaseClient.query(view, query);
Problem:
The test fails due to an invalid number of elements fetched from the view.
My observations:
Sometimes the tests pass
The number of elements fetched from the view is not consistent (from 0 to 2)
When I added those documents to the database beforehand instead of in setUp, the test passed every time
According to the http://www.couchbase.com/docs/couchbase-sdk-java-1.1/create-update-docs.html documentation, I'm adding those JSON documents synchronously by calling get() on the returned Future object.
My question:
Is there something wrong with how I've approached fetching data from a view right after that data was inserted into the DB? Is there any good practice for solving this problem? And can someone please explain to me what I did wrong?
Thanks,
Dariusz
In Couchbase 2.0, documents are required to be written to disk before they will show up in a view. There are three ways you can do an operation with the Java SDK. The first is asynchronous, which means that you just send the data and at a later time check to make sure that the data was received correctly. If you do an asynchronous operation and then immediately call .get(), as you did above, then you have created a synchronous operation. When an operation returns success in these two cases, you are only guaranteed that the item has been written into memory. Your test passed sometimes only because you were lucky enough that both items were written to disk before you did your query.
The third way to do an operation is with durability requirements, and this is the one you want for your tests. Durability requirements allow you to say that you want an item to be written to disk or replicated before success is returned to the client. Take a look at the following function.
https://github.com/couchbase/couchbase-java-client/blob/1.1.0/src/main/java/com/couchbase/client/CouchbaseClient.java#L1293
You will want to use this function and set the PersistTo parameter to MASTER.
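Applied to the add method from the question, that would look roughly like this (a sketch, assuming the 1.1.x client overload of add that accepts a PersistTo argument):
public boolean add(String key, Object element) throws Exception {
    String json = gson.toJson(element);
    // Only report success once the document has been persisted to disk on the
    // master node, so a following view query can pick it up.
    OperationFuture<Boolean> result = couchbaseClient.add(key, 0, json, PersistTo.MASTER);
    return result.get();
}
Depending on your defaults, you may also want to set the view query's stale mode to FALSE so the index is updated before results are returned.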

How can I bulk update my mongo data to add a fixed level?

I recently changed a POJO from having all of its properties strongly typed to storing them in a free-form JSONObject field called content.
The problem is that all old documents map to the old POJO version, so they are stored like this:
{"_id":"ObjectId(value)","field1":"value1","field2":"value2"}
Can I update all the documents via a single mongo command to wrap all the content, except the id, so the result would look like this:
{"_id":"ObjectId(value)","content":{"field1":"value1","field2":"value2"}}
?
Or should I write a simple program that does it one by one? (i.e., iterating over all the documents and manually adding the new content level)
Unfortunately, there are no MongoDB commands that will allow you to restructure a document in this way. You'll need to write a program that fetches all of your documents one by one, updates the structure, and then sends the updated structure back to MongoDB.
Often the best way to do this is to write the modified documents to a new collection, and then drop the old collection when you're done.
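A minimal sketch of such a program with the synchronous Java driver might look like this (the collection names are just examples, and mongoClient is assumed to be an existing connection):
MongoDatabase db = mongoClient.getDatabase("myDb");
MongoCollection<Document> source = db.getCollection("results");
MongoCollection<Document> target = db.getCollection("results_migrated");

for (Document doc : source.find()) {
    Object id = doc.remove("_id");      // keep the id at the top level
    Document wrapped = new Document("_id", id)
        .append("content", doc);        // everything else moves under "content"
    target.insertOne(wrapped);
}
// When everything is copied, drop the old collection and rename the new one.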
I solved it by creating a .js file to execute via the mongo shell.
mongo myDb fixresults.js
The file is as follows:
for (var c = db.results.find(); c.hasNext(); ) {
    var full = c.next();
    var anon = db.results.findOne({ "_id": full._id }, { "_id": 0 });
    var n = { "_id": full._id, "content": anon };
    db.results.temp.insert(n);
}
This will insert the transformed documents into the results.temp collection, which you can rename later to replace the original.
