Retrieving the UUID from Apache Solr after a commit - java

I am using SolrJ to add new documents to a Solr instance. In my document schema the id is a UUID (solr.UUIDField). Each time a document is created, the id is filled with a unique id, which is exactly what I want. Sometimes my application needs to retrieve this unique id so it can add it as a field value when inserting another document. So my question is: how can I retrieve this generated UUID from Solr after adding a document?
SolrJ returns me this UpdateResponse object after committing, but I don't know how to get the new UUID out of it.
I am adding a document like this:
CommonsHttpSolrServer server = new CommonsHttpSolrServer(MY_SERVER_URL);
SolrInputDocument doc = new SolrInputDocument();
// [...] multiple addField calls
server.add(doc);
UpdateResponse ur = server.commit();

AFAIK you aren't ever going to get a UUID from an add or a commit. When you do an add or commit, the update request handler gives you back the query time and status, but not much else (assuming it is successful). You can see exactly what is in the HTTP response by running a manual add/commit like so:
http://localhost:8983/solr/update?stream.body=<add><doc><field name="id">test</field><field name="title">test title</field></doc></add>
http://localhost:8983/solr/update?stream.body=<commit/>
If you run those queries in a web browser, they will submit a test document and commit it, respectively. You will then be able to see what information is available to SolrJ (not much).
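Inspecting the UpdateResponse in SolrJ tells the same story: only status and timing come back. A minimal sketch (getStatus() and getQTime() are inherited from SolrResponseBase):
UpdateResponse ur = server.commit();
// The response carries only status and timing, not any generated field values
System.out.println("status: " + ur.getStatus());       // 0 means success
System.out.println("QTime: " + ur.getQTime() + " ms"); // server-side processing time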
You could write your own (modified) update handler in Java, but that seems like a ton of work. You could also enable the "timestamp" field in your Solr schema so you can query Solr by last-modified date and find the items you just committed.
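For instance, if the schema defines a timestamp field with default="NOW" (as in the old example schema), a date range query along these lines would surface recently added documents; treat the exact URL as an illustrative sketch against the example server above:
http://localhost:8983/solr/select?q=timestamp:[NOW-1MINUTE TO NOW]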
Both of those methods would be major hacks, though. Your best bet is to figure out a unique ID for your documents before you submit them to Solr, then use that unique ID to retrieve them. Using a generated UUID is more of a "fire and forget about this" method. Since you don't want to forget, you will need to generate your own UUID.
Since you're using Java, it should be dead simple to do with java.util.UUID, using some code like this:
CommonsHttpSolrServer server = new CommonsHttpSolrServer(MY_SERVER_URL);
SolrInputDocument doc = new SolrInputDocument();
// Generate the id client-side so it is still known after the add
UUID uuid = UUID.randomUUID();
doc.addField("id", uuid.toString());
// [...] multiple addField calls
server.add(doc);
UpdateResponse ur = server.commit();

Related

Lucidworks Fusion 4.1 transform result documents using JavaScript query pipeline

How can I transform the Solr response using a JavaScript query pipeline in Lucidworks Fusion 4.1? For example, I have the following response:
[
  { "doc_type": "type1",
    "publicationDate": "2018/10/10",
    "sortDate": "2017/9/9" },
  { "doc_type": "type2",
    "publicationDate": "2018/5/5",
    "sortDate": "2017/12/12" }
]
And I need to change it according to the following condition:
If doc_type = type1, put sortDate into publicationDate and remove sortDate; otherwise, just remove sortDate.
How can I manipulate the response? There is no documentation on the official website.
Currently, you cannot modify the Solr response. All you can do is add to it. So you could add a new block of JSON, include the "id" of the item and then list the fields and values you want to use in your UI.
Otherwise, you need to make the change in your Index Pipeline (as long as the value doesn't need to change based on the query).

Couchbase Java SDK: N1QL queries that include document id

I'm looking to perform a query on my Couchbase database using the Java client SDK, which will return a list of results that include the document id for each result. Currently I'm using:
Statement stat = select("*").from(i("myBucket"))
        .where(x(fieldIwantToGet).eq(s(valueIwantToGet)));
N1qlQueryResult result = bucket.query(stat);
However, N1qlQueryResult seems to only return a list of JsonObjects without any of the associated metadata. Looking at the documentation, it seems like I want a method that returns a list of Document objects, but I can't see any bucket methods I can call that do the job.
Anyone know a way of doing this?
You need to use the query below to get the document id:
Statement stat = select("meta(myBucket).id").from(i("myBucket"))
        .where(x(fieldIwantToGet).eq(s(valueIwantToGet)));
This returns the document id for each matching row.
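If you need the document body alongside the id, one option is to project both in the same statement; a minimal sketch, assuming the 2.x Java SDK's query DSL (meta() is standard N1QL):
Statement stat = select("meta(myBucket).id, myBucket.*")
        .from(i("myBucket"))
        .where(x(fieldIwantToGet).eq(s(valueIwantToGet)));
N1qlQueryResult result = bucket.query(stat);
for (N1qlQueryRow row : result) {
    String docId = row.value().getString("id"); // id projected from meta()
    // the document's own fields sit in the same JsonObject alongside "id"
}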

Get the object id after inserting the mongodb document in java

I am using MongoDB 3.4 and I want to get the last inserted document's id. I have searched around and found that the code below can be used if I use a BasicDBObject.
BasicDBObject docs = new BasicDBObject(doc);
collection.insertOne(docs);
ObjectId ID = (ObjectId) docs.get("_id"); // the driver sets _id on the inserted object
But the problem is I am using the Document type, not BasicDBObject, so I tried to get it like this: doc.getObjectId(). But it asks for a parameter, which is actually the value I want to get. So does anyone know how to get it?
EDIT
This is how I am inserting it into MongoDB:
Document doc = new Document("jarFileName", jarDataObj.getJarFileName())
        .append("directory", jarDataObj.getPathData())
        .append("version", jarDataObj.getVersion())
        .append("artifactID", jarDataObj.getArtifactId())
        .append("groupID", jarDataObj.getGroupId());
If I use doc.toJson() it shows me the whole document. Is there a way to extract only the _id?
This gives me only the value I want (the object key), so I can use it as a reference key:
collection.insertOne(doc);
jarID = doc.get("_id"); // the driver populates _id on the inserted Document
System.out.println(jarID); // 59a4db1a6812d7430c3ef2a5
Based on the ObjectId Javadoc, you can simply instantiate an ObjectId from its 24-character hex string representation, which is exactly what 59a4db1a6812d7430c3ef2a5 is. Why don't you just do new ObjectId("59a4db1a6812d7430c3ef2a5")? (The byte[] constructor is not a substitute here: it expects the raw 12-byte representation, not the UTF-8 bytes of the hex string.) Although, I'd say that exposing ObjectId outside the layer that integrates with Mongo is a design flaw.
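As for extracting only the _id from a Document: the parameter that getObjectId() asks for is just the field name. A minimal sketch, assuming the driver has populated _id during insertOne():
collection.insertOne(doc);
// insertOne() fills in _id on the passed Document;
// getObjectId(key) takes the name of the field to read
ObjectId jarID = doc.getObjectId("_id");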

Using MongoDB 3.4 to load and save userdata

How can I find a document and retrieve it if found, but insert and retrieve it if not found in one command?
I have an outline of the format I want my documents to follow for a user's data. Here is what it looks like:
{
  "username": "HeyAwesomePeople",
  "uuid": "0f91ede5-54ed-495c-aa8c-d87bf405d2bb",
  "global": {},
  "servers": {}
}
When a user first logs in, I want to store the first two values (username and uuid) and create the empty values (global and servers; both will have more information filled into them later, but for now they can be blank). But I also don't want to overwrite any data if it already exists for the user.
I would normally use the insertOne or updateOne calls on the collection with the upsert option (new UpdateOptions().upsert(true)) to insert if the document isn't found, but in this case I also need to retrieve the user's document.
So if the user isn't found in the database, I need to insert the outlined data and return the saved document. If the user is found, I just need to return the document from the database.
How would I go about doing this? I am using the latest version of Mongo, which has deprecated the old BasicDBObject types, so I can't find many places online that use the new Document type. Also, I am using the Async driver for Java and would like to keep the calls to a minimum.
How can I find a document and retrieve it if found, but insert and retrieve it if not found in one command?
You can use the findOneAndUpdate() method to find and update/upsert.
The MongoDB Async Java driver exposes the same method name, findOneAndUpdate(). For example:
// Example callback method for Async
SingleResultCallback<Document> printDocument = new SingleResultCallback<Document>() {
    @Override
    public void onResult(final Document document, final Throwable t) {
        System.out.println(document.toJson());
    }
};
Document userdata = new Document("username", "HeyAwesomePeople")
        .append("uuid", "0f91ede5")
        .append("global", new Document())
        .append("servers", new Document());
collection.findOneAndUpdate(userdata,
        new Document("$set", userdata),
        new FindOneAndUpdateOptions()
                .upsert(true)
                .returnDocument(ReturnDocument.AFTER),
        printDocument);
The query above will try to find a document matching userdata; if one is found, it is set to the same value as userdata. If not found, the upsert boolean flag tells MongoDB to insert it into the collection. The returnDocument option makes the call return the document as it is after the action is performed.
The upsert and returnDocument flags are part of FindOneAndUpdateOptions.
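One caveat: if an existing user's global or servers already contain data, a filter on the full userdata document will not match it, and $set would reset those fields on a match. A minimal sketch that filters on uuid alone and uses $setOnInsert, so nothing is overwritten for an existing user (Filters comes from com.mongodb.client.model):
collection.findOneAndUpdate(
        Filters.eq("uuid", "0f91ede5"),
        new Document("$setOnInsert", userdata),
        new FindOneAndUpdateOptions()
                .upsert(true)
                .returnDocument(ReturnDocument.AFTER),
        printDocument); // existing user: document returned unmodified;
                        // new user: userdata inserted and returned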
See also the MongoDB Async Java Driver v3.4 documentation for tutorials/examples. The above snippet was tested with the current version of MongoDB v3.4.x.

Elasticsearch refreshing index automatically with index.refresh_interval=-1?

I am using Elasticsearch with the Java API.
I am indexing offline data with big bulk inserts, so I set index.refresh_interval=-1.
I don't refresh the index "manually" anywhere.
It seems that refresh is done at some point, because queries do return data. The only scenario where the data wasn't returned was when I tested with just a few documents, and querying was done immediately after insertion (using the same Client object).
I wonder if an index refresh is triggered implicitly by Elasticsearch or by the Java library at some stage, even when index.refresh_interval=-1?
Or how else could the behavior be explained?
Client generation:
Client client = TransportClient.builder()
        .settings(settings)
        .build()
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(address), port));
Insertion:
BulkRequestBuilder bulkRequest = client.prepareBulk();
for (MyObject object : list) {
    bulkRequest.add(client.prepareIndex(index, type)
            .setSource(XContentFactory.jsonBuilder()
                    .startObject()
                    // ... add object fields here ...
                    .endObject()
            ));
}
BulkResponse bulkResponse = bulkRequest.get();
Querying:
QueryBuilder query = ...;
SearchResponse resp = client.prepareSearch(index)
        .setQuery(query)
        .setSize(Integer.MAX_VALUE)
        // adding fields here
        .get();
SearchHit[] hits = resp.getHits().getHits();
The reason the documents were searchable despite the refresh interval being disabled could be that either the index buffer filled up (resulting in the creation of a Lucene segment) or the translog filled up (resulting in a commit of a Lucene segment); either of these makes the documents searchable.
As per the documentation
By default, Elasticsearch uses memory heuristics in order to
automatically trigger flush operations as required in order to clear
memory.
The index buffer settings (indices.memory.index_buffer_size) can also be tuned.
This article is a good read on how data becomes searchable and durable.
You can also look at this SO thread, written by one of the Elasticsearch contributors, for more details on flush vs. refresh.
You can use the indices stats API to verify all of this, i.e. check whether a flush or refresh occurred.
Example:
GET <index_name>/_stats/refresh
GET <index_name>/_stats/flush
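If you need deterministic visibility instead of relying on those heuristics, you can also trigger a refresh explicitly after the bulk insert; a minimal sketch against the same transport client used above:
// Force a refresh so documents indexed with refresh_interval=-1
// become searchable immediately
client.admin().indices()
        .prepareRefresh(index)
        .get();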
