We are using Elasticsearch 7.13.
We periodically update an index using upserts.
The sequence of operations:
Create a new index with a dynamic mapping in which all strings are mapped as text:
"dynamic_templates": [
{
"strings_as_keywords": {
"match_mapping_type": "string",
"mapping": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "search_term_analyzer",
"copy_to": "_all",
"fields": {
"keyword": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
}
}
}
}
}
]
Run a bulk upsert with the code attached below (I don't have an equivalent REST request).
Search on a specific field:
localhost:9200/mdsearch-vitaly123/_search
{
"query": {
"match": {
"fullyQualifiedName": "value_test"
}
}
}
This returns 1 result.
Upsert again, now with "fullyQualifiedName": "value_test1234" (as in step 2).
Search again as in step 3.
This returns 2 results: one document with "fullyQualifiedName": "value_test"
and another with "fullyQualifiedName": "value_test1234".
The upsert snippet (step 2) is below:
@Override
public List<BulkItemStatus> updateDocumentBulk(String indexName, List<JsonObject> indexDocuments) throws MDSearchIndexerException {
    // Refresh immediately so the changes are visible to the next search.
    BulkRequest request = new BulkRequest().setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
    ofNullable(indexDocuments).orElseThrow(NullPointerException::new)
        .forEach(x -> {
            // Take the document ID out of the body; it is passed to UpdateRequest instead.
            var id = x.get("_id").getAsString();
            x.remove("_id");
            request.add(new UpdateRequest(indexName, id)
                .docAsUpsert(true)
                .doc(x.toString(), XContentType.JSON)
                .retryOnConflict(3)
            );
        });
    BulkResponse bulk = elasticsearchRestClient.bulk(request, RequestOptions.DEFAULT);
    // Map every bulk item to a status object holding its ID, outcome and failure message.
    return stream(bulk.getItems())
        .map(r -> new BulkItemStatus(r.getId(), isSuccess(r), r.getFailureMessage()))
        .collect(Collectors.toList());
}
I can search by the updated properties.
The problem is that searches return both the document with the updated fields and the previous one.
How can I solve this?
Maybe by somehow limiting the version number to 1?
I set setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE), but it didn't help.
The picture shows the result.
P.S. Both the old and the updated data are retrieved.
Any suggestions?
Regards,
What is happening is that the following line must yield null:
var id = x.get("_id").getAsString();
In other words, there is no _id field in the JSON documents you pass in indexDocuments. It is not allowed to have fields with an initial underscore character in the source documents. If it was the case, you'd get the following error:
Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters.
Hence, your update request cannot update any document (since there's no ID to identify the document to update) and will simply insert a new one (i.e. what docAsUpsert does), which is why you're seeing two different documents.
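One way to make this kind of problem visible early is to fail fast when a document carries no usable ID, rather than letting the request fall through to an insert. The following is only a minimal sketch, assuming the documents are available as plain maps and the ID is stored under a key chosen by the caller. The class name IdGuard, the method split, and the key docId are hypothetical and are not part of the Elasticsearch client.

```java
import java.util.HashMap;
import java.util.Map;

final class IdGuard {

    /**
     * Separates the ID from the document body.
     * Returns an entry whose key is the ID and whose value is the body without the ID field.
     * Throws if the ID is missing or blank, so a bad document cannot silently become an insert.
     */
    static Map.Entry<String, Map<String, Object>> split(Map<String, Object> doc, String idKey) {
        Object raw = doc.get(idKey);
        if (raw == null || raw.toString().trim().isEmpty()) {
            throw new IllegalArgumentException("document has no usable '" + idKey + "' value");
        }
        // Copy the body so the caller's map is left untouched.
        Map<String, Object> source = new HashMap<>(doc);
        source.remove(idKey);
        return Map.entry(raw.toString(), source);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("docId", "42"); // hypothetical ID key
        doc.put("fullyQualifiedName", "value_test1234");
        Map.Entry<String, Map<String, Object>> parts = split(doc, "docId");
        System.out.println("id=" + parts.getKey() + " source=" + parts.getValue());
    }
}
```

The returned ID would then be the value passed as the second argument of new UpdateRequest(indexName, id), and the returned map would be the body passed to doc(...).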
I'm facing a weird problem while using Dataflow Streaming Insert.
I have a JSON with a lot of records and arrays. I set up the Pipeline with Streaming Insert method and a class DeadLetters to handle the errors.
formattedWaiting.apply("Insert Bigquery ",
BigQueryIO.<KV<TableRow,String>>write()
.to(customOptions.getOutputTable())
.withFormatFunction(kv -> kv.getKey())
.withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
.withSchemaFromView(schema)
.withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
.withWriteDisposition(WriteDisposition.WRITE_APPEND)
.withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
.withoutValidation()
.withExtendedErrorInfo()
.withTimePartitioning(new TimePartitioning().setField(customOptions.getPartitionField().get()))
.withClustering(clusteringFieldsList)
.withExtendedErrorInfo())
.getFailedInsertsWithErr()
.apply("Taking 1 element insertion", Sample.<BigQueryInsertError>any(1))
.apply("Insertion errors",ParDo.of(new DeadLettersHandler()));
The problem is when I'm using the streaming insert method, some rows don't insert into the table and I'm receiving the error:
Repeated record with name: XXXX added outside of an array.
I double-checked the JSON that has the problem and everything seems fine.
The weird part is that when I comment out the withMethod line, the row inserts with no issue at all.
I don't know why the pipeline has that behavior.
The JSON looks like this.
{
"parameters":{
"parameter":[
{
"subParameter":[
{
"value":"T",
"key":"C"
},
{
"value":"1",
"key":"SEQUENCE_NUMBER"
},
{
"value":"1",
"key":"SEQUENCE_NUMBER"
}
],
"value":"C",
"key":"C"
},
{
"subParameter":[
{
"value":"T",
"key":"C"
},
{
"value":"1",
"key":"SEQUENCE_NUMBER"
},
{
"value":"2",
"key":"SEQUENCE_NUMBER"
}
],
"value":"C",
"key":"C"
}
]
}
}
The BigQuery schema is fine, because I can insert data when I comment out the streaming insert line in the BigQueryIO call.
Any ideas, fellows?
Thanks in advance!
Just an update to this question.
The problem was with the schema declaration and the JSON itself.
We defined the parameters column as RECORD REPEATED but parameters is an object in the JSON example.
So we have two options here:
Change the BigQuery schema from RECORD REPEATED to RECORD NULLABLE.
Wrap the parameters object in brackets []. For this option you have to transform the JSON so that the object is treated as an array.
Example:
{
"parameters":[
{
"parameter":[
{
"subParameter":[
{
"value":"T",
"key":"C"
},
{
"value":"1",
"key":"SEQUENCE_NUMBER"
},
{
"value":"1",
"key":"SEQUENCE_NUMBER"
}
],
"value":"C",
"key":"C"
}
]
}
]
}
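The second option (wrapping the object in an array) could be done in a transform before the write. This is only a sketch under assumptions: a plain Map<String, Object> stands in for the row, and the class RepeatedFieldFix and the method wrapAsRepeated are hypothetical names, not part of Beam or the BigQuery client.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RepeatedFieldFix {

    /**
     * Returns a copy of the row in which the given field is a list.
     * A single object becomes a one-element list; an existing list or a missing field is left as is.
     */
    static Map<String, Object> wrapAsRepeated(Map<String, Object> row, String field) {
        Map<String, Object> copy = new HashMap<>(row);
        Object value = copy.get(field);
        if (value != null && !(value instanceof List)) {
            List<Object> wrapped = new ArrayList<>();
            wrapped.add(value);
            copy.put(field, wrapped);
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("parameters", Map.of("parameter", List.of()));
        // "parameters" is now a list holding the original object.
        System.out.println(wrapAsRepeated(row, "parameters"));
    }
}
```

The check for an existing list makes the method safe to apply to rows that already match the REPEATED schema.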
Hi, I want to fetch data from a CouchDB view with reduce and pagination applied.
With reduce, my view returns results with a complex key, as follows:
{"rows":[
{"key":{"attribute":"Attribute1"},"value":20},
{"key":{"attribute":"Attribute2"},"value":1},
{"key":{"attribute":"Attribute3"},"value":1}
]}
I am trying to fetch the data from CouchDB using Ektorp; see the following code:
PageRequest pageRequest = PageRequest.firstPage(10);
ViewQuery query = new ViewQuery()
.designDocId("_design/medesign")
.viewName("viewname")
.includeDocs(false)
.reduce(true)
.group(true);
Page<ViewResult> rs1 = db.queryForPage(query, pageRequest, ViewResult.class);
rs1.forEach(v -> {
System.out.println(v.getSize());
});
I am getting the following error:
org.ektorp.DbAccessException: com.fasterxml.jackson.databind.JsonMappingException:
Can not construct instance of org.ektorp.ViewResult:
no int/Int-argument constructor/factory method to deserialize from Number value (20)
at [Source: N/A; line: -1, column: -1]
CouchDB does not return pagination details (total_rows, offset) when you request reduced data.
Paginated request with include_docs:
group=false & reduce=false & include_docs=true
URL : http://localhost:5984/dn_anme/_design/design_name/_view/viewname?include_docs=true&reduce=false&skip=0&group=false&limit=2
Response :
{
"total_rows":81,
"offset":0,
"rows":[
{
"id":"906a74b8019716f1240a7117580ec172",
"key":{
"attribute":"BuildArea"
},
"value":1,
"doc":{
"_id":"906a74b8019716f1240a7117580ec172",
"_rev":"3-7e0a1da0c2260040f8a9787636385785",
"country":"POL",
"recordStatus":"MATCHED"
}
},
{
"id":"906a74b8019716f1240a7117580eaefb",
"key":{
"attribute":"Area"
},
"value":1,
"doc":{
"_id":"906a74b8019716f1240a7117580eaefb",
"_rev":"3-165ea3a3ed07ad8cce1f3e095cd476b5",
"country":"POL",
"recordStatus":"MATCHED"
}
}
]
}
Request with Reduce
group=true& reduce=true& include_docs=false
URL : http://localhost:5984/dn_anme/_design/design_name/_view/viewname?include_docs=false&reduce=true&group=true&limit=2
Response :
{
"rows":[
{
"key":[
"BuildArea"
],
"value":1
},
{
"key":[
"Area"
],
"value":1
}
]
}
Difference between the two requests:
The paginated request with include_docs returns page data: {"total_rows":81, "offset":0, "rows":[{...},{...}]}
whereas
the request with reduce returns only {"rows":[{...},{...}]}
How to get paginated reduce data:
Step 1: Request rows_per_page + 1 rows from the view.
Step 2: If the response contains one more row than page_size, there are more records.
Step 3: Calculate and update the skip value, then go to step 1 for the next page.
Note: skip is not a good option for large numbers of records; instead, find the start key and pass it as the start key, which performs better.
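The three steps can be sketched in plain Java as follows. This is an illustration under assumptions, not Ektorp code: the fetch function stands in for whatever issues the view query with skip and limit, and the names ReducePager, Page and fetchPage are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class ReducePager {

    /** One page of rows plus a flag saying whether another page exists. */
    static final class Page<T> {
        final List<T> rows;
        final boolean hasNext;

        Page(List<T> rows, boolean hasNext) {
            this.rows = rows;
            this.hasNext = hasNext;
        }
    }

    /**
     * fetch takes (skip, limit) and returns at most limit rows starting at skip;
     * in a real application it would run the reduced view query.
     */
    static <T> Page<T> fetchPage(BiFunction<Integer, Integer, List<T>> fetch,
                                 int pageIndex, int pageSize) {
        int skip = pageIndex * pageSize;                 // Step 3: skip value for this page
        List<T> rows = fetch.apply(skip, pageSize + 1);  // Step 1: ask for one extra row
        boolean hasNext = rows.size() > pageSize;        // Step 2: extra row means more data
        List<T> visible = hasNext ? rows.subList(0, pageSize) : rows;
        return new Page<>(new ArrayList<>(visible), hasNext);
    }

    public static void main(String[] args) {
        // In-memory stand-in for the reduced view rows.
        List<String> data = List.of("Attribute1", "Attribute2", "Attribute3");
        BiFunction<Integer, Integer, List<String>> fetch = (skip, limit) ->
            data.subList(Math.min(skip, data.size()), Math.min(skip + limit, data.size()));
        Page<String> page = fetchPage(fetch, 0, 2);
        System.out.println(page.rows + " hasNext=" + page.hasNext);
    }
}
```

The extra row is dropped before the page is returned, so callers only ever see pageSize rows. The same shape works with a start key instead of skip: the extra row's key would become the start key of the next request.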
I am working with the Cloudera Manager Navigator REST API. Extracting results works fine, but I am unable to get any nested values.
The data being extracted has the following types:
{
"parentPath": "String",
"customProperties": "Map[string,string]",
"sourceType": "String",
"entityType": "String"
}
And the data should look like this:
{
"parentPath": "abcd",
"customProperties": {
"nameservice" : "xyz"
},
"sourceType": "rcs",
"entityType": "ufo"
}
But I am getting the key-value result as follows:
parentPath :abcd
customProperties : null
sourceType : rcs
entityType : ufo
In the response data above, "customProperties" comes back as null, where it should return a map object containing ["nameservice" : "xyz"]. The problem occurs with the following code snippet.
MetadataResultSet metadataResultSet = extractor.extractMetadata(null, null,"sourceType:HDFS", "identity:*");
Iterator<Map<String, Object>> entitiesIt = metadataResultSet.getEntities().iterator();
while(entitiesIt.hasNext()){
Map<String, Object> result = entitiesIt.next();
for(String data : result.keySet()){
System.out.println(" key:"+data+" value:"+result.get(data));
}
}
Can you suggest how to get the nested value when the data type is complex?
Have you checked how the data looks in the Navigator UI? Verify that first, and also try the Cloudera /entities/entity-id REST API in a browser to check what the JSON response looks like.
I am using the Java API for CRUD operations on Elasticsearch.
I have a type with a nested field, and I want to update this field.
Here is my mapping for the type:
"enduser": {
"properties": {
"location": {
"type": "nested",
"properties":{
"point":{"type":"geo_point"}
}
}
}
}
Of course my enduser type will have other parameters.
Now I want to add this document in my nested field:
"location":{
"name": "London",
"point": "44.5, 5.2"
}
I searched the documentation for how to update a nested document but couldn't find anything. For example, I have the previous JSON object in a string (let's call this string json). I tried the following code, but it does not seem to work:
params.put("location", json);
client.prepareUpdate(index, ElasticSearchConstants.TYPE_END_USER,id).setScript("ctx._source.location = location").setScriptParams(params).execute().actionGet();
I got a parsing error from Elasticsearch. Does anyone know what I am doing wrong?
You don't need the script, just update it.
UpdateRequestBuilder br = client.prepareUpdate("index", "enduser", "1");
br.setDoc("{\"location\":{ \"name\": \"london\", \"point\": \"44.5,5.2\" }}".getBytes());
br.execute();
I tried to recreate your situation and solved it by using the .setScript method in a different way.
Your update request would now look like this:
client.prepareUpdate(index, ElasticSearchConstants.TYPE_END_USER,id).setScript("ctx._source.location =" + json).execute().actionGet()
Hope it will help you.
I am not sure which ES version you were using, but the below solution worked perfectly for me on 2.2.0. I had to store information about named entities for news articles. I guess if you wish to have multiple locations in your case, it would also suit you.
This is the nested object I wanted to update:
"entities" : [
{
"disambiguated" : {
"entitySubTypes" : [],
"disambiguatedName" : "NameX"
},
"frequency" : 1,
"entityType" : "Organization",
"quotations" : ["...", "..."],
"name" : "entityX"
},
{
"disambiguated" : {
"entitySubType" : ["a", "b" ],
"disambiguatedName" : "NameQ"
},
"frequency" : 5,
"entityType" : "secondTypeTest",
"quotations" : [ "...", "..."],
"name" : "entityY"
}
],
and this is the code:
UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index(indexName);
updateRequest.type(mappingName);
updateRequest.id(url); // docID is a url
XContentBuilder jb = XContentFactory.jsonBuilder();
jb.startObject(); // article
jb.startArray("entities"); // multiple entities
for ( /*each namedEntity*/) {
    jb.startObject() // entity
        .field("name", name)
        .field("frequency", n)
        .field("entityType", entityType)
        .startObject("disambiguated") // disambiguation
            .field("disambiguatedName", disambiguatedNameStr)
            .field("entitySubTypes", entitySubTypeArray) // multi value field
        .endObject() // disambiguation
        .field("quotations", quotationsArray) // multi value field
    .endObject(); // entity
}
jb.endArray(); // array of nested objects
jb.endObject(); // article
updateRequest.doc(jb);
Blblblblblblbl's answer doesn't work for me at the moment, because scripts are not enabled on our server. I haven't tried Bask's answer yet. Alcanzar's gave me a hard time, because I apparently couldn't formulate the JSON string that setDoc receives correctly: I kept getting errors saying that I was using objects instead of fields, or vice versa. I also tried wrapping the JSON string in doc{} as indicated here, but I couldn't make it work. As you mentioned, it is difficult to work out how to express a curl statement with ES's Java API.
A simple way to update an array list and an object value using the Java API:
UpdateResponse update = client.prepareUpdate("indexname","type",""+id)
.addScriptParam("param1", arrayvalue)
.addScriptParam("param2", objectvalue)
.setScript("ctx._source.field1=param1;ctx._source.field2=param2").execute()
.actionGet();
arrayvalue -
[
{
"text": "stackoverflow",
"datetime": "2010-07-27T05:41:52.763Z",
"obj1": {
"id": 1,
"email": "sa@gmail.com",
"name": "bass"
},
"id": 1
}
]
object value -
"obj1": {
"id": 1,
"email": "sa@gmail.com",
"name": "bass"
}
I'm making a MongoDB statistic system using the Java driver, and I am wondering if it is possible (and how) to change the value of a key nested inside many objects. Here is how my database is formatted:
{
location : "chicago",
stats : [
{
"employee" : "rob",
"stat1" : 1,
"stat2" : 3,
"stat3" : 2
},
{
"employee" : "krista",
"stat1" : 1,
"stat2" : 3,
"stat3" : 2
}
]
}
So, for example, how could I change Rob's "stat2" to another value? I am new to JSON and the MongoDB Java driver. Any help is appreciated!
You need to use the positional $ operator and $set in order to update what you want.
db.collection.update(
{ _id: <docId>, "stats.employee": "rob" },
{ "$set": { "stats.$.stat2": <value> } }
)
So you match your document and the required element of the array. The update side uses the matched array index to know which element to update. The $set operator only updates the specified field.
In Java, build the query and the update with BasicDBObject:
BasicDBObject query = new BasicDBObject("_id", id);
// append takes a key and a value, not another BasicDBObject
query.append("stats.employee", "rob");
BasicDBObject update = new BasicDBObject("$set",
    new BasicDBObject("stats.$.stat2", value));
collection.update(query, update);
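To make the effect of the positional $ operator concrete, here is a plain-Java sketch of what the update does to the stats array: it finds the first element that satisfies the query condition and sets one field on that element only. It does not use the MongoDB driver, and the names PositionalUpdateSketch and setOnFirstMatch are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class PositionalUpdateSketch {

    /**
     * Mimics { "stats.employee": matchValue } combined with { $set: { "stats.$.targetKey": newValue } }:
     * only the first matching array element is changed. Returns false if nothing matched.
     */
    static boolean setOnFirstMatch(List<Map<String, Object>> array,
                                   String matchKey, Object matchValue,
                                   String targetKey, Object newValue) {
        for (Map<String, Object> element : array) {
            if (Objects.equals(element.get(matchKey), matchValue)) {
                element.put(targetKey, newValue);
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Object> rob = new HashMap<>();
        rob.put("employee", "rob");
        rob.put("stat2", 3);
        Map<String, Object> krista = new HashMap<>();
        krista.put("employee", "krista");
        krista.put("stat2", 3);
        List<Map<String, Object>> stats = new ArrayList<>(List.of(rob, krista));

        setOnFirstMatch(stats, "employee", "rob", "stat2", 7);
        System.out.println(stats); // only rob's stat2 has changed
    }
}
```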