Hi, I need to execute a bulk insert with an automatic createdOn [timestamp] field.
As of now I manually create the createdOn timestamp, set it on each document, and then call the insert method.
For inserting a single document this is fine,
but it is not efficient when executing a bulk insert with lakhs of records.
Can anyone show how to execute a bulkInsert with an automatic field,
or suggest a better way of creating an automatic field in MongoDB?
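For reference, here is a minimal sketch of the current manual flow extended to insertMany, assuming the MongoDB Java sync driver; the collection name and the buildDocuments() helper are placeholders:

MongoCollection<Document> collection = database.getCollection("myCollection");

List<Document> docs = buildDocuments();          // hypothetical helper: documents without createdOn
Date now = new Date();
docs.forEach(d -> d.append("createdOn", now));   // timestamp still set manually on every document
collection.insertMany(docs);                     // one round trip for the whole batch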
Actual Data for BulkWrite insert:
BulkInsertList:
[
{
fieldOne:val,
fieldTwo:val,
fieldThree:val,
fieldFour:val
},
{
fieldOne:val,
fieldTwo:val,
fieldThree:val,
fieldFour:val
},
{
..
}
..,
]
The bulk write should insert the data with a createdOn timestamp.
Expected result:
BulkInsertList:
[
{
fieldOne:val,
fieldTwo:val,
fieldThree:val,
fieldFour:val,
createdOn: Timestamp
},
{
..
},
{
..
}
]
Related
I work on a Spring REST API project where I have to insert a model, and this model has a relation with another table (one-to-many). I am using JdbcTemplate for some reasons.
The API client sends a JSON object and it has a list of the related objects.
How can I insert this object into the DB?
For example, the client sends this DTO:
{
  "name": "Join",
  "courses": [
    {
      "id": 1,
      "courseName": "Physics"
    },
    {
      "id": 2,
      "courseName": "Chemistry"
    }
  ]
}
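A hedged sketch of one way to persist this with JdbcTemplate: insert the parent, capture the generated key, then batch-insert the children. The student/course table names, column names, and the StudentDto accessors are illustrative assumptions, not part of the original question:

public void insertStudentWithCourses(StudentDto dto) {
    // Insert the parent row and capture its generated primary key.
    KeyHolder keyHolder = new GeneratedKeyHolder();
    jdbcTemplate.update(con -> {
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO student (name) VALUES (?)", new String[] { "id" });
        ps.setString(1, dto.getName());
        return ps;
    }, keyHolder);
    long studentId = keyHolder.getKey().longValue();

    // Insert all related courses in a single batch, referencing the generated parent key.
    jdbcTemplate.batchUpdate(
            "INSERT INTO course (id, course_name, student_id) VALUES (?, ?, ?)",
            dto.getCourses(),
            dto.getCourses().size(),
            (ps, course) -> {
                ps.setLong(1, course.getId());
                ps.setString(2, course.getCourseName());
                ps.setLong(3, studentId);
            });
}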
I'm facing a weird problem while using Dataflow Streaming Insert.
I have a JSON with a lot of records and arrays. I set up the pipeline with the Streaming Insert method and a DeadLetters class to handle the errors.
formattedWaiting.apply("Insert Bigquery ",
        BigQueryIO.<KV<TableRow, String>>write()
            .to(customOptions.getOutputTable())
            .withFormatFunction(kv -> kv.getKey())
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            .withSchemaFromView(schema)
            .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(WriteDisposition.WRITE_APPEND)
            .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
            .withoutValidation()
            .withExtendedErrorInfo()
            .withTimePartitioning(new TimePartitioning().setField(customOptions.getPartitionField().get()))
            .withClustering(clusteringFieldsList))
    .getFailedInsertsWithErr()
    .apply("Taking 1 element insertion", Sample.<BigQueryInsertError>any(1))
    .apply("Insertion errors", ParDo.of(new DeadLettersHandler()));
The problem is that when I'm using the streaming insert method, some rows don't get inserted into the table and I receive the error:
Repeated record with name: XXXX added outside of an array.
I double-checked the JSON that has the problem and everything seems fine.
The weird part is that when I comment out the withMethod line, the rows insert with no issue at all.
I don't know why the pipeline behaves that way.
The JSON looks like this.
{
  "parameters": {
    "parameter": [
      {
        "subParameter": [
          {
            "value": "T",
            "key": "C"
          },
          {
            "value": "1",
            "key": "SEQUENCE_NUMBER"
          },
          {
            "value": "1",
            "key": "SEQUENCE_NUMBER"
          }
        ],
        "value": "C",
        "key": "C"
      },
      {
        "subParameter": [
          {
            "value": "T",
            "key": "C"
          },
          {
            "value": "1",
            "key": "SEQUENCE_NUMBER"
          },
          {
            "value": "2",
            "key": "SEQUENCE_NUMBER"
          }
        ],
        "value": "C",
        "key": "C"
      }
    ]
  }
}
The BigQuery schema is fine, because I can insert data when I comment out the streaming insert line in the BigQueryIO.
Any ideas, fellows?
Thanks in advance!
Just an update to this question.
The problem was with the schema declaration and the JSON itself.
We defined the parameters column as RECORD REPEATED but parameters is an object in the JSON example.
So we have two options here:
Change the BigQuery schema from RECORD REPEATED to RECORD NULLABLE.
Add brackets [] around the parameters object; for this option you will have to transform the JSON and add the brackets so that the object is treated as an array.
Example:
{
  "parameters": [
    {
      "parameter": [
        {
          "subParameter": [
            {
              "value": "T",
              "key": "C"
            },
            {
              "value": "1",
              "key": "SEQUENCE_NUMBER"
            },
            {
              "value": "1",
              "key": "SEQUENCE_NUMBER"
            }
          ],
          "value": "C",
          "key": "C"
        }
      ]
    }
  ]
}
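For option 1, a hedged sketch of how the parameters field could be declared as a NULLABLE RECORD instead of REPEATED, using the com.google.api.services.bigquery.model classes; the field names follow the JSON above, and how the schema is actually fed into the pipeline (e.g. via withSchemaFromView) is left to your setup:

// parameters is NULLABLE here; parameter and subParameter stay REPEATED.
TableSchema schema = new TableSchema().setFields(ImmutableList.of(
    new TableFieldSchema().setName("parameters").setType("RECORD").setMode("NULLABLE")
        .setFields(ImmutableList.of(
            new TableFieldSchema().setName("parameter").setType("RECORD").setMode("REPEATED")
                .setFields(ImmutableList.of(
                    new TableFieldSchema().setName("key").setType("STRING"),
                    new TableFieldSchema().setName("value").setType("STRING"),
                    new TableFieldSchema().setName("subParameter").setType("RECORD").setMode("REPEATED")
                        .setFields(ImmutableList.of(
                            new TableFieldSchema().setName("key").setType("STRING"),
                            new TableFieldSchema().setName("value").setType("STRING")))))))));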
We are using Elasticsearch 7.13.
We do a periodic update to the index using upserts.
The sequence of operations:
Create a new index with a dynamic mapping; all strings are mapped as text:
"dynamic_templates": [
{
"strings_as_keywords": {
"match_mapping_type": "string",
"mapping": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "search_term_analyzer",
"copy_to": "_all",
"fields": {
"keyword": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
}
}
}
}
}
]
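(For completeness, a hedged sketch of creating that index from the same Java high-level REST client used further down; mappingJson is assumed to hold the settings and mappings body that contains the dynamic_templates block above, and the index name just mirrors the search URL below.)

CreateIndexRequest createIndex = new CreateIndexRequest("mdsearch-vitaly123")
        .source(mappingJson, XContentType.JSON);   // full settings + mappings body
elasticsearchRestClient.indices().create(createIndex, RequestOptions.DEFAULT);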
Bulk upsert with the attached code (I don't have a REST equivalent).
Do a search on a specific field:
localhost:9200/mdsearch-vitaly123/_search
{
  "query": {
    "match": {
      "fullyQualifiedName": "value_test"
    }
  }
}
Got 1 result.
Upsert again, now with "fullyQualifiedName": "value_test1234" (as in step 2).
Do the same search as in step 3.
Got 2 results: one doc with "fullyQualifiedName": "value_test"
and another with "fullyQualifiedName": "value_test1234".
Snippet of the upsert (step 2) below:
@Override
public List<BulkItemStatus> updateDocumentBulk(String indexName, List<JsonObject> indexDocuments) throws MDSearchIndexerException {
    BulkRequest request = new BulkRequest().setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
    ofNullable(indexDocuments).orElseThrow(NullPointerException::new)
        .forEach(x -> {
            var id = x.get("_id").getAsString();
            x.remove("_id");
            request.add(new UpdateRequest(indexName, id)
                .docAsUpsert(true)
                .doc(x.toString(), XContentType.JSON)
                .retryOnConflict(3)
            );
        });
    BulkResponse bulk = elasticsearchRestClient.bulk(request, RequestOptions.DEFAULT);
    return stream(bulk.getItems())
        .map(r -> new BulkItemStatus(r.getId(), isSuccess(r), r.getFailureMessage()))
        .collect(Collectors.toList());
}
I can search by the updated properties.
But the problem is that searches retrieve the "updated fields" and the previous ones as well.
How can I solve it?
Maybe by somehow limiting the version number to only 1?
I set setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE) but it didn't help.
In the picture we can see the result.
P.S. - the old and the updated data are retrieved as well.
Suggestions?
Regards,
What is happening is that the following line must yield null:
var id = x.get("_id").getAsString();
In other words, there is no _id field in the JSON documents you pass in indexDocuments. It is not allowed to have fields with an initial underscore character in the source documents; if that were the case, you'd get the following error:
Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters.
Hence, your update request cannot update any document (since there's no ID to identify the document to update) and will simply insert a new one (i.e. what docAsUpsert does), which is why you're seeing two different documents.
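A minimal defensive sketch (assuming the same Gson JsonObject documents as in the snippet above): fail fast when a document arrives without an _id, so the upsert can never silently end up targeting a different document id.

var idElement = x.get("_id");
if (idElement == null || idElement.getAsString().isBlank()) {
    throw new IllegalArgumentException("Document is missing the _id used as the Elasticsearch document id: " + x);
}
var id = idElement.getAsString();
x.remove("_id");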
I have the sampleMeasurement1 shown below; I want to update the values of the respective columns in InfluxDB. How do I update those values?
SELECT * FROM sampleMeasurement1;
{ "results": [ { "series": [ { "name": "sampleMeasurement1", "columns": [ "time", "disk_type", "field1", "field2", "hostname" ], "values": [ [ 1520315870774000000, null, 12212, 22.44, "server001" ], [ 1520315870843000000, "HDD", 112, 21.44, "localhost" ] ] } ] } ] }
We can't change tag values via InfluxDB commands; we can, however, write a client script that changes the value of a tag by inserting "duplicate" points into the measurement with the same timestamp, field set, and tag set, except that the desired tag has its value changed.
Point with wrong tag ( https://docs.influxdata.com/influxdb/v1.4/write_protocols/line_protocol_reference/#syntax ):
cpu,hostname=machine.lan cpu=50 1514970123
After running
INSERT cpu,hostname=machine.mydomain.com cpu=50 1514970123
a SELECT * FROM cpu would include
cpu,hostname=machine.lan cpu=50 1514970123
cpu,hostname=machine.mydomain.com cpu=50 1514970123
After the script runs all the INSERT commands, you'll need to drop the obsolete series of points with the old tag value:
DROP SERIES FROM cpu WHERE hostname='machine.lan'
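A hedged sketch of such a client script using the influxdb-java library; the measurement, tag values, and timestamp follow the example above, while the connection details, database name, retention policy, and timestamp precision are assumptions:

InfluxDB influxDB = InfluxDBFactory.connect("http://localhost:8086", "user", "password");
String db = "mydb";   // hypothetical database name

// Re-write the point with the corrected tag value (same timestamp and field set).
Point fixed = Point.measurement("cpu")
        .time(1514970123L, TimeUnit.SECONDS)        // precision assumed to be seconds
        .tag("hostname", "machine.mydomain.com")
        .addField("cpu", 50)
        .build();
influxDB.write(db, "autogen", fixed);

// Then drop the obsolete series that still carries the old tag value.
influxDB.query(new Query("DROP SERIES FROM cpu WHERE hostname='machine.lan'", db));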
I have a collection with embedded documents.
System
{
  System_Info: "automated",
  system_type:
  {
    system_id: 1,
    Tenant: [
      {
        Tenant_Id: 1,
        Tenant_Info: "check",
        Prop_Info: ...
      },
      {
        Tenant_Id: 2,
        Tenant_Info: "sucess",
        Prop_Info: ...
      }
    ]
  }
}
I need to update and set the field Tenant_Info to "failed" for Tenant_Id: 2.
I need to do it using the MongoDB Java driver. I know how to insert another tenant's information into the Tenant array, but here I need to set the field using Java code.
Could anyone help me do this?
How about something like this (untested):
db.coll.update(
{
"System.system_type.Tenant.Tenant_Id" : 2
},
{
$set : {
"System.system_type.Tenant.$.Tenant_Info" : "failed"
}
},
false,
true
);
It should update the first matching nested document (the one with a Tenant_Id of 2) in every top-level document. If you need to target a specific top-level document, you need to add it as a field on the first object argument (the query) of the update call.
And the equivalent in Java (passing upsert = false and multi = true so it matches the shell call above):
BasicDBObject find = new BasicDBObject(
    "System.system_type.Tenant.Tenant_Id", 2
);
BasicDBObject set = new BasicDBObject(
    "$set",
    new BasicDBObject("System.system_type.Tenant.$.Tenant_Info", "failed")
);
coll.update(find, set, false, true);
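For what it's worth, a hedged sketch of the same update with the newer MongoCollection API (Filters/Updates helpers from com.mongodb.client.model); the field paths follow the answer above and the collection name is illustrative:

MongoCollection<Document> coll = database.getCollection("coll");
coll.updateMany(
        Filters.eq("System.system_type.Tenant.Tenant_Id", 2),
        Updates.set("System.system_type.Tenant.$.Tenant_Info", "failed"));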