I have an Apache Beam streaming job which reads data from Kafka and writes to Elasticsearch using ElasticsearchIO.
The issue I'm having is that messages in Kafka already have a key field, and using ElasticsearchIO.Write.withIdFn() I'm mapping this field to the document _id field in Elasticsearch.
Given the large volume of data, I don't want the key field to also be written to Elasticsearch as part of _source.
Is there an option/workaround that would allow doing that?
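For context, here is a minimal sketch of my write step (the hosts, index name, and "key" field name are placeholders):

PCollection<String> jsonDocs = ...; // JSON strings read from Kafka

jsonDocs.apply(
    ElasticsearchIO.write()
        .withConnectionConfiguration(
            ElasticsearchIO.ConnectionConfiguration.create(
                new String[] {"http://localhost:9200"}, "my-index", "_doc"))
        // maps the existing "key" field of each document to the _id
        .withIdFn(node -> node.get("key").asText()));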
Using the Ingest API and the remove processor, you'll be able to solve this pretty easily using only your Elasticsearch cluster. You can also simulate the ingest pipeline and preview the results.
I've prepared an example which will probably cover your case:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "remove id from incoming docs",
    "processors": [
      {
        "remove": {
          "field": "id",
          "ignore_failure": true
        }
      }
    ]
  },
  "docs": [
    {"_source": {"id": "123546", "other_field": "other value"}}
  ]
}
As you can see, there is one test document containing a field "id". This field is no longer present in the response/result:
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_type" : "_type",
"_id" : "_id",
"_source" : {
"other_field" : "other value"
},
"_ingest" : {
"timestamp" : "2018-12-03T16:33:33.885909Z"
}
}
}
]
}
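Once the simulation looks right, you can store the pipeline and reference it at index time (the pipeline and index names here are just examples):

PUT _ingest/pipeline/remove-id
{
  "description": "remove id from incoming docs",
  "processors": [
    {"remove": {"field": "id", "ignore_failure": true}}
  ]
}

PUT my-index/_doc/1?pipeline=remove-id
{"id": "123546", "other_field": "other value"}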
I've created a ticket in the Apache Beam JIRA describing this issue.
For now, the original issue cannot be resolved as part of the indexing process using the Apache Beam API.
The workaround that Etienne Chauchot, one of the maintainers, proposed is to have a separate task which clears the indexed data afterwards; see the sketch below, and Remove a field from an Elasticsearch document for an example.
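Such a cleanup task could boil down to an update-by-query that strips the field from already indexed documents (index and field names are placeholders):

POST my-index/_update_by_query
{
  "script": {
    "source": "ctx._source.remove('id')",
    "lang": "painless"
  },
  "query": {
    "exists": {"field": "id"}
  }
}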
If you would also like to see such a feature in the future, you might want to follow the linked ticket.
I have a CSV structure like this
and I also have a JSON response:
[
{
"ID" : "1",
"Name" : "abc",
"Mobile" : "123456"
},
{
"ID" : "2",
"Name" : "cde",
"Mobile" : "123345"
}
]
I need the output like this
If your intention is to convert the JSON directly, then the Baeldung solution you were given is good.
Otherwise, the way I see it, and based on the info you're giving, you will need a representation of that JSON in a Java object; it will either represent some kind of request coming from elsewhere or data you're getting from your database, and it can then be written to a CSV.
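As a rough sketch of that second approach (assuming Jackson and its jackson-dataformat-csv module; the Person class and output file name are just illustrations mirroring your JSON fields):

import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.io.File;
import java.util.List;

public class JsonToCsv {

    // Plain holder mirroring the JSON objects; the order fixes the CSV columns
    @JsonPropertyOrder({"ID", "Name", "Mobile"})
    public static class Person {
        public String ID;
        public String Name;
        public String Mobile;
    }

    public static void main(String[] args) throws Exception {
        String json = "[{\"ID\":\"1\",\"Name\":\"abc\",\"Mobile\":\"123456\"},"
                + "{\"ID\":\"2\",\"Name\":\"cde\",\"Mobile\":\"123345\"}]";

        // 1. Deserialize the JSON array into Java objects
        List<Person> people = new ObjectMapper()
                .readValue(json, new TypeReference<List<Person>>() {});

        // 2. Write the objects out as CSV with a header row
        CsvMapper csvMapper = new CsvMapper();
        CsvSchema schema = csvMapper.schemaFor(Person.class).withHeader();
        csvMapper.writer(schema).writeValue(new File("output.csv"), people);
    }
}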
Check these out; they might be useful:
https://www.codejava.net/frameworks/spring-boot/csv-export-example
https://zetcode.com/springboot/csv/
I'm building a Java Jersey API which uses MongoDB and the MongoDB driver.
The resources should output JSON of the stored MongoDB documents to be used in the frontend project, which uses Svelte.
Due to the default org.bson.Document.toJson() implementation, the output of my documents looks something like:
[{ "_id" : { "$oid" : "5e97f08f2175aa9174dbec0e" }, "hour" : 8, "minute" : 15, "enabled" : true, "duration" : 120 }
I would rather like it to be:
[{ "_id" : "5e97f08f2175aa9174dbec0e", "hour" : 8, "minute" : 15, "enabled" : true, "duration" : 120 }
That way it's easier to handle the id in the frontend. So how do I get rid of the $oid object?
I already managed to get the format as I wish by using:
import org.bson.json.JsonMode;
import org.bson.json.JsonWriterSettings;

JsonWriterSettings settings = JsonWriterSettings.builder()
        .outputMode(JsonMode.RELAXED)
        .objectIdConverter((value, writer) -> writer.writeString(value.toHexString()))
        .build();
System.out.println(doc.toJson(settings));
But how to register this setting object globally so that every doc.toJson() call will use it?
And what will happen if I send modified or new documents from the frontend to the API and do:
Document document = Document.parse(doc);
Is my modified _id field automatically converted back to an ObjectId? Or do I need an org.bson.codecs.Decoder or CodecRegistry? How would this be done?
$oid refers to the ObjectId field type in the BSON spec. As far as I know, you need to manipulate your document to replace the ObjectId in _id with a String:
String oidAsString = document.getObjectId("_id").toString();
document.put("_id", oidAsString);
I have a document schema such as
{
"_id" : 18,
"name" : "Verdell Sowinski",
"scores" : [
{
"type" : "exam",
"score" : 62.12870233109035
},
{
"type" : "quiz",
"score" : 84.74586220889356
},
{
"type" : "homework",
"score" : 81.58947824932574
},
{
"type" : "homework",
"score" : 69.09840625499065
}
]
}
I have a solution using pull that copes with removing a single element at a time, but I want a general solution that copes with an irregular schema, where the array holds between one and many elements, and removes all elements matching a condition.
I'm using MongoDB driver 3.2.2 and saw pullByFilter, which sounded good:
Creates an update that removes from an array all elements that match the given filter.
I tried this
Bson filter = and(eq("type", "homework"), lt("score", highest));
Bson u = Updates.pullByFilter(filter);
UpdateResult ur = collection.updateOne(studentDoc, u);
Unsurprisingly, this did not have any effect, since I wasn't specifying the scores array.
I get an error
The positional operator did not find the match needed from the query. Unexpanded update: scores.$.type
when I change the filter to be
Bson filter = and(eq("scores.$.type", "homework"), lt("scores.$.score", highest));
Is there a one-step solution to this problem?
There seems to be very little info on this particular method that I can find. This question may relate to How to Update Multiple Array Elements in mongodb
After some more "thinking" (and a little trial and error), I found the correct Filters method to wrap my basic filter. I think I was focusing on array operators too much.
I'll not post it here in case of flaming.
Clue: think "matches..." (as in regex pattern matching) when dealing with Filters helper methods ;)
Currently we are using a schema file that contains oneOf with two schemas: one for PATCH requests and one for POST requests. In Java code we check whether id is available in the request, and then we check whether there is any error message for the corresponding schema in the oneOf section.
Something like this:
processingReport.iterator().forEachRemaining(processingMessage -> {
    JsonNode json = processingMessage.asJson();
    JSONObject reports = new JSONObject(json.get("reports").toString());
    logger.debug("Schema validation: {}", reports.toString());
    // There always seem to be 2 reports.
    String reportIdentifier = isCreate ? "/properties/data/oneOf/0" : "/properties/data/oneOf/1";
    JSONArray errorsArray = new JSONArray(reports.get(reportIdentifier).toString());
    // Do something with the error here
});
But this seems not right to me. Is there any way to manage this in the schema itself, so that when id is available it picks the right schema from the oneOf? Or perhaps there is a better way to do it?
I know one option would be having different JSON files, but our technical managers would rather keep them in one place.
oneOf and anyOf clauses can be used to model conditional constraints. The following schema would validate against either the patch or the post schema, depending on the existence of the id property:
{
  "oneOf": [
    {
      "$ref": "/post_request_schema#"
    },
    {
      "allOf": [
        {
          "$ref": "/patch_request_schema#"
        },
        {
          "required": ["id"]
        }
      ]
    }
  ]
}
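If your validator supports JSON Schema draft-07, the same intent can arguably be expressed more directly with if/then/else (same $ref targets as above):

{
  "if": {
    "required": ["id"]
  },
  "then": {
    "$ref": "/patch_request_schema#"
  },
  "else": {
    "$ref": "/post_request_schema#"
  }
}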
I am new to Java and to REST APIs. I would like to know how to receive dynamic data from a client via a REST API in Java and process it.
For example, sometimes the client will send data like below:
{
  "User" : "XXXX",
  "Role" : "ZZZZ",
  "Product" :
  {
    "Name" : "yyyy",
    "Valid to" : "04/4/2025",
    "Licensed version" : "jjjjj"
  }
}
In the next contract, the client may send data like below:
{
  "User" : "XXXX",
  "Role" : "ZZZZ",
  "Product" :
  {
    "Name" : "yyyy",
    "Expiry Date" : "04/4/2025",
    "Activation Date" : "jjjjj"
  }
}
Comparing the two examples, the "Product" section contains different data, and the client may also send additional data in this "Product" section. Would it be possible to make my REST API receive this type of dynamic data?
If possible, please let me know how my REST API will be able to receive this type of dynamic data and process it.
Thanks
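One common way to handle this (a sketch, assuming JAX-RS with the Jackson provider; the path and field names are illustrative) is to bind the request body to a generic JsonNode rather than a fixed POJO, so whatever fields arrive under "Product" are preserved:

import com.fasterxml.jackson.databind.JsonNode;
import javax.ws.rs.Consumes;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/contracts")
public class ContractResource {

    @POST
    @Consumes(MediaType.APPLICATION_JSON)
    public Response create(JsonNode body) {
        String user = body.get("User").asText();
        String role = body.get("Role").asText();

        // "Product" is open-ended: iterate whatever fields this contract sends
        JsonNode product = body.get("Product");
        product.fields().forEachRemaining(field -> {
            // process field.getKey() / field.getValue() as needed
        });

        return Response.ok().build();
    }
}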
I do the same in PHP with the Slim framework and Doctrine 2. In Doctrine 2 I defined the entities with all the possible fields. In my JavaScript frontend, I collect the dynamic data and then deserialize it into the entity classes. The classes can then be saved into the database, or you can do whatever you need to do to process the data.
http://www.restapitutorial.com/ is a very good tutorial on how to design your REST API; I suggest reading it.
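A rough Java analogue of that entity-based idea (a sketch, assuming Jackson; the Contract class and its fields are hypothetical): declare the fields you know about and collect everything else with @JsonAnySetter:

import com.fasterxml.jackson.annotation.JsonAnySetter;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.HashMap;
import java.util.Map;

public class Contract {
    public String User;
    public String Role;

    // Catch-all for the open-ended "Product" object and any other
    // unexpected fields the client decides to send
    private final Map<String, Object> extra = new HashMap<>();

    @JsonAnySetter
    public void setExtra(String name, Object value) {
        extra.put(name, value);
    }

    public Map<String, Object> getExtra() {
        return extra;
    }
}

// Usage:
// Contract c = new ObjectMapper().readValue(json, Contract.class);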