I have the following YAML file that I am trying to update, depending on whether a value for a particular key exists.
If a productName with the value test exists in the YAML file, I want to update its respective productUrl with a new value.
If the productName test does not yet exist in the YAML file, I want to add a new entry for this productName and its productUrl.
company:
  products:
  - productName: abc
    productUrl: https://company/product-abc
  - productName: def
    productUrl: https://company/product-def
  - productName: ghi
    productUrl: https://company/product-ghi
  - productName: jkl
    productUrl: https://company/product-jkl
  - productName: mno
    productUrl: https://company/product-mno
  - productName: pqr
    productUrl: https://company/product-pqr
This is what I have so far, but I'm not sure if it can be rewritten in a much cleaner way, or if there's a bug in my approach:
@Grab('org.yaml:snakeyaml:1.17')
import org.yaml.snakeyaml.Yaml

Yaml parser = new Yaml()
def p = parser.load(("company.yml" as File).text)

Boolean isProductNew = true
p.company.products.each { i ->
    if (i.productName == 'test') {
        i.productUrl = 'https://company/product-new-test'
        isProductNew = false
    }
}
if (isProductNew) {
    p.company.products << ["productName": "test", "productUrl": "https://company/product-test"]
}
println p
You can write the code in a cleaner way:
def prod = p.company.products.find { it.productName == 'test' }
if (!prod) {
    prod = [productName: 'test']
    p.company.products << prod
}
prod.productUrl = 'https://company/product-test'
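To persist the change, you can dump the updated structure back to the file with SnakeYAML. A minimal sketch, assuming block-style output is wanted (the DumperOptions settings are an assumption, not part of the original code):

import org.yaml.snakeyaml.DumperOptions
import org.yaml.snakeyaml.Yaml

// Use block style so the output resembles the original file layout
def options = new DumperOptions()
options.defaultFlowStyle = DumperOptions.FlowStyle.BLOCK

// Serialize the modified map and overwrite the file
new File("company.yml").text = new Yaml(options).dump(p)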
If you can help me: I have an update in MongoDB with $cond; the update sets the field to one value if it is empty, and to another value otherwise. Example in MongoDB:
I want to update the field camp1:
if camp1 does not exist, set "values"
if camp1 exists, set "value2"
db.getCollection('prueba').update(
    {"cdAccount": "ES3100810348150001326934"},
    [{$set: {camp1: {"$cond": [{"$not": ["$camp1"]}, "values", "value2"]}}}]);
Result:
{
    "_id" : ObjectId("62dd08c3f9869303b79b323b"),
    "cdAccount" : "ES3100810348150001326934",
    "camp1" : "value2"
}
Now I do the same in Scala with this code:
def appendIfNotNull(key: String, value: Object) = {
    val eq2Array = new util.ArrayList[Object]()
    eq2Array.add("$" + key)
    val eq2Op = new Document("$not", eq2Array)

    val condList = new util.ArrayList[Object]()
    condList.add(eq2Op)
    condList.add(value.asInstanceOf[AnyRef])
    condList.add("value2")

    val availDoc =
        new Document("$cond",
            new Document("$cond", condList)).toBsonDocument(classOf[BsonDocument], getCodecRegistry).get("$cond")
    println("availDoc : " + availDoc)
    documentGrab.append(key, availDoc)
}
val finalVar = appendIfNotNull("camp1","values")
println("finalVar : " + finalVar)
availDoc : {"$cond": [{"$not": ["$camp1"]}, "values", "value2"]}
finalVar : Document{{camp1={"$cond": [{"$not": ["$camp1"]}, "values", "value2"]}}}
val updateDocument = new Document("$set", finalVar)
println("updateDocument : " + updateDocument)
collectionA.updateMany(Filters.eq("cdAccount", "ES3100810348150001326934"), updateDocument)
The only difference I see is that in MongoDB a "[" is added at the beginning of the $set, and then the update works:
MongoDB
[{$set: {camp1: {"$cond": [{"$not": ["$camp1"]}, "values", "value2"]}}}] --> update OK
Scala
{$set: {camp1: {"$cond": [{"$not": ["$camp1"]}, "values", "value2"]}}} --> runs OK in Scala, but I get Result II
I am using MongoDB 5.0.9.
Now I execute in MongoDB the statement built in Scala:
db.getCollection('prueba').update(
    {"cdAccount": "ES3100810348150001326934"},
    {$set: {camp1: {"$cond": [{"$not": ["$camp1"]}, "values", "value2"]}}});
When I run it from Scala the same thing happens.
Result II:
{
    "cdAccount" : "ES3100810348150001326934",
    "camp1" : {
        "$cond" : [
            {"$not" : ["$camp1"]},
            "values",
            "value2"
        ]
    }
}
Can someone tell me how to fix it?
Thank you so much.
You can see the important difference when printing the queries.
$cond is an aggregation pipeline operator. It is processed only when an aggregation pipeline is used to update the data. When a simple (non-pipeline) update is used, the operator has no special meaning, and this is exactly what you see in the output.
You indicate a pipeline update by passing an array instead of a plain object as the update description in the JavaScript API (and the mongo shell). In Scala/Java you have to use one of the updateMany overloads that takes the update description as a List, not a Bson. I.e. you need something like:
collectionA.updateMany(
    Filters.eq("cdAccount", "ES3100810348150001326934"),
    Collections.singletonList(updateDocument)
)
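For completeness, here is a minimal self-contained sketch of the pipeline update (assuming collectionA is a MongoCollection[Document] from the Java driver, and a MongoDB 4.2+ server, which pipeline updates require):

import java.util.{Arrays, Collections}
import com.mongodb.client.model.Filters
import org.bson.Document

// Build {"$cond": [{"$not": ["$camp1"]}, "values", "value2"]} directly
val cond = new Document("$cond", Arrays.asList[Object](
    new Document("$not", Collections.singletonList("$camp1")),
    "values",
    "value2"))

val updateDocument = new Document("$set", new Document("camp1", cond))

// Wrapping the update in a one-element list makes the driver send it as an
// aggregation pipeline, so $cond is evaluated server-side
collectionA.updateMany(
    Filters.eq("cdAccount", "ES3100810348150001326934"),
    Collections.singletonList(updateDocument))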
The script below works fine and gets the list of culprits:
def PostFailure()
{
    emailext body: "your email body here",
        mimeType: 'text/html',
        subject: "your subject here",
        to: emailextrecipients([
            [$class: 'CulpritsRecipientProvider']
        ])
}
I have formatted the body section of the email as shown in the code below, and now the CulpritsRecipientProvider class is not working.
def PostFailure()
{
    def x = '1'
    def config = [:]
    def subject = config.subject ? config.subject : "EPBCS ${env.JOB_NAME} - Release Number:${env.ReleaseNumber} Build #${env.BuildNumber} - ${currentBuild.result}!"
    def content = '${SCRIPT,template="groovy-html-ps.template"}'
    def attachLog = (config.attachLog != null) ? config.attachLog : (currentBuild.result != "SUCCESS") // attach the build log when the build is not successful
    to: emailextrecipients([
        [$class: 'RequesterRecipientProvider']
    ])
}
Please help me fix the code that is not working.
The script below worked for me:
def postFailure()
{
    def content = '${SCRIPT,template="groovy-html.template"}'
    emailext body: "${content}",
        mimeType: 'text/html',
        subject: "this is subject",
        to: emailextrecipients([
            [$class: 'CulpritsRecipientProvider']
        ])
}
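If you also want the dynamic subject and the log attachment from the earlier attempt, a sketch along these lines should work (attachLog is a standard parameter of the Email Extension plugin; the env variables are taken from the original snippet):

def postFailure()
{
    def content = '${SCRIPT,template="groovy-html.template"}'
    def subject = "EPBCS ${env.JOB_NAME} - Release Number:${env.ReleaseNumber} Build #${env.BuildNumber} - ${currentBuild.result}!"
    emailext body: "${content}",
        mimeType: 'text/html',
        subject: subject,
        attachLog: (currentBuild.result != "SUCCESS"), // attach the build log on failure
        to: emailextrecipients([
            [$class: 'CulpritsRecipientProvider']
        ])
}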
I have a dataset like this:
uid  group_a  group_b
1    3        unknown
1    unknown  4
2    unknown  3
2    2        unknown
I want to get the result:
uid  group_a  group_b
1    3        4
2    2        3
I tried to group the data by "uid", iterate over each group and select the not-unknown value as the final value, but I don't know how to do it.
I would suggest you define a User Defined Aggregate Function (UDAF).
Built-in functions are great, but they are difficult to customize. If you own the UDAF, it is fully customizable and you can edit it according to your needs.
Concerning your problem, the following can be a solution; you can adapt it as needed.
The first task is to define the UDAF:
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

class PingJiang extends UserDefinedAggregateFunction {

  def inputSchema = new StructType().add("group_a", StringType).add("group_b", StringType)
  def bufferSchema = new StructType().add("buff0", StringType).add("buff1", StringType)
  def dataType = StringType
  def deterministic = true

  def initialize(buffer: MutableAggregationBuffer) = {
    buffer.update(0, "")
    buffer.update(1, "")
  }

  def update(buffer: MutableAggregationBuffer, input: Row) = {
    if (!input.isNullAt(0)) {
      val groupa = input.getString(0)
      val groupb = input.getString(1)
      // keep the latest known (not "unknown") value per column
      if (!groupa.equalsIgnoreCase("unknown")) {
        buffer.update(0, groupa)
      }
      if (!groupb.equalsIgnoreCase("unknown")) {
        buffer.update(1, groupb)
      }
    }
  }

  def merge(buffer1: MutableAggregationBuffer, buffer2: Row) = {
    // take the known value from either partial buffer, slot by slot
    if (buffer2.getString(0).nonEmpty) buffer1.update(0, buffer2.getString(0))
    if (buffer2.getString(1).nonEmpty) buffer1.update(1, buffer2.getString(1))
  }

  def evaluate(buffer: Row): String = {
    // emit both values as a single comma-separated string
    buffer.getString(0) + "," + buffer.getString(1)
  }
}
Then you call it from your main class and do some manipulation to get the result you need:
import org.apache.spark.sql.functions.split
import spark.implicits._ // for toDF and the $ column syntax

val data = Seq(
  (1, "3", "unknown"),
  (1, "unknown", "4"),
  (2, "unknown", "3"),
  (2, "2", "unknown"))
  .toDF("uid", "group_a", "group_b")

val udaf = new PingJiang()

val result = data.groupBy("uid").agg(udaf($"group_a", $"group_b").as("ping"))
  .withColumn("group_a", split($"ping", ",")(0))
  .withColumn("group_b", split($"ping", ",")(1))
  .drop("ping")

result.show(false)
Visit the Databricks and AugmentIQ documentation for a better understanding of UDAFs.
Note: the above solution gets you the latest known value for each group, if present (you can always edit it according to your needs).
After you convert the dataset to a PairRDD, you can use the reduceByKey operation to find the single known value. The following example assumes that there is only one known value per uid, and otherwise returns the first known value:
val input = List(
  ("1", "3", "unknown"),
  ("1", "unknown", "4"),
  ("2", "unknown", "3"),
  ("2", "2", "unknown")
)

val pairRdd = sc.parallelize(input).map(l => (l._1, (l._2, l._3)))

val result = pairRdd.reduceByKey { (a, b) =>
  // prefer the known value from either side of the reduction
  val groupA = if (a._1 != "unknown") a._1 else b._1
  val groupB = if (a._2 != "unknown") a._2 else b._2
  (groupA, groupB)
}
The result will be a pair RDD that looks like this:
(uid, (group_a, group_b))
(1,(3,4))
(2,(2,3))
You can return to the plain line format with a simple map operation.
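For example, a minimal sketch that flattens the tuples back into lines (result is the pair RDD from above):

val lines = result.map { case (uid, (groupA, groupB)) => s"$uid\t$groupA\t$groupB" }
lines.collect().foreach(println) // order across keys is not guaranteed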
You could replace all "unknown" values with null, and then use the function first() inside a map (as shown here) to get the first non-null value in each column per group:
import org.apache.spark.sql.functions.{col,first,when}
// We only apply our function to the last 2 columns
val cols = df.columns.drop(1)
// Create expression
val exprs = cols.map(first(_,true))
// Putting it all together
df.select(df.columns
.map(c => when(col(c) === "unknown", null)
.otherwise(col(c)).as(c)): _*)
.groupBy("uid")
.agg(exprs.head, exprs.tail: _*).show()
+---+--------------------+--------------------+
|uid|first(group_1, true)|first(group_b, true)|
+---+--------------------+--------------------+
| 1| 3| 4|
| 2| 2| 3|
+---+--------------------+--------------------+
Data:
val df = sc.parallelize(Array(("1", "3", "unknown"), ("1", "unknown", "4"),
  ("2", "unknown", "3"), ("2", "2", "unknown"))).toDF("uid", "group_1", "group_b")
I'm a beginner in Scala, using the "json4s" library for JSON parsing, and I have JSON data formatted like below:
scala> val str = """
| {
| "index_key": {
| "time":"12938473",
| "event_detail": {
| "event_name":"click",
| "location":"US"
| }
| }
| }
| """
I'm trying to get "index_key" and assign it to a variable. I tried the following:
scala> val json = parse(str)
json: org.json4s.JValue = JObject(List((index_key,JObject(List((time,JString(12938473)), (event_detail,JObject(List((event_name,JString(click)), (location,JString(US))))))))))
scala> json.values
res40: json.Values = Map(index_key -> Map(time -> 12938473, event_detail -> Map(event_name -> click, location -> US)))
and I can get the Map from "json.values" via "json.values.head" or "json.values.keys", but I cannot get the first key "index_key" from this map. Could anyone please tell me how to get the map key "index_key"? And what does "res40: json.Values" have to do with the Map type? Thanks a lot.
I'm not familiar with json4s specifically, but I'm pretty sure it acts like most other JSON libraries in that it provides a nice DSL for extracting data from parsed JSON.
I had a look at the docs and found this:
scala> val json =
("person" ->
("name" -> "Joe") ~
("age" -> 35) ~
("spouse" ->
("person" ->
("name" -> "Marilyn") ~
("age" -> 33)
)
)
)
scala> json \\ "spouse"
res0: org.json4s.JsonAST.JValue = JObject(List(
(person,JObject(List((name,JString(Marilyn)), (age,JInt(33)))))))
The \\ operator traverses the JSON structure and extracts the data at that node. Note that the double backslash operator works recursively; to match only direct children of the current node you would use a single backslash, i.e. \.
For your example it would be json \ "index_key" which would return the JSON at that node.
The first key can be retrieved as below, thanks to the answer from @bjfletcher:
parse(str).asInstanceOf[JObject].values.head._1
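Putting it together, a minimal sketch (assuming the native JsonMethods backend; json4s also ships a Jackson variant):

import org.json4s._
import org.json4s.native.JsonMethods._

val json = parse(str)

// the sub-tree under "index_key", bound to a variable
val indexKey: JValue = json \ "index_key"

// the name of the first key in the top-level object
val firstKey: String = json.asInstanceOf[JObject].values.head._1 // "index_key"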
For testing purposes I need to override the 'equals' method:
def any = [equals: { true }] as String
any == 'should be true'
// false
More detail about the problem:
class EmployeeEndpointSpec extends RestSpecification {
    void "test employee" () {
        when:
        get "/v1/employee", parameters

        then:
        expectedStatus.equals(response.statusCode)
        expectedJson.equals(response.json)

        where:
        parameters << [
            [:],
            [id: 824633720833, style: "small"]
        ]
        expectedStatus << [
            HttpStatus.BAD_REQUEST,
            HttpStatus.OK
        ]
        expectedJson << [
            [errorCode: "badRequest"],
            [
                id: 824633720833,
                name: "Jimmy",
                email: "jimmy@fakemail.com",
                dateCreated: "2015-01-01T01:01:00.000", // this value should be ignored
                lastUpdated: "2015-01-01T01:01:00.000"  // and this
            ]
        ]
    }
}
lastUpdated and dateCreated may change over time, and I need to somehow ignore them.
If there's no need to compare the mentioned fields, remove them before comparing:
class EmployeeEndpointSpec extends RestSpecification {
    void "test employee" () {
        when:
        get "/v1/employee", parameters

        then:
        expectedStatus.equals(response.statusCode)
        def json = response.json
        // assignments, so Spock does not treat the remove() calls as implicit conditions
        def removedCreated = json.remove('dateCreated')
        def removedUpdated = json.remove('lastUpdated')
        expectedJson.equals(json)

        where:
        parameters << [
            [:],
            [id: 824633720833, style: "small"]
        ]
        expectedStatus << [
            HttpStatus.BAD_REQUEST,
            HttpStatus.OK
        ]
        expectedJson << [
            [errorCode: "badRequest"],
            [
                id: 824633720833,
                name: "Jimmy",
                email: "jimmy@fakemail.com"
            ]
        ]
    }
}
I'd also separate the negative and positive test scenarios.
You can also test keySet() separately from testing the key values, instead of comparing the whole map. This is the way I'd do it:
then:
def json = response.json
json.id == 824633720833
json.name == "Jimmy"
json.email == "jimmy#fakemail.com"
json.dateCreated.matches('<PATTERN>')
json.lastUpdated.matches('<PATTERN>')
In case you don't like the last two lines, they can be replaced with:
json.keySet().containsAll(['lastUpdated', 'dateCreated'])
The answer is:
String.metaClass.equals = { Object _ -> true }
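A quick sketch of the effect (note that this is a global metaClass change, so reset it after the test; also, Groovy's == on two Strings delegates to compareTo, so the override shows up mainly through direct equals() calls):

// override equals for every String instance via the metaclass
String.metaClass.equals = { Object _ -> true }

assert "anything".equals("should be true") // now true

// restore the default behavior so other tests are unaffected
GroovySystem.metaClassRegistry.removeMetaClass(String)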