Apache Spark to_json options parameter - java

I either don't know what I'm looking for or the documentation is lacking. The latter seems to be the case, given this:
http://spark.apache.org/docs/2.2.2/api/java/org/apache/spark/sql/functions.html#to_json-org.apache.spark.sql.Column-java.util.Map-
"options - options to control how the struct column is converted into a json string. accepts the same options and the json data source."
Great! So, what are my options?
I'm doing something like this:
Dataset<Row> formattedReader = reader
    .withColumn("id", lit(id))
    .withColumn("timestamp", lit(timestamp))
    .withColumn("data", to_json(struct("record_count")));
...and I get this result:
{
"id": "ABC123",
"timestamp": "2018-11-16 20:40:26.108",
"data": "{\"record_count\": 989}"
}
I'd like this (remove back-slashes and quotes from "data"):
{
"id": "ABC123",
"timestamp": "2018-11-16 20:40:26.108",
"data": {"record_count": 989}
}
Is this one of the options by chance? Is there a better guide out there for Spark? The most frustrating part about Spark hasn't been getting it to do what I want, it's been a lack of good information on what it can do.

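You are JSON-encoding the record_count field twice: to_json turns the struct into a string, and that string is then escaped again when the row itself is written as JSON. Remove to_json; struct alone should be sufficient.
Change your code to something like this: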
Dataset<Row> formattedReader = reader
    .withColumn("id", lit(id))
    .withColumn("timestamp", lit(timestamp))
    .withColumn("data", struct("record_count"));
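To see why the backslashes appear, here is a minimal plain-Java sketch of the double-encoding effect. It does not use Spark and is not how Spark implements its JSON writer; the class and the helpers quote, dataAsString and dataAsObject are hypothetical names made up for illustration.

```java
class DoubleEncodingSketch {
    // Hypothetical helper: quote a Java string as a JSON string literal,
    // escaping only backslashes and double quotes (enough for this demo).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Mirrors the to_json case: "data" already holds JSON text, so the
    // outer writer must treat it as a plain string and escape it.
    static String dataAsString(String innerJson) {
        return "{\"data\": " + quote(innerJson) + "}";
    }

    // Mirrors the struct-only case: "data" is still structured, so it is
    // written as a nested object with no extra escaping.
    static String dataAsObject(String innerJson) {
        return "{\"data\": " + innerJson + "}";
    }

    public static void main(String[] args) {
        String inner = "{\"record_count\": 989}";
        System.out.println(dataAsString(inner)); // escaped, like the unwanted result
        System.out.println(dataAsObject(inner)); // nested, like the wanted result
    }
}
```

The first line printed has the same shape as the unwanted "data" value in the question, and the second has the shape of the wanted one.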

How to sort REST API response in the desired fashion?

{
"data": {
"uCPE PostStaging Completed": false,
"Order Submitted": true,
"uCPE PreStaging Completed": false,
"Poststaging device deploy success": true,
"uCPE Activated": false
},
"status": "SUCCESS"
}
The above is my response body. The order of the keys changes when a different input is provided, but I want the response to always list the keys in the following order; only the values for each key will change.
"Order Submitted": "true",
"Poststaging device deploy success": "true",
"uCPE PreStaging Completed": "true",
"uCPE PostStaging Completed": "false",
"Order Completed": "false",
"uCPE Activated": "false"
If you are using Jackson, look into @JsonPropertyOrder.
In your case it would be something like this:
@JsonPropertyOrder({
    "orderSubmitted",
    "deploySuccess",
    "preStagingCompleted",
    "postStagingCompleted",
    "orderCompleted",
    "ucpeActivated"
})
As @Alan Sereb and @Roman Vottner suggested, @JsonPropertyOrder is the best option if you are using your POJO class as the response.
However, if you are passing a collection object directly to your parser and want to maintain the order, you need to use a LinkedXXX implementation, or ArrayList for the List interface. In your case, since you are using a Map, you should use the LinkedHashMap implementation.
This has very good explanations of Collection framework.
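A minimal sketch of the LinkedHashMap approach, assuming the serializer writes map entries in iteration order (most JSON libraries do by default, but check yours). The class name, KEY_ORDER and inDesiredOrder are hypothetical names for illustration; the key strings come from the question.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OrderedStatusSketch {
    // Desired key order, taken from the question.
    static final List<String> KEY_ORDER = List.of(
        "Order Submitted",
        "Poststaging device deploy success",
        "uCPE PreStaging Completed",
        "uCPE PostStaging Completed",
        "Order Completed",
        "uCPE Activated");

    // Copies the input into a LinkedHashMap, which iterates in insertion
    // order. Keys absent from the input are skipped; keys not listed in
    // KEY_ORDER are dropped.
    static Map<String, Boolean> inDesiredOrder(Map<String, Boolean> unordered) {
        Map<String, Boolean> ordered = new LinkedHashMap<>();
        for (String key : KEY_ORDER) {
            if (unordered.containsKey(key)) {
                ordered.put(key, unordered.get(key));
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        Map<String, Boolean> data = Map.of(
            "uCPE PostStaging Completed", false,
            "Order Submitted", true,
            "uCPE Activated", false);
        // Iterates in KEY_ORDER order regardless of the input's own order.
        inDesiredOrder(data).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Pass the returned map, rather than the original one, to the JSON serializer as the "data" value.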

JSON pretty print customization

I'm writing a tool in Groovy to modify a huge JSON file. I read the file, add a new entry and save it, but I would like to avoid changes in places I didn't touch.
I'm using new JsonBuilder( o ).toPrettyString() to get pretty formatted json output, but this function gives me result like this:
{
    "key": "Foo",
    "items": [
        {
            "Bar1": 1
        },
        {
            "Bar2": 2
        }
    ]
}
when I need to get this:
{
    "key": "Foo",
    "items":
    [
        {
            "Bar1": 1
        },
        {
            "Bar2": 2
        }
    ]
}
There should be a newline before [.
This matters to me because otherwise I cannot tell from the Git history what I actually changed.
Do you have any idea how to achieve this?
The JsonBuilder method toPrettyString() delegates directly to JsonOutput.prettyPrint() as follows:
public String toPrettyString() {
    return JsonOutput.prettyPrint(toString());
}
The latter method is not really customizable at all. However, the source is freely available from any Maven central repository or mirror. I would suggest finding the source and creating your own variant of the method that behaves the way you would like it to. The source for JsonOutput.prettyPrint() is only about 65 lines long and shouldn't be that hard to change.
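If copying the pretty printer feels heavy, a different and lighter technique is to post-process its output line by line. The following is a minimal sketch of that alternative, not a variant of JsonOutput.prettyPrint() itself. It is written in Java (callable from Groovy) and assumes "\n" line endings, one JSON member per line, and that the bracket should be indented like its key. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class BracketOnOwnLineSketch {
    // Moves an array's opening bracket from the end of a `"key": [` line
    // onto its own line, indented like the key. A pretty-printed line can
    // only end with `": [` when an array opens, because string values
    // always end with a closing quote (optionally followed by a comma).
    static String bracketsOnOwnLine(String prettyJson) {
        List<String> out = new ArrayList<>();
        for (String line : prettyJson.split("\n")) {
            if (line.endsWith("\": [")) {
                int i = 0;
                while (i < line.length() && line.charAt(i) == ' ') {
                    i++;
                }
                String indent = line.substring(0, i);
                out.add(line.substring(0, line.length() - 2)); // drop " ["
                out.add(indent + "[");
            } else {
                out.add(line);
            }
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        String pretty = "{\n    \"key\": \"Foo\",\n    \"items\": [\n        {\n"
            + "            \"Bar1\": 1\n        }\n    ]\n}";
        System.out.println(bracketsOnOwnLine(pretty));
    }
}
```

From Groovy this could be applied as BracketOnOwnLineSketch.bracketsOnOwnLine(new JsonBuilder(o).toPrettyString()). Empty arrays printed as [] are left untouched.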

Creating Java Object with reserved keywords as variable name

I have JSON that needs to be converted to a Java Object. The JSONs I need to handle can look like this:
{
"documents": [
{
"title": "Jobs",
"is-saved": false,
"abstract": "<span class=\"hit\">Jobs</span> may refer to:\n\n* Steve <span class=\"hit\">Jobs</span> (1955–2011), co-founder and former CEO of consumer electronics company...<br />",
"id": "Jobs",
"url": "http://en.wikipedia.org/wiki/Jobs"
}
],
"keywords_local": [
{
"keyword": "Jobs",
"interest": 1,
"angle": 0
}
],
"sessionid": "6cd6402e-1f67-45a8-b0fa-e79a5d0d50f4",
"q": "Jobs"
}
This JSON is returned when entering a search keyword into a search engine, in this case "Jobs". I did not choose these names; the JSON was just "given" to me from a similar earlier app. So I'm obviously having trouble with the fields is-saved and abstract. abstract is a reserved keyword, and everywhere I read says that a reserved keyword CANNOT be used as a variable name.
I do not have access to the previous app that I am sort of updating, and I guess the point of that is that I need to figure it out by myself ;) But I am at a bit of a standstill now and have no idea how to move forward.
I'm a newbie, so do not give me hell if I'm asking a stupid question, it's my first time coding any sort of app! ;)
Thanks for any help!
If you use GSON for parsing, you can name your members as you want and annotate them for mapping.
@SerializedName("abstract")
private String abstractText;
Another option is to use Jackson with the @JsonProperty annotation.
@JsonProperty("abstract")
private String abstractText;
In fact, it depends on the tool you are using. With tools that map directly to your custom POJO (like GSON or Jackson), you need to map the JSON field name to a valid Java field name.
If you use a more basic library such as JSON.org's, there is no need to do so, because you parse the JSON into a generic object and look the fields up by name.
JSONObject obj = new JSONObject(" .... ");
JSONArray arr = obj.getJSONArray("documents");
String abstractValue = arr.getJSONObject(0).getString("abstract");

Scala/Play: parse JSON into Map instead of JsObject

On Play Framework's homepage they claim that "JSON is a first class citizen". I have yet to see the proof of that.
In my project I'm dealing with some pretty complex JSON structures. This is just a very simple example:
{
    "key1": {
        "subkey1": {
            "k1": "value1",
            "k2": [
                "val1",
                "val2",
                "val3"
            ]
        }
    },
    "key2": [
        {
            "j1": "v1",
            "j2": "v2"
        },
        {
            "j1": "x1",
            "j2": "x2"
        }
    ]
}
Now I understand that Play is using Jackson for parsing JSON. I use Jackson in my Java projects and I would do something simple like this:
ObjectMapper mapper = new ObjectMapper();
Map<String, Object> obj = mapper.readValue(jsonString, Map.class);
This would nicely parse my JSON into Map object which is what I want - Map of string and object pairs and would allow me easily to cast array to ArrayList.
The same example in Scala/Play would look like this:
val obj: JsValue = Json.parse(jsonString)
This instead gives me a proprietary JsObject type which is not really what I'm after.
My question is: can I parse JSON string in Scala/Play to Map instead of JsObject just as easily as I would do it in Java?
Side question: is there a reason why JsObject is used instead of Map in Scala/Play?
My stack: Play Framework 2.2.1 / Scala 2.10.3 / Java 8 64bit / Ubuntu 13.10 64bit
UPDATE: I can see that Travis' answer is upvoted, so I guess it makes sense to everybody, but I still fail to see how that can be applied to solve my problem. Say we have this example (jsonString):
[
{
"key1": "v1",
"key2": "v2"
},
{
"key1": "x1",
"key2": "x2"
}
]
Well, according to all the directions, I now should put in all that boilerplate that I otherwise don't understand the purpose of:
case class MyJson(key1: String, key2: String)
implicit val MyJsonReads = Json.reads[MyJson]
val result = Json.parse(jsonString).as[List[MyJson]]
Looks good to go, huh? But wait a minute, there comes another element into the array which totally ruins this approach:
[
{
"key1": "v1",
"key2": "v2"
},
{
"key1": "x1",
"key2": "x2"
},
{
"key1": "y1",
"key2": {
"subkey1": "subval1",
"subkey2": "subval2"
}
}
]
The third element no longer matches my defined case class, so I'm back at square one. I am able to use such and much more complicated JSON structures in Java every day. Does Scala suggest that I should simplify my JSON in order to fit its "type safe" policy? Correct me if I'm wrong, but I thought that the language should serve the data, not the other way around?
UPDATE2: The solution is to use the Jackson module for Scala (example in my answer).
Scala in general discourages the use of downcasting, and Play Json is idiomatic in this respect. Downcasting is a problem because it makes it impossible for the compiler to help you track the possibility of invalid input or other errors. Once you've got a value of type Map[String, Any], you're on your own—the compiler is unable to help you keep track of what those Any values might be.
You have a couple of alternatives. The first is to use the path operators to navigate to a particular point in the tree where you know the type:
scala> val json = Json.parse(jsonString)
json: play.api.libs.json.JsValue = {"key1": ...
scala> val k1Value = (json \ "key1" \ "subkey1" \ "k1").validate[String]
k1Value: play.api.libs.json.JsResult[String] = JsSuccess(value1,)
This is similar to something like the following:
val json: Map[String, Any] = ???
val k1Value = json("key1")
.asInstanceOf[Map[String, Any]]("subkey1")
.asInstanceOf[Map[String, String]]("k1")
But the former approach has the advantage of failing in ways that are easier to reason about. Instead of a potentially difficult-to-interpret ClassCastException, we'd just get a nice JsError value.
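For readers coming from the Java side, the same idea can be sketched over a plain Map<String, Object>: walk the path with type checks and return an empty result instead of casting. This is only a rough analogue under stated assumptions, not Play's API. Optional carries less information than a JsError, and SafePathSketch and stringAt are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class SafePathSketch {
    // Walks nested maps along `path`. Returns Optional.empty() when a step
    // is missing or a value has an unexpected type, instead of throwing a
    // ClassCastException somewhere in a chain of casts.
    static Optional<String> stringAt(Map<String, Object> root, String... path) {
        Object current = root;
        for (String key : path) {
            if (!(current instanceof Map)) {
                return Optional.empty();
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current instanceof String
            ? Optional.of((String) current)
            : Optional.empty();
    }

    public static void main(String[] args) {
        // Hand-built stand-in for what a generic JSON parser would return.
        Map<String, Object> json = Map.of(
            "key1", Map.of(
                "subkey1", Map.of(
                    "k1", "value1",
                    "k2", List.of("val1", "val2", "val3"))));

        System.out.println(stringAt(json, "key1", "subkey1", "k1")); // a string value
        System.out.println(stringAt(json, "key1", "subkey1", "k2")); // a list, not a string
        System.out.println(stringAt(json, "key1", "missing", "k1")); // broken path
    }
}
```

The caller has to handle the empty case explicitly, which is the property the validate-based version gets from JsResult.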
Note that we can validate at a point higher in the tree if we know what kind of structure we expect:
scala> println((json \ "key2").validate[List[Map[String, String]]])
JsSuccess(List(Map(j1 -> v1, j2 -> v2), Map(j1 -> x1, j2 -> x2)),)
Both of these Play examples are built on the concept of type classes, and in particular on instances of the Reads type class provided by Play. You can also provide your own type class instances for types that you've defined yourself. This would allow you to do something like the following:
val myObj = json.validate[MyObj].getOrElse(someDefaultValue)
val something = myObj.key1.subkey1.k2(2)
Or whatever. The Play documentation (linked above) provides a good introduction to how to go about this, and you can always ask follow-up questions here if you run into problems.
To address the update in your question, it's possible to change your model to accommodate the different possibilities for key2, and then define your own Reads instance:
case class MyJson(key1: String, key2: Either[String, Map[String, String]])
implicit val MyJsonReads: Reads[MyJson] = {
val key2Reads: Reads[Either[String, Map[String, String]]] =
(__ \ "key2").read[String].map(Left(_)) or
(__ \ "key2").read[Map[String, String]].map(Right(_))
((__ \ "key1").read[String] and key2Reads)(MyJson(_, _))
}
Which works like this:
scala> Json.parse(jsonString).as[List[MyJson]].foreach(println)
MyJson(v1,Left(v2))
MyJson(x1,Left(x2))
MyJson(y1,Right(Map(subkey1 -> subval1, subkey2 -> subval2)))
Yes, this is a little more verbose, but it's up-front verbosity that you pay for once (and that provides you with some nice guarantees), instead of a bunch of casts that can result in confusing runtime errors.
It's not for everyone, and it may not be to your taste—that's perfectly fine. You can use the path operators to handle cases like this, or even plain old Jackson. I'd encourage you to give the type class approach a chance, though—there's a steep-ish learning curve, but lots of people (including myself) very strongly prefer it.
I've chosen to use the Jackson module for Scala.
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
val mapper = new ObjectMapper() with ScalaObjectMapper
mapper.registerModule(DefaultScalaModule)
val obj = mapper.readValue[Map[String, Object]](jsonString)
For further reference and in the spirit of simplicity, you can always go for:
Json.parse(jsonString).as[Map[String, JsValue]]
However, this will throw an exception for JSON strings not corresponding to the format (but I assume that goes for the Jackson approach as well). The JsValue can now be processed further like:
jsValueWhichBetterBeAList.as[List[JsValue]]
I hope the difference between handling Objects and JsValues is not an issue for you (only because you were complaining about JsValues being proprietary). Obviously, this is a bit like dynamic programming in a typed language, which usually isn't the way to go (Travis' answer is usually the way to go), but sometimes that's nice to have I guess.
You can simply extract out the value of a Json and scala gives you the corresponding map.
Example:
var myJson = Json.obj(
"customerId" -> "xyz",
"addressId" -> "xyz",
"firstName" -> "xyz",
"lastName" -> "xyz",
"address" -> "xyz"
)
Suppose you have the Json of above type.
To convert it into map simply do:
var mapFromJson = myJson.value
This gives you a map of type : scala.collection.immutable.HashMap$HashTrieMap
I would recommend reading up on pattern matching and recursive ADTs in general to better understand why Play Json treats JSON as a "first class citizen".
That being said, many Java-first APIs (like Google Java libraries) expect JSON deserialized as Map[String, Object]. While you can very simply create your own function that recursively generates this object with pattern matching, the simplest solution would probably be to use the following existing pattern:
import com.google.gson.Gson
import java.util.{Map => JMap, LinkedHashMap}
val gson = new Gson()
def decode(encoded: String): JMap[String, Object] =
gson.fromJson(encoded, (new LinkedHashMap[String, Object]()).getClass)
The LinkedHashMap is used if you would like to maintain key ordering at the time of deserialization (a HashMap can be used if ordering doesn't matter). Full example here.

Decipher JSON response googles topic api

I am using Google's search API to get a topic id, which is then used to get a JSON response from the topic API. The returned response looks like this:
{
"id":"/m/01d5g",
"property":{
"/amusement_parks/ride_theme/rides":{...},
"/award/ranked_item/appears_in_ranked_lists":{...},
"/book/book_character/appears_in_book":{
"valuetype":"object",
"values":[
{
"text":"Inferno",
"lang":"en",
"id":"/m/0g5qs3",
"creator":"/user/duck1123",
"timestamp":"2010-02-11T04:00:59.000Z"
},
{
"text":"Batman: Year One",
"lang":"en",
"id":"/m/0hzz_1h",
"creator":"/user/anasay",
"timestamp":"2012-01-25T11:05:03.000Z"
},
{
"text":"Batman: The Dark Knight Returns",
"lang":"en",
"id":"/m/0hzz_sb",
"creator":"/user/anasay",
"timestamp":"2012-01-25T11:22:17.001Z"
},
{
"text":"Batman: Son of the Demon",
"lang":"en",
"id":"/m/071l77",
"creator":"/user/wikimapper",
"timestamp":"2013-07-11T15:20:32.000Z"
},
{
"text":"Joker",
"lang":"en",
"id":"/m/04zxvhs",
"creator":"/user/wikimapper",
"timestamp":"2013-07-11T16:58:37.000Z"
},
{
"text":"Arkham Asylum: A Serious House on Serious Earth",
"lang":"en",
"id":"/m/0b7hyw",
"creator":"/user/wikimapper",
"timestamp":"2013-07-11T19:26:54.000Z"
}
],
"count":6.0
},
"/book/book_subject/works":{...},
"/comic_books/comic_book_character/cover_appearances":{...},
...
}
}
I want to decipher this so that I can get the relevant information. For example, "/book/book_character/appears_in_book" is itself a property of the response, and the only values I need from it are "text" and "id", e.g. "text":"Batman: Year One" and "id":"/m/0hzz_1h".
The response does not have fixed properties; they vary according to the response id. How can I convert this JSON response into a Java class where "/book/book_character/appears_in_book" is stored as one serialized class containing a collection of values such as id and text, with appears_in_book as the name of the class variable?
I considered GSON for this, but since the property names are not constant I cannot use it to convert the JSON to a Java object. Currently I am iterating over each property by hard-coding its name and filling in Java variables.
If someone can suggest an efficient way to do this, I would appreciate the help.
You could do this dynamically using reflection in Java but this is an advanced feature of Java and it may make your code more complicated than it needs to be.
See: Dynamically create an object in java from a class name and set class fields by using a List with data
A simpler alternative would be to just parse the JSON into a bunch of nested Maps and Lists exactly as they're given in the JSON data.
See: How to parse JSON in Java
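Once the response is available as nested Maps and Lists, pulling out "id" and "text" for one property is a short walk through the structure. The sketch below assumes a generic parser has already produced a Map<String, Object>; the sample data is hand-built from the question, and TopicValuesSketch and textById are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopicValuesSketch {
    // Collects id -> text for every entry under
    // topic["property"][propertyName]["values"], skipping anything that
    // does not have the expected shape.
    static Map<String, String> textById(Map<String, Object> topic, String propertyName) {
        Map<String, String> result = new LinkedHashMap<>();
        Object properties = topic.get("property");
        if (!(properties instanceof Map)) {
            return result;
        }
        Object property = ((Map<?, ?>) properties).get(propertyName);
        if (!(property instanceof Map)) {
            return result;
        }
        Object values = ((Map<?, ?>) property).get("values");
        if (!(values instanceof List)) {
            return result;
        }
        for (Object value : (List<?>) values) {
            if (value instanceof Map) {
                Object id = ((Map<?, ?>) value).get("id");
                Object text = ((Map<?, ?>) value).get("text");
                if (id instanceof String && text instanceof String) {
                    result.put((String) id, (String) text);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the parsed response (two values only).
        Map<String, Object> topic = Map.of(
            "id", "/m/01d5g",
            "property", Map.of(
                "/book/book_character/appears_in_book", Map.of(
                    "valuetype", "object",
                    "values", List.of(
                        Map.of("text", "Inferno", "id", "/m/0g5qs3"),
                        Map.of("text", "Batman: Year One", "id", "/m/0hzz_1h")))));

        textById(topic, "/book/book_character/appears_in_book")
            .forEach((id, text) -> System.out.println(id + " -> " + text));
    }
}
```

Because the property name is passed in as a plain string, the same method works for any property in the response without hard-coding each one.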
