How to prevent ObjectMapper from converting escaped unicode? - java

I'm using Jackson 2.4 in Java to do some JSON legwork. I make a call to a remote server with Apache HttpGet, deserialize the results with Jackson into a POJO, manipulate those results, and then serialize them with Jackson to push back to a remote server with HttpPost.
The issue I'm finding is that Jackson is translating unicode literals into unicode characters, which I need it not to do thanks to encoding issues on each end. For example, I might have this in the JSON:
"field1": "\u00a2"
But Jackson is converting the "\u00a2" to "¢" when it's deserialized, which causes problems with the remote server. It has to be maintained as escaped unicode. If I use something like Apache EntityUtils (specifying UTF-8) or even make the call from my web browser to get the data, the escaped unicode is preserved, so I know that it's coming in properly from the server. If I have Jackson consume the input stream from the entity on the response, it does the conversion automatically.
I've tried writing with a JsonGenerator that is explicitly set to UTF-8 to write to the HttpPost. It didn't work; the remote server still rejected it. I've dug through the configuration options for ObjectMapper and JsonParser, but I don't see anything that would override this behavior. Escaping non-ASCII, sure, but that's not what I need to do here. Maybe I'm missing something obvious, but I can't get Jackson to deserialize this string without replacing the escaped unicode.
EDIT: Well, my bad, the only literals having problems have 3 or 5 leading backslashes, not just one. That's some screwiness, but Java seems to be what's unpacking it by default during the deserialization, even if the raw text that came back from the server preserves it. Still not sure how to get Java to preserve this without checking an insane amount of text.

What you are expecting is outside the scope of Jackson. It's Java that converts the string while reading it. For the same reason, if you have a properties file with the value \u00a2 and read it using the JDK API, you will get the converted value. Depending on the file size, you can either double-escape the backslash before passing the string to Jackson, or "escape" the string back in a custom deserializer (for strings only), with something like below:
package com.test.json;

import java.io.IOException;
import java.util.Map;

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.module.SimpleModule;

public class Jackson {

    static ObjectMapper _MAPPER = new ObjectMapper();

    public static void main(String[] args) throws Exception {
        String json = "{\"field1\": \"\\u00a2\",\"field2\": \"\\u00a2 this\",\"numberField\": 121212}";
        SimpleModule testModule = new SimpleModule("StOvFl", _MAPPER.version())
                .addDeserializer(String.class, new UnEscapedDeserializer());
        _MAPPER.registerModule(testModule);
        Map<String, Object> m = _MAPPER.readValue(json, new TypeReference<Map<String, Object>>() {});
        System.out.println("m" + m);
    }
}

class UnEscapedDeserializer extends JsonDeserializer<String> {

    @Override
    public String deserialize(JsonParser jp, DeserializationContext ctxt)
            throws IOException, JsonProcessingException {
        String s = jp.getValueAsString();
        // re-escape non-ASCII characters back into backslash-u form
        return org.apache.commons.lang.StringEscapeUtils.escapeJava(s);
    }
}
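If the raw response body is available as a String, the first option mentioned in the answer (double-escaping the backslashes before Jackson parses the text) might look roughly like this; the class name and the blanket replacement of every backslash-u sequence are illustrative assumptions rather than part of the original answer:
import java.util.Map;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

public class DoubleEscapeExample {
    public static void main(String[] args) throws Exception {
        // raw text as it arrives from the server, with the escape sequence still intact
        String raw = "{\"field1\": \"\\u00a2\"}";
        // double the backslash of each unicode escape so the parser decodes it back to literal text
        String doubleEscaped = raw.replace("\\u", "\\\\u");
        Map<String, Object> m = new ObjectMapper()
                .readValue(doubleEscaped, new TypeReference<Map<String, Object>>() {});
        System.out.println(m); // prints {field1=\u00a2}
    }
}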

Another way to customize Jackson's behavior is with a customized JsonParser. See Jackson's source code for JsonFactory and ReaderBasedJsonParser.
The key method is _finishString2(), which is where escaped characters are decoded, so we can write a JsonParser that extends ReaderBasedJsonParser and override the _finishString2 method:
public class MyJsonParser extends ReaderBasedJsonParser {

    // A constructor delegating to ReaderBasedJsonParser, plus the MyJsonParserFactory used in
    // main() below (a JsonFactory subclass that returns MyJsonParser), are omitted here;
    // see the full demo linked after this snippet.

    @Override
    protected void _finishString2() throws IOException {
        char[] outBuf = _textBuffer.getCurrentSegment();
        int outPtr = _textBuffer.getCurrentSegmentSize();
        final int[] codes = _icLatin1;
        final int maxCode = codes.length;

        while (true) {
            if (_inputPtr >= _inputEnd) {
                if (!loadMore()) {
                    _reportInvalidEOF(": was expecting closing quote for a string value");
                }
            }
            char c = _inputBuffer[_inputPtr++];
            int i = (int) c;
            if (i < maxCode && codes[i] != 0) {
                if (i == INT_QUOTE) {
                    break;
                } else {
                    // c = _decodeEscaped();
                    // do nothing: keep the escape sequence as-is instead of decoding it
                }
            }
            // Need more room?
            if (outPtr >= outBuf.length) {
                outBuf = _textBuffer.finishCurrentSegment();
                outPtr = 0;
            }
            // Ok, let's add char to output:
            outBuf[outPtr++] = c;
        }
        _textBuffer.setCurrentLength(outPtr);
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"field1\": \"\\u00a2\",\"field2\": \"\\u00a2 this\",\"numberField\": 121212}";
        ObjectMapper objectMapper = new ObjectMapper(new MyJsonParserFactory());
        Object o = objectMapper.readValue(json, Object.class);
        System.out.println(o);
    }
}
Full demo code here

Related

How do I set the coder for a PCollection<List<String>> in Apache Beam?

I'm teaching myself Apache Beam, specifically for use in parsing JSON. I was able to create a simple example that parsed JSON to a POJO and the POJO to CSV. It required that I use .setCoder() for my simple POJO class.
pipeline
    .apply("Read source JSON file.", TextIO.read().from(options.getInput()))
    .apply("Parse to POJO matching schema", ParseJsons.of(Person.class))
    .setCoder(SerializableCoder.of(Person.class))
    .apply("Create comma delimited string", new PersonToCsvRow())
    .apply("Write out to file", TextIO.write().to(options.getOutput())
        .withoutSharding());
The problem
Now I am trying to skip the POJO step of parsing using some custom transforms. My pipeline looks like this:
pipeline
    .apply("Read Json", TextIO.read().from("src/main/resources/family_tree.json"))
    .apply("Traverse Json tree", new JSONTreeToPaths())
    .apply("Format tree paths", new PathsToCSV())
    .apply("Write to CSV", TextIO.write().to("src/main/resources/paths.csv")
        .withoutSharding());
This pipeline is supposed to take a heavily nested JSON structure and print each individual path through the tree. I'm getting the same error I did in the POJO example above:
Exception in thread "main" java.lang.IllegalStateException: Unable to return a default Coder for Traverse Json tree/MapElements/Map/ParMultiDo(Anonymous).output [PCollection#331122245]. Correct one of the following root causes:
No Coder has been manually specified; you may do so using .setCoder().
What I tried
So I tried to add a coder in a few different ways:
.setCoder(SerializableCoder.of(List<String>.class))
Results in "Cannot select from parameterized type". I found another instance of this error generated by a different use case here, but the accepted answer seemed to be applicable only to that use case.
So then I started perusing the Beam docs and found ListCoder.of() which has (literally) no description. But it looked promising, so I tried it:
.setCoder(ListCoder.of(SerializableCoder.of(String.class)))
But this takes me back to the initial error of not having manually set a coder.
The question
How do I satisfy this requirement to set a coder for a List<String> object?
Code
The transform that is causing the setCoder error is this one:
package transforms;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.PCollection;
import java.util.ArrayList;
import java.util.List;
public class JSONTreeToPaths extends PTransform<PCollection<String>, PCollection<List<String>>> {
public static class ExtractPathsFromTree extends SimpleFunction<JsonNode, List<String>> {
public List<String> apply(JsonNode root) {
List<String> pathContainer = new ArrayList<>();
getPaths(root, "", pathContainer);
return pathContainer;
}
}
public static class GetRootNode extends SimpleFunction<String, JsonNode> {
public JsonNode apply(String jsonString) {
try {
return getRoot(jsonString);
} catch (JsonProcessingException e) {
e.printStackTrace();
return null;
}
}
}
#Override
public PCollection<List<String>> expand(PCollection<String> input) {
return input
.apply(MapElements.via(new GetRootNode()))
.apply(MapElements.via(new ExtractPathsFromTree()));
}
private static JsonNode getRoot(String jsonString) throws JsonProcessingException {
ObjectMapper mapper = new ObjectMapper();
return mapper.readTree(jsonString);
}
private static void getPaths(JsonNode node, String currentPath, List<String> paths) {
//check if leaf:
if (node.path("children").isMissingNode()) {
currentPath += node.get("Id");
paths.add(currentPath);
System.out.println(currentPath);
return;
}
// recursively iterate over children
currentPath += (node.get("Id") + ",");
for (JsonNode child : node.get("children")) {
getPaths(child, currentPath, paths);
}
}
}
While the error message seems to imply that the list of strings is what needs encoding, it is actually the JsonNode. I just had to read a little further down in the error message, as the opening statement is a bit misleading as to where the issue is:
Exception in thread "main" java.lang.IllegalStateException: Unable to return a default Coder for Traverse Json tree/MapElements/Map/ParMultiDo(Anonymous).output [PCollection#1324829744].
...
...
Inferring a Coder from the CoderRegistry failed: Unable to provide a Coder
for com.fasterxml.jackson.databind.JsonNode.
Building a Coder using a registered CoderProvider failed.
Once I discovered this, I solved the problem by extending Beam's CustomCoder class. This abstract class is nice because you only have to write the code to serialize and deserialize the object:
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.beam.sdk.coders.CustomCoder;
import org.apache.commons.io.IOUtils;

public class JsonNodeCoder extends CustomCoder<JsonNode> {

    @Override
    public void encode(JsonNode node, OutputStream outStream) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        String nodeString = mapper.writeValueAsString(node);
        outStream.write(nodeString.getBytes());
    }

    @Override
    public JsonNode decode(InputStream inStream) throws IOException {
        byte[] bytes = IOUtils.toByteArray(inStream);
        ObjectMapper mapper = new ObjectMapper();
        String json = new String(bytes);
        return mapper.readTree(json);
    }
}
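For reference, a sketch of how the coder might be wired into the transform from the question; the second setCoder call is an assumption and may be unnecessary if Beam can infer a coder for List<String> on its own:
// additional imports assumed: org.apache.beam.sdk.coders.ListCoder,
// org.apache.beam.sdk.coders.StringUtf8Coder
@Override
public PCollection<List<String>> expand(PCollection<String> input) {
    return input
        .apply(MapElements.via(new GetRootNode()))
        // the intermediate PCollection<JsonNode> is what Beam could not find a coder for
        .setCoder(new JsonNodeCoder())
        .apply(MapElements.via(new ExtractPathsFromTree()))
        // explicit coder for the List<String> output, in case inference still fails
        .setCoder(ListCoder.of(StringUtf8Coder.of()));
}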
Hope this helps some other Beam newbie out there.

java parse tweet corpus json

I have a problem: I need to parse a JSON file in Java where each line represents a tweet and follows Twitter's standard JSON format. I do not need all the information; I attach two photos below to show which fields I need. I would like to do it without using any support library. Thank you!
This is what I have done so far. I do not think it's the best way to do it, and going forward I'll be in trouble because the names of many fields repeat:
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TweetCorpus implements Iterable<Tweet>
{
    private List<Tweet> tweets;

    public static TweetCorpus parseFile(File file)
    {
        List<Tweet> tweets = new ArrayList<>();
        try(BufferedReader br = Files.newBufferedReader(file.toPath()))
        {
            while(br.ready())
            {
                String tweet = br.readLine();
                //System.out.println(tweet);
                if(!tweet.isEmpty())
                {
                    long l = Long.parseLong(tweet.substring(tweet.indexOf("\"id\":") + 5, tweet.indexOf(",\"id_str\":")));
                    String t = tweet.substring(tweet.indexOf(",\"text\":\"") + 9, tweet.indexOf(",\"source\":"));
                    tweets.add(new Tweet(l, t));
                }
            }
        }
        catch(IOException e)
        {
            e.printStackTrace();
        }
        return new TweetCorpus(tweets);
    }

    public int getTweetCount() { return tweets.size(); }

    public TweetCorpus(List<Tweet> tweets)
    {
        this.tweets = tweets;
    }

    @Override
    public Iterator<Tweet> iterator()
    {
        return tweets.iterator();
    }

    public static void main(String[] args)
    {
        TweetCorpus t = parseFile(new File("C:\\Users\\acer\\Desktop\\Moroder\\Uni\\1 Anno - 2 Semestre\\Metodologie Di Programmazione\\Progetto\\HM4Test\\tweetsCorpus.js"));
        t.getTweetCount();
    }
}
[image: JSON of a media/retweet tweet]
[image: JSON of a "normal" tweet]
You can use the Gson or Jackson Java library to parse the JSON into a Tweet object. There are online tools that generate a POJO from JSON, which you can use with Jackson to parse your JSON string into an object.
Once you have the JSON values in an object, you can use getters/setters to extract or modify the values you are interested in from the input JSON.
Writing your own parser would be reinventing the wheel. But if you really need to write your own, refer to the Jackson project on GitHub for inspiration on design and maintenance.
This will help you in making a generic application.
Quick reference for the Jackson parser:
https://dzone.com/articles/processing-json-with-jackson
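If you go the Gson route mentioned above, a minimal sketch might look like this (the field names are assumptions; they must match the Twitter JSON keys you actually need):
import com.google.gson.Gson;

public class GsonTweetExample {
    // only the declared fields are populated; every other tweet field is ignored by Gson
    static class Tweet {
        long id;
        String text;
    }

    public static void main(String[] args) {
        String line = "{\"id\": 123, \"id_str\": \"123\", \"text\": \"hello\"}";
        Tweet tweet = new Gson().fromJson(line, Tweet.class);
        System.out.println(tweet.id + ": " + tweet.text);
    }
}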
Re-inventing a JSON parser using only readLine() is a really bad idea. If you don't have experience writing parsers by hand, you will end up with a lot of bad code that is really hard to understand. Just use a library. There are tons of good JSON libraries for Java.
Jackson
GSON
Boon
Example code:
static final ObjectMapper objectMapper = new ObjectMapper();

static class User {
    public String id, name;
}

static class MyTweet {
    public String id, text;
    public User user;
}

// if the entire file is a JSON array:
void parse(Reader r) throws IOException {
    List<MyTweet> tweets = objectMapper.readValue(
            r, new TypeReference<List<MyTweet>>(){});
}

// if each line is a single JSON object:
void parse(BufferedReader r) throws IOException {
    while (r.ready()) {
        String line = r.readLine();
        MyTweet tweet = objectMapper.readValue(line, MyTweet.class);
    }
}
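One thing worth adding (an assumption about the real data rather than something the snippet above configures): tweet JSON carries far more fields than these small POJOs declare, so the mapper should be told not to fail on unknown properties. The objectMapper field could be created like this:
// requires: import com.fasterxml.jackson.databind.DeserializationFeature;
static final ObjectMapper objectMapper = new ObjectMapper()
        // skip all the tweet fields that MyTweet and User do not declare
        .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);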

json formatting with moshi

Does anyone know a way to get Moshi to produce multi-line JSON with indentation (for human consumption, in the context of a config.json)?
so from:
{"max_additional_random_time_between_checks":180,"min_time_between_checks":60}
to something like this:
{
  "max_additional_random_time_between_checks":180,
  "min_time_between_checks":60
}
I know other json-writer implementations can do so - but I would like to stick to moshi here for consistency
Now you can use the .indent(" ") method on the adapter for formatting:
final Moshi moshi = new Moshi.Builder().build();
String json = moshi.adapter(Dude.class).indent(" ").toJson(new Dude());
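For context, a self-contained sketch of that approach (the Dude class mirrors the POJO from the answer below; indent(...) returns a new adapter rather than modifying the existing one):
import com.squareup.moshi.JsonAdapter;
import com.squareup.moshi.Moshi;

public class MoshiIndentExample {
    static class Dude {
        public final String firstName = "Jeff";
        public final String lastName = "Lebowski";
    }

    public static void main(String[] args) {
        Moshi moshi = new Moshi.Builder().build();
        // indent("  ") returns an adapter that pretty-prints with two-space indentation
        JsonAdapter<Dude> adapter = moshi.adapter(Dude.class).indent("  ");
        System.out.println(adapter.toJson(new Dude()));
    }
}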
If you can deal with serializing the Object yourself, this should do the trick:
import com.squareup.moshi.JsonWriter;
import com.squareup.moshi.Moshi;
import java.io.IOException;
import okio.Buffer;

public class MoshiPrettyPrintingTest {

    private static class Dude {
        public final String firstName = "Jeff";
        public final String lastName = "Lebowski";
    }

    public static void main(String[] args) throws IOException {
        final Moshi moshi = new Moshi.Builder().build();

        final Buffer buffer = new Buffer();
        final JsonWriter jsonWriter = JsonWriter.of(buffer);

        // This is the important part:
        // - by default this is `null`, resulting in no pretty printing
        // - setting it to some value will indent each level with this String
        // NOTE: You should probably only use whitespace here...
        jsonWriter.setIndent(" ");

        moshi.adapter(Dude.class).toJson(jsonWriter, new Dude());

        final String json = buffer.readUtf8();
        System.out.println(json);
    }
}
This prints:
{
 "firstName": "Jeff",
 "lastName": "Lebowski"
}
See prettyPrintObject() in this test file and the source code of BufferedSinkJsonWriter.
However, I haven't yet figured out whether and how it is possible to do this if you're using Moshi with Retrofit.

Genson Polymorphic / Generic Serialization

I am trying to implement a JSON serialization in Java with Genson 1.3 for polymorphic types, including:
Numbers
Arrays
Enum classes
The SSCCE below demonstrates roughly what I am trying to achieve:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import com.owlike.genson.Genson;
import com.owlike.genson.GensonBuilder;

/**
 * A Short, Self Contained, Compilable Example for polymorphic serialization
 * and deserialization.
 */
public class GensonPolymorphicRoundTrip {

    // our example enum
    public static enum RainState {
        NO_RAIN,
        LIGHT_RAIN,
        MODERATE_RAIN,
        HEAVY_RAIN,
        LIGHT_SNOW,
        MODERATE_SNOW,
        HEAVY_SNOW;
    }

    public static class Measurement<T> {
        public T value;
        public int qualityValue;
        public String source;

        public Measurement() {
        }

        public Measurement(T value, int qualityValue, String source) {
            this.value = value;
            this.qualityValue = qualityValue;
            this.source = source;
        }
    }

    public static class DTO {
        public List<Measurement<?>> measurements;

        public DTO(List<Measurement<?>> measurements) {
            this.measurements = measurements;
        }
    }

    public static void main(String... args) {
        Genson genson = new GensonBuilder()
                .useIndentation(true)
                .useRuntimeType(true)
                .useClassMetadataWithStaticType(false)
                .addAlias("RainState", RainState.class)
                .useClassMetadata(true)
                .create();

        DTO dto = new DTO(
                new ArrayList(Arrays.asList(
                        new Measurement<Double>(15.5, 8500, "TEMP_SENSOR"),
                        new Measurement<double[]>(new double[] {
                                2.5,
                                1.5,
                                2.0
                        }, 8500, "WIND_SPEED"),
                        new Measurement<RainState>(RainState.LIGHT_RAIN, 8500, "RAIN_SENSOR")
                )));

        String json = genson.serialize(dto);
        System.out.println(json);

        DTO deserialized = genson.deserialize(json, DTO.class);
    }
}
Numbers and Arrays worked well out-of-the-box, but the enum class is providing a bit of a challenge. In this case the serialized JSON form would have to be IMO a JSON object including a:
type member
value member
Looking at the EnumConverter class I see that I would need to provide a custom Converter. However I can't quite grasp how to properly register the Converter so that it would be called during deserialization. How should this serialization be solved using Genson?
Great job providing a complete example!
The first problem is that DTO doesn't have a no-arg constructor, but Genson also supports classes whose constructors take arguments. You just have to enable it via the builder with useConstructorWithArguments(true).
However, this will not solve the complete problem. For the moment Genson has full polymorphic support only for types that are serialized as a JSON object, because Genson adds a property called '@class' to it. There is an open issue for that.
Probably the best solution that should work with most situations would be to define a converter that automatically wraps all the values in json objects, so the converter that handles class metadata will be able to generate it. This can be a "good enough" solution while waiting for it to be officially supported by Genson.
So first define the wrapping converter
public static class LiteralAsObjectConverter<T> implements Converter<T> {
    private final Converter<T> concreteConverter;

    public LiteralAsObjectConverter(Converter<T> concreteConverter) {
        this.concreteConverter = concreteConverter;
    }

    @Override
    public void serialize(T object, ObjectWriter writer, Context ctx) throws Exception {
        writer.beginObject().writeName("value");
        concreteConverter.serialize(object, writer, ctx);
        writer.endObject();
    }

    @Override
    public T deserialize(ObjectReader reader, Context ctx) throws Exception {
        reader.beginObject();
        T instance = null;
        while (reader.hasNext()) {
            reader.next();
            if (reader.name().equals("value")) instance = concreteConverter.deserialize(reader, ctx);
            else throw new IllegalStateException(String.format("Encountered unexpected property named '%s'", reader.name()));
        }
        reader.endObject();
        return instance;
    }
}
Then you need to register it with a ChainedFactory which would allow you to delegate to the default converter (this way it works automatically with any other type).
Genson genson = new GensonBuilder()
        .useIndentation(true)
        .useConstructorWithArguments(true)
        .useRuntimeType(true)
        .addAlias("RainState", RainState.class)
        .useClassMetadata(true)
        .withConverterFactory(new ChainedFactory() {
            @Override
            protected Converter<?> create(Type type, Genson genson, Converter<?> nextConverter) {
                if (Wrapper.toAnnotatedElement(nextConverter).isAnnotationPresent(HandleClassMetadata.class)) {
                    return new LiteralAsObjectConverter(nextConverter);
                } else {
                    return nextConverter;
                }
            }
        }).create();
The downside of this solution is that useClassMetadataWithStaticType needs to be set to true... but I guess that's acceptable, as it's only an optimization; it can be fixed, but that would imply some changes in Genson's code, and the rest still works.
If you are interested in this problem, it would be great if you gave that issue a shot and opened a PR to provide this feature as part of Genson.

How to convert a JSON object (returns from google places API) to a Java object

The Google Places API returns JSON when it is asked for places under the food category, and the response includes the details of several places.
I want to create an object array where each object contains the details of a specific place.
I have used the GSON library for my implementation, and it works fine for a dummy JSON object, but with the JSON result from the Google Places API a JsonSyntaxException is thrown.
I am looking for a solution to the following:
1. How can I proceed with GSON and the given JSON object to create my object array, or
2. Is there any other way to accomplish my task (still using the JSON result)?
Thanks.
update
Class PlaceObject
import java.util.List;
public class PlaceObject {
    private List<String> results;
}
Class JSONconverter
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import com.google.gson.Gson;

public class JSONconverter {

    public static void main(String[] args) {
        Gson gson = new Gson();
        try {
            BufferedReader br = new BufferedReader(
                    new FileReader("c:\\placeAPI.json"));

            // convert the json string back to object
            PlaceObject obj = gson.fromJson(br, PlaceObject.class);

            // obj.results == null when debugged; that's the problem
            System.out.println("Result: " + obj);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The link to the JSON:
http://www.mediafire.com/?8mmnuxuopimhdnz
I like working with Gson.
BTW, there is another relevant thread.
The Jersey client documentation proposes using the Jackson library (see the wiki).
You can also take a look at the Genson library: http://code.google.com/p/genson/.
It provides out-of-the-box integration with Jersey. You only need to have the jar in your classpath.
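Whichever library you pick, the likely cause of the JsonSyntaxException is that results in the Places response is an array of objects, not strings. A minimal Gson sketch of a matching model (the Place field names here are assumptions based on the usual Places response shape; adjust them to the actual JSON):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;
import com.google.gson.Gson;

public class PlacesExample {
    // "results" must map to a list of objects, not a list of strings
    static class PlacesResponse {
        List<Place> results;
        String status;
    }
    static class Place {
        String name;
        String vicinity;
    }

    public static void main(String[] args) {
        try (BufferedReader br = new BufferedReader(new FileReader("c:\\placeAPI.json"))) {
            PlacesResponse response = new Gson().fromJson(br, PlacesResponse.class);
            for (Place p : response.results) {
                System.out.println(p.name + " - " + p.vicinity);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}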
