How to pass a long in the Logstash Avro plugin? - java

I'm using the Logstash Avro plugin.
My client is a Java application. I have a few schemas that use 'long' as a type, and each time I send them I see a wrong value after deserialization. I suppose there is some overflow in the Logstash Avro plugin.
Are there any workarounds for it? I don't want to send a string every time I have a big value...
Here are the code snippets for my case. I have a valid .avsc schema with a field like this:
{
  "name": "scoringId",
  "type": "long"
},
Then I have an Avro-generated DTO on the Java side, which I convert to a byte array.
My Kafka config is fine; it uses ByteArraySerializer:
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer::class.java)
In the Logstash conf I have this input:
input {
  kafka {
    bootstrap_servers => 'kafkaserver:9092'
    topics => ["bart.vector"]
    codec => avro { schema_uri => "C:\logstash-6.1.2\vectorInfoDWH.avsc" }
    client_id => "logstash-vector-tracking"
  }
}
It uses the Avro plugin. As a result I can access all of the fields and get correct values, except for longs (and timestamps, because they're translated as longs).
Any ideas?

The problem was with serialization/deserialization. Converting the message to Base64 on the Java side was the solution.
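A minimal sketch of that Base64 workaround, assuming an Avro-generated class named VectorInfo and a producer switched to a StringSerializer for the value (both the class name and the serializer change are assumptions on top of the original setup); the serialized Avro bytes are wrapped in Base64 before being sent so the Logstash side can decode them as described above:

import java.io.ByteArrayOutputStream;
import java.util.Base64;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumWriter;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class VectorProducer {
    // Serialize the DTO with the Avro binary encoder, then Base64-encode the bytes.
    static String toBase64Avro(VectorInfo dto) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        SpecificDatumWriter<VectorInfo> writer = new SpecificDatumWriter<>(VectorInfo.class);
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(dto, encoder);
        encoder.flush();
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Send the Base64 string to the same topic Logstash reads from.
    static void send(KafkaProducer<String, String> producer, VectorInfo dto) throws Exception {
        producer.send(new ProducerRecord<>("bart.vector", toBase64Avro(dto)));
    }
}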

Related

Spring WebClient does not decode application/octet-stream into File object

Hi, I am using the OpenAPI Generator Maven Plugin to generate some Java client code (using the Spring WebClient library). One of the endpoints of my spec returns binary content, like:
"schema": {
"type": "string",
"format": "binary"
}
The generated code uses java.io.File as the return type for that, like:
public Mono<ResponseEntity<File>> downloadWithHttpInfo(String filename) throws WebClientResponseException {
    ParameterizedTypeReference<File> localVarReturnType = new ParameterizedTypeReference<File>() {};
    return downloadRequestCreation(filename).toEntity(localVarReturnType);
}
When calling this generated method, the response code was 200 (i.e. OK from the server side), but I got the following error in my client code:
org.springframework.web.reactive.function.UnsupportedMediaTypeException:
Content type 'application/octet-stream' not supported for bodyType=java.io.File
This came from the toEntity() method, which is part of the Spring WebClient code, not my code.
Is there a way to work around this? A: Instruct the OpenAPI Generator Maven Plugin not to use the java.io.File type but the Resource type? B: Somehow make WebClient able to decode application/octet-stream into java.io.File?
Found a solution: add the following options to the OpenAPI Generator Maven Plugin, then generate the code again, which replaces File with Resource:
<generatorName>java</generatorName>
<library>webclient</library>
<typeMappings>string+binary=Resource</typeMappings>
<importMappings>Resource=org.springframework.core.io.Resource</importMappings>
The above says: when the return type is string and the format is binary, map it to Resource, and import Resource as org.springframework.core.io.Resource. There you go.
I had the exact same issue, but using Gradle instead of Maven.
Here is the syntax for doing the same in Gradle:
task generateClientSources(type: org.openapitools.generator.gradle.plugin.tasks.GenerateTask) {
    generatorName = 'java'
    // other configs ..
    configOptions = [
        // other configs ..
        library : 'webclient'
    ]
    typeMappings = [
        File : 'Resource'
    ]
    importMappings = [
        File : 'org.springframework.core.io.Resource'
    ]
}
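With either mapping in place, the regenerated method should return Mono<ResponseEntity<Resource>> instead of Mono<ResponseEntity<File>>. A hedged usage sketch for writing that body to disk, assuming the generated API class is named DefaultApi (the class name is hypothetical):

import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import org.springframework.core.io.Resource;
import org.springframework.http.ResponseEntity;

public class DownloadExample {
    // Block for the response, then stream the Resource body straight to a file.
    public static void saveTo(DefaultApi api, String filename, Path target) {
        ResponseEntity<Resource> response = api.downloadWithHttpInfo(filename).block();
        if (response != null && response.getBody() != null) {
            try (InputStream in = response.getBody().getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}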

Consume the Json in kafka topic using tJava in Talend

I am currently trying to create an ingestion job workflow using Kafka in Talend Studio. The job will read the JSON data in the topic "work" and store it in a Hive table.
Snippet of json:
{"Header": {"Vers":"1.0","Message": "318","Owner": {"ID": 102,"FID": 101},"Mode":"8"},"Request": {"Type":"4","ObjType": "S","OrderParam":[{"Code": "OpType","Value": "30"},{"Code": "Time","Value": "202"},{"Code": "AddProperty","ObjParam": [{"Param": [{"Code": "Sync","Value": "Y"}]}]}]}}
{"Header": {"Vers":"2.0","Message": "318","Owner": {"ID": 103,"FID": 102},"Mode":"8"},"Request": {"Type":"5","ObjType": "S","OrderParam":[{"Code": "OpType","Value": "90"},{"Code": "Time","Value": "203"},{"Code": "AddProperty","ObjParam": [{"Param": [{"Code": "Sync","Value": "Y"}]}]}]}}
Talend workflow:
My focus in this question is not the Talend components, but the Java code in the tJava component that fetches and reads the JSON.
Java code:
String output=((String)globalMap.get("tLogRow_1_OUTPUT"));
JSONObject jsonObject = new JSONObject(output);
System.out.println(jsonObject);
String sourceDBName=(jsonObject.getString("Vers"));
The code above is able to get the data from tLogRow in the "output" variable. However, it gives an error where it reads a null value for the JSON object. What should I do to correctly get the data from the JSON?
You can use a tExtractJsonFields instead of a tJava. This component extracts data from your input String following a JSON schema that you can define in the metadata. With this you could extract all fields from your input.
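If you do want to stay with tJava, note that "Vers" sits inside the nested "Header" object, so calling getString("Vers") on the root object fails. A minimal sketch using the same org.json classes as the snippet above (field names taken from the sample JSON):

String output = ((String) globalMap.get("tLogRow_1_OUTPUT"));
JSONObject jsonObject = new JSONObject(output);
// Descend into the nested "Header" object before reading its fields.
JSONObject header = jsonObject.getJSONObject("Header");
String vers = header.getString("Vers");                   // "1.0" for the first record
int ownerId = header.getJSONObject("Owner").getInt("ID"); // nested object access
System.out.println(vers + " / " + ownerId);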

How to receive raw json string from Rabbit in Java without any modifications?

I have a Spring Boot app configured with a RabbitMQ listener. It has to receive JSON data of the format below (sample shown):
{ "name" :"abc",
"key" : "somekey",
"value" : {"data": {"notes": "**foo \u0026 bar"}}**
}
This data represents some info which should be used only for read-only processing, and the receiving Spring app should receive it as it is (raw form).
What I mean is: if I assert the value node in the Spring app against the input that was published on the queue, they should be equal.
This is simply not happening.
I always get the value in the Spring app as foo & bar, but I wanted it in raw form, without conversion of the \u escape codes.
I tried several approaches:
a Jackson2JsonMessage converter,
passing the bytes from Message.getBody() - byte[] - to mapper.readValue() in the Rabbit handler,
using the JSON-simple and Gson libraries.
Why is it so tricky to get the data as it is, without any conversion or translation? Do I need to follow an alternative approach?
Any help is appreciated.
Have you tried explicitly enabling the escaping of non-ASCII characters on your ObjectMapper?
mapper.getFactory().configure(JsonGenerator.Feature.ESCAPE_NON_ASCII, true);
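One place that setting can live, assuming Spring AMQP's Jackson2JsonMessageConverter is the converter in use, is the ObjectMapper handed to the converter bean; a hedged sketch:

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.amqp.support.converter.Jackson2JsonMessageConverter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RabbitJsonConfig {

    @Bean
    public Jackson2JsonMessageConverter jsonMessageConverter() {
        ObjectMapper mapper = new ObjectMapper();
        // Keep non-ASCII characters escaped (\uXXXX) when the mapper writes JSON back out.
        mapper.getFactory().configure(JsonGenerator.Feature.ESCAPE_NON_ASCII, true);
        return new Jackson2JsonMessageConverter(mapper);
    }
}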

How to define parquet schema for ParquetOutputFormat for Hadoop job in java?

I have a Hadoop job in Java which uses the sequence file output format:
job.setOutputFormatClass(SequenceFileOutputFormat.class);
I want to use Parquet format instead. I tried to set it in the naive way:
job.setOutputFormatClass(ParquetOutputFormat.class);
ParquetOutputFormat.setOutputPath(job, output);
ParquetOutputFormat.setCompression(job, CompressionCodecName.GZIP);
ParquetOutputFormat.setCompressOutput(job, true);
But when it comes to writing the job's result to disk, the job fails:
Error: java.lang.NullPointerException: writeSupportClass should not be null
at parquet.Preconditions.checkNotNull(Preconditions.java:38)
at parquet.hadoop.ParquetOutputFormat.getWriteSupport(ParquetOutputFormat.java:326)
It seems that Parquet needs a schema to be set, but I couldn't find any manual or guide on how to do that in my case.
My Reducer class tries to write down 3 long values on each line by using org.apache.hadoop.io.LongWritable as a key and org.apache.mahout.cf.taste.hadoop.EntityEntityWritable as a value.
How can I define a schema for that?
You have to specify a "parquet.hadoop.api.WriteSupport" implementation for your job
(e.g. "parquet.proto.ProtoWriteSupport" for protobuf or "parquet.avro.AvroWriteSupport" for Avro):
ParquetOutputFormat.setWriteSupportClass(job, ProtoWriteSupport.class);
When using protobuf, also specify the protobuf class:
ProtoParquetOutputFormat.setProtobufClass(job, your-protobuf-class.class);
and when using Avro, set the schema like this:
AvroParquetOutputFormat.setSchema(job, your-avro-object.SCHEMA);
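For the Avro route with three long values per record, one possible shape is a small record schema set on the job. A hedged sketch, assuming parquet-avro is on the classpath and the same old "parquet.*" package names as above; the reducer would then emit Void keys and GenericRecord values instead of LongWritable/EntityEntityWritable, and the field names are illustrative only:

import org.apache.avro.Schema;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import parquet.avro.AvroParquetOutputFormat;
import parquet.hadoop.ParquetOutputFormat;
import parquet.hadoop.metadata.CompressionCodecName;

public class ParquetJobSetup {
    // An Avro record with three long fields, matching the three long values per line.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Triple\",\"fields\":["
      + "{\"name\":\"first\",\"type\":\"long\"},"
      + "{\"name\":\"second\",\"type\":\"long\"},"
      + "{\"name\":\"third\",\"type\":\"long\"}]}");

    static void configure(Job job, Path output) {
        job.setOutputFormatClass(AvroParquetOutputFormat.class);
        AvroParquetOutputFormat.setOutputPath(job, output);
        AvroParquetOutputFormat.setSchema(job, SCHEMA); // installs AvroWriteSupport for the job
        ParquetOutputFormat.setCompression(job, CompressionCodecName.GZIP);
    }
}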

PIG: Cannot cast java.lang.String to org.apache.avro.util.Utf8 with AvroStorage inside STORE

I am using Apache PIG to reduce data originally stored in CSV format and want to output it in Avro. Part of my PIG script calls a Java UDF that appends a few fields to the input Tuple and passes the modified Tuple back. I am modifying the output PIG schema when doing this using:
Schema outSchema = new Schema(input).getField(1).schema;
Schema recSchema = outSchema.getField(0).schema;
recSchema.add(new FieldSchema("aircrafttype", DataType.CHARARRAY));
Inside the public Schema outputSchema(Schema input) method of my UDF.
Within the exec method, I append java.lang.String values to the input Tuple and return the edited Tuple to the PIG script. This, and all subsequent operations, works fine. If I output to CSV format using PigStorage(',') there are no problems. When I attempt to output using
STORE records INTO '$out_dir' USING org.apache.pig.piggybank.storage.avro.AvroStorage('
{
  "schema": {
    "type": "record", "name": "my new data",
    "fields": [
      {"name": "fld1", "type": "long"},
      {"name": "fld2", "type": "string"}
    ]
  }
}');
I get the following error:
java.io.IOException: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
I have attempted appending the character fields to the Tuple (within my UDF) as char[] and Utf8 types, but that makes PIG angry before I even get to trying to write out data. I have also attempted modifying my Avro schema to allow for null types in every field.
I'm using PIG v0.11.1 and Avro v1.7.5; any help is much appreciated.
This was a PIG version issue. My UDF was built into a jar-with-dependencies including PIG v0.8.1. The mix of PIG versions 0.8.1 and 0.11.1 was causing the problems, AVRO had nothing to do with it.
