In a Spring Boot application I'm trying to configure Kafka Streams. Plain Kafka topics work fine, but I am unable to get Spring Kafka Streams working.
This is my configuration:
@Configuration
@EnableKafkaStreams
public class KafkaStreamsConfig {

    @Value("${spring.kafka.bootstrap-servers}")
    private String bootstrapServers;

    @Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_CONFIG_BEAN_NAME)
    public StreamsConfig kStreamsConfigs() {
        Map<String, Object> props = new HashMap<>();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "testStreams");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.Integer().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, WallclockTimestampExtractor.class.getName());
        return new StreamsConfig(props);
    }

    @Bean
    public KStream<String, String> kStream(StreamsBuilder kStreamBuilder) {
        KStream<String, String> stream = kStreamBuilder.stream("post.sent");
        stream.mapValues(post -> post.toString()).to("streamingTopic2");
        stream.print();
        return stream;
    }

    @Bean
    public NewTopic kafkaTopicTest() {
        return new NewTopic("streamingTopic2", 1, (short) 1);
    }

    @KafkaListener(topics = "streamingTopic2", containerFactory = "kafkaListenerContainerFactory")
    public void testListener(ConsumerRecord<String, String> consumerRecord, Acknowledgment ack) {
        String value = consumerRecord.value();
        System.out.println("VALUE: " + value);
        ack.acknowledge();
    }
}
I want to create a stream based on the post.sent topic, apply a simple transformation, and send the messages from this stream to the test topic streamingTopic2.
Right now, when I send a message to post.sent I can't immediately see it in streamingTopic2, and after restarting, my application starts failing with the following error:
org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition streamingTopic2-0 at offset 0. If needed, please seek past the record to continue consumption.
Caused by: org.apache.kafka.common.errors.SerializationException: Can't deserialize data [[123, 34, 105, 100, 34, 58, 34, 53, 98, 56, 49, 53, 99, 97, 51, 52, 102, 97, 101, 102, 48, 52, 55, 97, 52, 48, 48, 100, 52, 50, 97, 34, 44, 34, 115, 116, 97, 116, 117, 115, 34, 58, 34, 83, 69, 78, 84, 34, 44, 34, 101, 120, 116, 101, 114, 110, 97, 108, 80, 111, 115, 116, 73, 100, 34, 58, 34, 48, 53, 54, 97, 57, 51, 49, 101, 45, 56, 97, 53, 100, 45, 52, 100, 52, 52, 45, 97, 101, 50, 48, 45, 53, 99, 51, 53, 52, 56, 57, 52, 98, 97, 53, 49, 34, 44, 34, 99, 104, 97, 116, 78]] from topic [streamingTopic2]
Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_OBJECT token
at [Source: (byte[])"{"id":"5b815ca34faef047a400d42a","status":"SENT","externalPostId":"056a931e-8a5d-4d44-ae20-5c354894ba51","chatName":.......":"[truncated 626 bytes]; line: 1, column: 1]
at com.fasterxml.jackson.databind.exc.MismatchedInputException.from(MismatchedInputException.java:63) ~[jackson-databind-2.9.6.jar:2.9.6]
at com.fasterxml.jackson.databind.DeserializationContext.reportInputMismatch(DeserializationContext.java:1342) ~[jackson-databind-2.9.6.jar:2.9.6]
at com.fasterxml.jackson.databind.DeserializationContext.handleUnexpectedToken(DeserializationContext.java:1138) ~[jackson-databind-2.9.6.jar:2.9.6]
at com.fasterxml.jackson.databind.DeserializationContext.handleUnexpectedToken(DeserializationContext.java:1092) ~[jackson-databind-2.9.6.jar:2.9.6]
at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:63) ~[jackson-databind-2.9.6.jar:2.9.6]
at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:10) ~[jackson-databind-2.9.6.jar:2.9.6]
at com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:1611) ~[jackson-databind-2.9.6.jar:2.9.6]
at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1234) ~[jackson-databind-2.9.6.jar:2.9.6]
at org.springframework.kafka.support.serializer.JsonDeserializer.deserialize(JsonDeserializer.java:248) ~[spring-kafka-2.1.8.RELEASE.jar:2.1.8.RELEASE]
at org.springframework.kafka.support.serializer.JsonDeserializer.deserialize(JsonDeserializer.java:224) ~[spring-kafka-2.1.8.RELEASE.jar:2.1.8.RELEASE]
at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:967) ~[kafka-clients-1.1.0.jar:na]
at org.apache.kafka.clients.consumer.internals.Fetcher.access$3300(Fetcher.java:93) ~[kafka-clients-1.1.0.jar:na]
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1144) ~[kafka-clients-1.1.0.jar:na]
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1400(Fetcher.java:993) ~[kafka-clients-1.1.0.jar:na]
at org.apache.kafka.clien
To post.sent I send <String, Post> messages, where Post is my own complex type, but I don't know how to convert it to <String, String> in kStream() so that it can be consumed in testListener().
Please suggest how to make it work.
Regarding your usage of
new DefaultKafkaConsumerFactory<>(kafkaProperties.buildConsumerProperties(), new StringDeserializer(), new JsonDeserializer<>(String.class))
to define the consumerFactory bean:
Well, I can't say how you produced the data into the topic, but the JSON parser is failing:
Cannot deserialize instance of `java.lang.String` out of START_OBJECT token
at [Source: (byte[])"{"id":"5b815ca34faef047a400d42a","status":"SENT","externalPostId":"056a931e-8a5d-4d44-ae20-5c354894ba51","chatName":.......":"[truncated 626 bytes]; line: 1, column: 1]
...
at org.springframework.kafka.support.serializer.JsonDeserializer.deserialize
Based on Caused by: org.apache.kafka.common.errors.SerializationException: Can't deserialize data [[123, 34, 105 ..., I would say you have at some point produced with a byte[] producer rather than explicitly using a StringSerializer or JsonSerializer during production.
You could get around your error by using new StringDeserializer(), or even doing no conversion at all with a ByteArrayDeserializer, in your consumerFactory, but then you'll still need to handle how to later parse that event into an object that you want to manipulate and extract fields from.
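For example, a consumerFactory along those lines could look roughly like this (a sketch only; it assumes you build the remaining consumer properties from a KafkaProperties bean, as in your snippet):
@Bean
public ConsumerFactory<String, String> consumerFactory(KafkaProperties kafkaProperties) {
    // Read the value as a plain String instead of JSON-mapping it into a target class.
    return new DefaultKafkaConsumerFactory<>(kafkaProperties.buildConsumerProperties(),
            new StringDeserializer(),
            new StringDeserializer());
}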
To use Streams you need to do something like this:
@EnableBinding(MyStreamProcessor.class)
@StreamListener
public void process(@Input("input") KTable<String, MyMessage> myMessages,
                    @Input("streammapping") KTable<String, StreamMapping> streamMessages) {
    ...
}

interface MyStreamProcessor {
    @Input("input")
    KTable<?, ?> input();

    @Input("streammapping")
    KTable<?, ?> streamMapping();
}
and then put your processing code in the method body. KStreams work the same way.
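Alternatively, staying with plain spring-kafka as in your configuration, here is a hedged sketch of a kStream bean that deserializes the Post values explicitly with a JsonSerde and forwards them as Strings (it assumes String keys, a Post class with a usable toString(), and org.springframework.kafka.support.serializer.JsonSerde on the classpath):
@Bean
public KStream<String, String> kStream(StreamsBuilder kStreamBuilder) {
    // Consume <String, Post> from post.sent with explicit serdes ...
    KStream<String, Post> posts = kStreamBuilder.stream("post.sent",
            Consumed.with(Serdes.String(), new JsonSerde<>(Post.class)));
    // ... then map each Post to a String and publish it to streamingTopic2 as <String, String>.
    KStream<String, String> asStrings = posts.mapValues(Post::toString);
    asStrings.to("streamingTopic2", Produced.with(Serdes.String(), Serdes.String()));
    return asStrings;
}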
Related
I'm trying to read from an SQS queue in batch mode and write to a local file using Apache Beam 2.34.0 and the AWS SDK v1 Beam IO, and it throws an IllegalMutationException.
public class SqsReader {

    public void run(String[] args) {
        SqsReaderOptions options = PipelineOptionsFactory.fromArgs(args).withValidation()
                .as(SqsReaderOptions.class);
        Pipeline p = this.getPipeline(args);
        p.apply(SqsIO.read().withQueueUrl(options.getSourceQueueUrl())
                .withMaxNumRecords(options.getNumberOfRecords()))
         .apply(ParDo.of(new SqsMessageToJson()))
         .apply(TextIO.write()
                 .to(options.getLocalOutputLocation())
                 .withNumShards(options.getNumShards()));
        p.run().waitUntilFinish();
    }

    public static void main(String[] args) throws IOException {
        new SqsReader().run(args);
    }

    public static class SqsMessageToJson extends DoFn<Message, String> {
        @ProcessElement
        public void processElement(ProcessContext c) {
            String message = Objects.requireNonNull(c.element()).getBody();
            c.output(message);
        }
    }
}
I'm getting the following exception:
Jan 10, 2022 11:37:05 AM org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector verifyUnmodifiedThrowingCheckedExceptions
WARNING: Coder of type class org.apache.beam.sdk.coders.SerializableCoder has a #structuralValue method which does not return true when the encoding of the elements is equal. Element Shard{source=org.apache.beam.sdk.io.aws.sqs.SqsUnboundedSource@5f19451c, maxNumRecords=1, maxReadTime=null}
Coder of type class org.apache.beam.sdk.coders.SerializableCoder has a #structuralValue method which does not return true when the encoding of the elements is equal. Element Shard{source=org.apache.beam.sdk.io.aws.sqs.SqsUnboundedSource@5f19451c, maxNumRecords=1, maxReadTime=null}
Exception in thread "main" org.apache.beam.sdk.util.IllegalMutationException: PTransform SqsIO.Read/Read(SqsUnboundedSource)/Read/ParMultiDo(Read) mutated value ValueWithRecordId{id=[98, 55, 50, 51, 56, 51, 102, 57, 45, 97, 52, 100, 56, 45, 52, 99, 100, 50, 45, 97, 49, 55, 49, 45, 48, 57, 100, 48, 100, 53, 50, 51, 99, 50, 54, 51], value={MessageId: b72383f9-a4d8-4cd2-a171-09d0d523c263,ReceiptHandle: AQEBj2FXnTVQ==,MD5OfBody: 38db8cbd101e4c1cfbf47e31c2aaab75,Body: {"test-key": "test-value"},Attributes: {SentTimestamp=1641794775474},MessageAttributes: {requestTimeMsSinceEpoch={StringValue: 1641794824800,StringListValues: [],BinaryListValues: [],}}}} after it was output (new value was ValueWithRecordId{id=[98, 55, 50, 51, 56, 51, 102, 57, 45, 97, 52, 100, 56, 45, 52, 99, 100, 50, 45, 97, 49, 55, 49, 45, 48, 57, 100, 48, 100, 53, 50, 51, 99, 50, 54, 51], value={MessageId: b72383f9-a4d8-4cd2-a171-09d0d523c263,ReceiptHandle: DeVRF8vQATm1f+rHIvR3eaejlRHksL1R7WE4zDT7lSwdIs9gJCYKXFXnTVQ==,MD5OfBody: 38db8cbd101e4c1cfbf47e31c2aaab75,Body: {"test-key": "test-value"},Attributes: {SentTimestamp=1641794775474},MessageAttributes: {requestTimeMsSinceEpoch={StringValue: 1641794824800,StringListValues: [],BinaryListValues: [],}}}}). Values must not be mutated in any way after being output.
at org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.commit(ImmutabilityCheckingBundleFactory.java:137)
at org.apache.beam.runners.direct.EvaluationContext.commitBundles(EvaluationContext.java:231)
at org.apache.beam.runners.direct.EvaluationContext.handleResult(EvaluationContext.java:163)
at org.apache.beam.runners.direct.QuiescenceDriver$TimerIterableCompletionCallback.handleResult(QuiescenceDriver.java:292)
at org.apache.beam.runners.direct.DirectTransformExecutor.finishBundle(DirectTransformExecutor.java:194)
at org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:131)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.beam.sdk.util.IllegalMutationException: Value ValueWithRecordId{id=[98, 55, 50, 51, 56, 51, 102, 57, 45, 97, 52, 100, 56, 45, 52, 99, 100, 50, 45, 97, 49, 55, 49, 45, 48, 57, 100, 48, 100, 53, 50, 51, 99, 50, 54, 51], value={MessageId: b72383f9-a4d8-4cd2-a171-09d0d523c263,ReceiptHandle: AQEBj2KQ==,MD5OfBody: 38db8cbd101e4c1cfbf47e31c2aaab75,Body: {"test-key": "test-value"},Attributes: {SentTimestamp=1641794775474},MessageAttributes: {requestTimeMsSinceEpoch={StringValue: 1641794824800,StringListValues: [],BinaryListValues: [],}}}} mutated illegally, new value was ValueWithRecordId{id=[98, 55, 50, 51, 56, 51, 102, 57, 45, 97, 52, 100, 56, 45, 52, 99, 100, 50, 45, 97, 49, 55, 49, 45, 48, 57, 100, 48, 100, 53, 50, 51, 99, 50, 54, 51], value={MessageId: b72383f9-a4d8-4cd2-a171-09d0d523c263,ReceiptHandle: AQ==,MD5OfBody: 38db8cbd101e4c1cfbf47e31c2aaab75,Body: {"test-key": "test-value"},Attributes: {SentTimestamp=1641794775474},MessageAttributes: {requestTimeMsSinceEpoch={StringValue: 1641794824800,StringListValues: [],BinaryListValues: [],}}}}. Encoding was rO.
at org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.illegalMutation(MutationDetectors.java:158)
at org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.verifyUnmodifiedThrowingCheckedExceptions(MutationDetectors.java:153)
at org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.verifyUnmodified(MutationDetectors.java:128)
at org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.commit(ImmutabilityCheckingBundleFactory.java:127)
... 10 more
Caused by: org.apache.beam.sdk.util.IllegalMutationException: Value ValueWithRecordId{id=[98, 55, 50, 51, 56, 51, 102, 57, 45, 97, 52, 100, 56, 45, 52, 99, 100, 50, 45, 97, 49, 55, 49, 45, 48, 57, 100, 48, 100, 53, 50, 51, 99, 50, 54, 51], value={MessageId: b72383f9-a4d8-4cd2-a171-09d0d523c263,ReceiptHandle: AQEBj=,MD5OfBody: 38db8cbd101e4c1cfbf47e31c2aaab75,Body: {"test-key": "test-value"},Attributes: {SentTimestamp=1641794775474},MessageAttributes: {requestTimeMsSinceEpoch={StringValue: 1641794824800,StringListValues: [],BinaryListValues: [],}}}} mutated illegally, new value was ValueWithRecordId{id=[98, 55, 50, 51, 56, 51, 102, 57, 45, 97, 52, 100, 56, 45, 52, 99, 100, 50, 45, 97, 49, 55, 49, 45, 48, 57, 100, 48, 100, 53, 50, 51, 99, 50, 54, 51], value={MessageId: b72383f9-a4d8-4cd2-a171-09d0d523c263,ReceiptHandle: AQE==,MD5OfBody: 38db8cbd101e4c1cfbf47e31c2aaab75,Body: {"test-key": "test-value"},Attributes: {SentTimestamp=1641794775474},MessageAttributes: {requestTimeMsSinceEpoch={StringValue: 1641794824800,StringListValues: [],BinaryListValues: [],}}}}. Encoding was rO2Mw.
whereas the same code works in Apache Beam 2.31.0 without any issues. What am I missing here?
This issue seems to be caused by a non-deterministic coder (SerializableCoder.of(Message.class)) in combination with using the SQS reader in batch mode. Batch mode is implemented using BoundedReadFromUnboundedSource, which is known to cause issues; its usage is rather discouraged.
You can watch BEAM-13631 to follow progress on fixing the SQS message coder.
Currently I can't tell you which changes between 2.31 and 2.34 trigger the issue, and possibly it's not changes in the SQS IO itself. I'll keep investigating a bit further and hope to give an update on that later.
For now, I recommend trying a few things:
First, try to avoid using batch mode (so set neither maxNumRecords nor maxReadTime). I'm pretty confident that this fixes your issue; see the sketch further below.
Second, since recent versions of Beam there's a separate module for AWS SDK v2, beam-sdks-java-io-amazon-web-services2 (hence my question above). It uses a custom message class for transfer rather than the AWS SDK one, and its encoding should be deterministic. However, I noticed a few other bugs in the SDK v2 IOs when starting to look into them recently: retry on invalid receipt handles, and SQS clients being closed too early.
Please let me know if either one helps.
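For the first suggestion, the read part of the pipeline would look roughly like this (a sketch only, not tested against your setup; it assumes you window the now-unbounded output with org.joda.time.Duration, since TextIO then needs withWindowedWrites()):
// Sketch: no withMaxNumRecords/withMaxReadTime, so the source stays unbounded.
p.apply(SqsIO.read().withQueueUrl(options.getSourceQueueUrl()))
 .apply(ParDo.of(new SqsMessageToJson()))
 // Window the unbounded stream so TextIO can write a finite set of files per window.
 .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))))
 .apply(TextIO.write()
         .to(options.getLocalOutputLocation())
         .withWindowedWrites()
         .withNumShards(options.getNumShards()));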
The I/O is much more complicated in Beam 2.34.0 than in 2.31.0. In 2.34.0, the deleteBatch logic filters the messages to delete based on the in-flight state. However, there are assumptions in the extend logic, where the in-flight state is modified to exclude messages that are assumed to be expired or about to expire. These messages are neither explicitly requested by the I/O to be deleted from SQS nor dropped by the I/O itself (the I/O could be processing a message that should have been expired so that it would be resent).
Filed https://issues.apache.org/jira/browse/BEAM-13627.
Though I'm not sure whether pulling the same message again with a new receipt handle within the same bundle would actually trigger the mutation detection, because the receipt handle is part of the Message hashcode, unless there is a hash collision in the mutation detector.
TL;DR: debugging process
The mutation was detected in the SqsUnboundedSource, not caused by any other code in the pipeline.
The code that reports the warning and throws the exception is here.
The only field changed is the Receipt handle. It's documented here that:
If you receive a message more than once, each time you receive it, you get a different receipt handle. You must provide the most recently received receipt handle when you request to delete the message (otherwise, the message might not be deleted).
There is no aws_java_sdk_version change between Beam 2.31.0 and Beam 2.34.0. So AWS SDK shouldn't be the culprit.
There is a significant change between Beam 2.31.0 and Beam 2.34.0 for SqsUnboundedReader.
To receive a message more than once, the message must not have been deleted since the first time received. The deletion logic is invoked in SqsCheckpointMark.
I have an application that consumes messages in protobuf format, and when I run it I am getting this error:
Exception in thread "NotificationProcessorService-process-applicationId-0300a3f8-6dab-4f3f-a631-8719178823ce-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Deserialization exception handler is set to fail upon a deserialization error. If you would rather have the streaming pipeline continue after a deserialization error, please set the default.deserialization.exception.handler appropriately.
at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:80)
at org.apache.kafka.streams.processor.internals.RecordQueue.updateHead(RecordQueue.java:176)
at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:112)
at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:185)
at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:865)
at org.apache.kafka.streams.processor.internals.TaskManager.addRecordsToTasks(TaskManager.java:938)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:640)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
Caused by: org.apache.kafka.common.errors.SerializationException: Can't deserialize data [[0, 0, 0, 0, 5, 0, 10, 8, 57, 53, 52, 50, 56, 51, 51, 51, 16, -7, -12, -106, -97, -119, 47, 26, 6, 57, 56, 55, 56, 54, 55, 34, 4, 56, 54, 50, 50, 42, 6, 56, 57, 55, 51, 50, 57, 50, 5, 80, 82, 73, 77, 69, 58, 5, 56, 55, 57, 50, 51, 65, 31, -123, -21, 81, -72, 93, -108, 64, 72, 2, 82, 6, 67, 82, 69, 68, 73, 84, 89, 31, -123, -21, 81, -72, 93, -108, 64, 97, -26, -48, 34, -37, -7, 74, 64, 64, 105, -26, -48, 34, -37, -7, 74, 64, 64, 113, -26, -48, 34, -37, -7, 74, 64, 64, 122, 4, 77, 65, 73, 76]] from topic [pos-proto-topic]
Caused by: java.io.CharConversionException: Invalid UTF-32 character 0x4ff0a08 (above 0x0010ffff) at char #1, byte #7)
at com.fasterxml.jackson.core.io.UTF32Reader.reportInvalid(UTF32Reader.java:195)
at com.fasterxml.jackson.core.io.UTF32Reader.read(UTF32Reader.java:158)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._loadMore(ReaderBasedJsonParser.java:250)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipWSOrEnd(ReaderBasedJsonParser.java:2384)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:672)
at com.fasterxml.jackson.databind.ObjectReader._initForReading(ObjectReader.java:357)
at com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:2064)
at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1555)
at org.springframework.kafka.support.serializer.JsonDeserializer.deserialize(JsonDeserializer.java:517)
at org.apache.kafka.streams.processor.internals.SourceNode.deserializeValue(SourceNode.java:55)
at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:66)
at org.apache.kafka.streams.processor.internals.RecordQueue.updateHead(RecordQueue.java:176)
at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:112)
at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:185)
at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:865)
at org.apache.kafka.streams.processor.internals.TaskManager.addRecordsToTasks(TaskManager.java:938)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:640)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
I think the application expects a JSON message by default, and I need to change some configuration to say "hey, I am expecting protobuf messages here". I feel like I have searched the whole internet and haven't found how to set it.
Here is my application.yaml file:
spring:
  cloud:
    stream:
      bindings:
        notification-input-channel:
          destination: pos-proto-topic
        notification-output-channel:
          destination: notification-topic
      kafka:
        streams:
          binder:
            brokers: localhost:9092
            configuration:
              schema.registry.url: http://localhost:8081
          bindings:
            notification-output-channel:
              producer:
                valueSerde: io.confluent.kafka.streams.serdes.protobuf.KafkaProtobufSerde
I am also using Hoxton.SR9 as the spring-cloud version. Does anyone know how to solve this?
You need to set:
spring.cloud.stream.kafka.streams.bindings.<channel-name>-in-0.consumer.valueSerde
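With the binding name from the application.yaml in the question, that would look roughly like the snippet below (a sketch; the exact property path depends on whether you use annotation-based channel names as above, or functional bindings with the -in-0 suffix):
spring:
  cloud:
    stream:
      kafka:
        streams:
          bindings:
            notification-input-channel:
              consumer:
                valueSerde: io.confluent.kafka.streams.serdes.protobuf.KafkaProtobufSerde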
I've been trying to figure out why Base64 decoding is different between Dart and Java.
Dart example code
import 'dart:convert';
var str = '640gPKMxZZbeLDIUeXiZmg==';
var dec = base64.decode(str);
print(dec);
prints: [235, 141, 32, 60, 163, 49, 101, 150, 222, 44, 50, 20, 121, 120, 153, 154]
Java example code
import java.util.Base64;
String str = "640gPKMxZZbeLDIUeXiZmg==";
byte[] dec = Base64.getDecoder().decode(str);
System.out.println(Arrays.toString(dec));
prints: [-21, -115, 32, 60, -93, 49, 101, -106, -34, 44, 50, 20, 121, 120, -103, -102]
Any ideas? As far as I'm aware they both implement RFC4648.
For the Dart code, I did try using base64url and the normalize function which didn't change anything (to be expected I suppose). Not too sure what else to try.
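For what it's worth, the two outputs appear to contain the same byte values: Dart prints bytes as unsigned (0–255) while Java's byte type is signed (−128–127). A quick check in Java (a sketch):
import java.util.Arrays;
import java.util.Base64;

public class Base64Compare {
    public static void main(String[] args) {
        byte[] dec = Base64.getDecoder().decode("640gPKMxZZbeLDIUeXiZmg==");
        // Signed view, as Java prints it: [-21, -115, 32, 60, ...]
        System.out.println(Arrays.toString(dec));
        // Unsigned view, as Dart prints it: [235, 141, 32, 60, ...]
        int[] unsigned = new int[dec.length];
        for (int i = 0; i < dec.length; i++) {
            unsigned[i] = dec[i] & 0xFF;
        }
        System.out.println(Arrays.toString(unsigned));
    }
}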
I've got a string that I'm trying to convert to bytes in order to create an MD5 hash in both Obj-C and Java. For some reason, the bytes are different between the two languages.
Java
System.out.println(Arrays.toString(
("78b4a02fa139a2944f17b4edc22fb175:8907f3c4861140ad84e20c8e987eeae6").getBytes()));
Output:
[55, 56, 98, 52, 97, 48, 50, 102, 97, 49, 51, 57, 97, 50, 57, 52, 52, 102, 49, 55, 98, 52, 101, 100, 99, 50, 50, 102, 98, 49, 55, 53, 58, 56, 57, 48, 55, 102, 51, 99, 52, 56, 54, 49, 49, 52, 48, 97, 100, 56, 52, 101, 50, 48, 99, 56, 101, 57, 56, 55, 101, 101, 97, 101, 54]
ObjC
NSString *str = @"78b4a02fa139a2944f17b4edc22fb175:8907f3c4861140ad84e20c8e987eeae6";
NSData *bytes = [str dataUsingEncoding:NSISOLatin1StringEncoding allowLossyConversion:NO];
NSLog(@"%@", [bytes description]);
Output:
<37386234 61303266 61313339 61323934 34663137 62346564 63323266 62313735 3a383930 37663363 34383631 31343061 64383465 32306338 65393837 65656165 36>
I've tried using different charsets with no luck and can't think of any other reasons why the bytes would be different. Any ideas? I did notice that all of the byte values are different by some factor of 18 but am not sure what is causing it.
Actually, Java is printing in decimal, byte by byte. Obj C is printing in hex, integer by integer.
Referring to this chart:
Dec  Hex
 55   37
 56   38
 98   62
...
You'll just have to find a way to output byte by byte in Obj-C.
I don't know about Obj-C, but if that NSLog function works similarly to printf() in C, I'd start with that.
A code snippet from Apple
unsigned char aBuffer[20];
NSString *myString = @"Test string.";
const char *utfString = [myString UTF8String];
NSData *myData = [NSData dataWithBytes: utfString length: strlen(utfString)];
[myData getBytes:aBuffer length:20];
The difference in the bytes is due to the hex representation. The above code shows how to convert the string to bytes and store the result in a buffer.
I have a string encoded in Base64:
eJx9xEERACAIBMBKJyKDcTzR_hEsgOxjAcBQFVVNvi3qEsrRnWXwbhHOmzWnctPHPVkPu-4vBQ==
How can I decode it in Scala?
I tried to use:
val bytes1 = new sun.misc.BASE64Decoder().decodeBuffer(compressed_code_string)
But when I compare the byte array with the correct one that I generated in Python, there is a difference. Here is the code I used in Python:
import base64
base64.urlsafe_b64decode(compressed_code_string)
The Byte Array in Scala is:
(120, -100, 125, -60, 65, 17, 0, 32, 8, 4, -64, 74, 39, 34, -125, 113, 60, -47, -2, 17, 44, -128, -20, 99, 1, -64, 80, 21, 85, 77, -66, 45, -22, 18, -54, -47, -99, 101, -16, 110, 17, -50, -101, 53, -89, 114, -45, -57, 61, 89, 15, -69, -2, 47, 5)
And the one generated in Python is:
(120, -100, 125, -60, 65, 17, 0, 32, 8, 4, -64, 74, 39, 34, -125, 113, 60, -47, -2, 17, 44, -128, -20, 99, 1, -64, 80, 21, 85, 77, -66, 45, -22, 18, -54, -47, -99, 101, -16, 110, 17, -50, -101, 53, -89, 114, -45, -57, 61, 89, 15, -69, -18, 47, 5)
Note that there is a single difference near the end of the array.
In Scala, encoding a String to Base64 and decoding it back to the original String using the Java APIs:
import java.util.Base64
import java.nio.charset.StandardCharsets
scala> val bytes = "foo".getBytes(StandardCharsets.UTF_8)
bytes: Array[Byte] = Array(102, 111, 111)
scala> val encoded = Base64.getEncoder().encodeToString(bytes)
encoded: String = Zm9v
scala> val decoded = Base64.getDecoder().decode(encoded)
decoded: Array[Byte] = Array(102, 111, 111)
scala> val str = new String(decoded, StandardCharsets.UTF_8)
str: String = foo
There is unfortunately not just one Base64 encoding. The - character doesn't have the same representation in all encodings. For example, in the MIME encoding, it's not used at all. In the encoding for URLs, it has the value 62, and this is the one that Python is using. The default sun.misc decoder wants + for 62. If you change the - to +, you get the correct answer (i.e. the Python answer).
In Scala, you can convert the string s to MIME format like so:
s.map{ case '-' => '+'; case '_' => '/'; case c => c }
and then the Java MIME decoder will work.
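On Java 8+ you can also avoid the character mapping altogether, since java.util.Base64 ships a URL-safe decoder that accepts - and _ directly (a short Java sketch):
import java.util.Base64;

public class UrlSafeDecode {
    public static void main(String[] args) {
        // getUrlDecoder() understands the URL-safe alphabet (- and _), so no mapping is needed.
        byte[] bytes = Base64.getUrlDecoder()
                .decode("eJx9xEERACAIBMBKJyKDcTzR_hEsgOxjAcBQFVVNvi3qEsrRnWXwbhHOmzWnctPHPVkPu-4vBQ==");
        System.out.println(bytes.length + " bytes decoded");
    }
}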
Both Python and Java are correct in terms of the decoding; they are just referring to different RFCs. The Python library uses RFC 3548, and the Java library used here follows RFC 4648 and RFC 2045.
Changing the hyphen (-) into a plus (+) in your input string will make both decoded byte arrays identical.