NullPointerException while deserializing Avro messages in Kafka Streams

I wrote a small Java class to test consumption of an Avro-encoded Kafka topic.
Properties appProps = new Properties();
appProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "http://***kfk14bro1.lc:9092");
appProps.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://***kfk14str1.lc:8081");
appProps.put(StreamsConfig.APPLICATION_ID_CONFIG, "consumer");
appProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
appProps.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,LogAndContinueExceptionHandler.class);
StreamsBuilder streamsBuilder = new StreamsBuilder();
streamsBuilder
    .stream("coordinates", Consumed.with(Serdes.String(), new GenericAvroSerde()))
    .peek((key, value) -> System.out.println("key=" + key + ", value=" + value));
new KafkaStreams(streamsBuilder.build(), appProps).start();
When I run this class, the serde configs are logged correctly, as can be seen in the log below:
[consumer-56b0e0ca-d336-45cc-b388-46a68dbfab8b-StreamThread-1] INFO io.confluent.kafka.serializers.KafkaAvroSerializerConfig - KafkaAvroSerializerConfig values:
schema.registry.url = [http://***kfk14str1.lc:8081]
basic.auth.user.info = [hidden]
auto.register.schemas = true
max.schemas.per.subject = 1000
basic.auth.credentials.source = URL
schema.registry.basic.auth.user.info = [hidden]
value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
[normal-consumer-56b0e0ca-d336-45cc-b388-46a68dbfab8b-StreamThread-1] INFO io.confluent.kafka.serializers.KafkaAvroDeserializerConfig - KafkaAvroDeserializerConfig values:
schema.registry.url = [http://***kfk14str1.lc:8081]
basic.auth.user.info = [hidden]
auto.register.schemas = true
max.schemas.per.subject = 1000
basic.auth.credentials.source = URL
schema.registry.basic.auth.user.info = [hidden]
specific.avro.reader = false
value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
but no messages are consumed, and the following is logged for every message:
[normal-consumer-56b0e0ca-d336-45cc-b388-46a68dbfab8b-StreamThread-1] WARN org.apache.kafka.streams.errors.LogAndContinueExceptionHandler - Exception caught during Deserialization, taskId: 0_0, topic: coordinates, partition: 0, offset: 782205986
org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 83
Caused by: java.lang.NullPointerException
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:116)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:88)
at io.confluent.kafka.serializers.KafkaAvroDeserializer.deserialize(KafkaAvroDeserializer.java:55)
at io.confluent.kafka.streams.serdes.avro.GenericAvroDeserializer.deserialize(GenericAvroDeserializer.java:63)
at io.confluent.kafka.streams.serdes.avro.GenericAvroDeserializer.deserialize(GenericAvroDeserializer.java:39)
at org.apache.kafka.common.serialization.Deserializer.deserialize(Deserializer.java:58)
at org.apache.kafka.streams.processor.internals.SourceNode.deserializeValue(SourceNode.java:60)
However, I can read the topic just fine with the Avro console consumer, so I know there is nothing wrong with the data written to the topic. The command below prints the messages correctly:
~/kafka/confluent-5.1.2/bin/kafka-avro-console-consumer --bootstrap-server http://***kfk14bro1.lc:9092 --topic coordinates --property schema.registry.url=http://***kfk14str1.lc:8081 --property auto.offset.reset=latest

When you instantiate an Avro Serde yourself, it is not automatically configured with the schema registry URL.
So you either have to configure it yourself, or you define default serdes by adding:
appProps.setProperty(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
appProps.setProperty(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class.getName());
And by removing
Consumed.with(Serdes.String(), new GenericAvroSerde())
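With the default serdes set this way, the GenericAvroSerde is configured automatically with the schema.registry.url already present in appProps, and the stream can be created without an explicit Consumed. A minimal sketch based on the code from the question:
StreamsBuilder streamsBuilder = new StreamsBuilder();
streamsBuilder
    .stream("coordinates")   // key/value serdes fall back to the configured defaults
    .peek((key, value) -> System.out.println("key=" + key + ", value=" + value));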
To configure the Serde yourself, use the following code (adapt it to your situation):
GenericAvroSerde genericAvroSerde = new GenericAvroSerde();
boolean isKeySerde = false;
genericAvroSerde.configure(
    Collections.singletonMap(
        AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
        "http://confluent-schema-registry-server:8081/"),
    isKeySerde);
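The configured instance can then be passed into the topology in place of the freshly instantiated one. A sketch reusing the topic from the question (the registry URL above would of course need to point at your actual schema registry):
streamsBuilder
    .stream("coordinates", Consumed.with(Serdes.String(), genericAvroSerde))
    .peek((key, value) -> System.out.println("key=" + key + ", value=" + value));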

Related

Confluent Generalized S3 Source connector (silent failure)

Running into a strange issue with the Generalized S3 Source connector running on Confluent Platform. I am not able to pinpoint where exactly the error is or what the root cause is.
The only error I see in the SSH console is this (related to logging):
[2023-02-11 11:12:45,464] INFO [Worker clientId=connect-1, groupId=connect-cluster-1] Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1709)
log4j:ERROR A "io.confluent.log4j.redactor.RedactorAppender" object is not assignable to a "org.apache.log4j.Appender" variable.
log4j:ERROR The class "org.apache.log4j.Appender" was loaded by
log4j:ERROR [PluginClassLoader{pluginLocation=file:/usr/share/java/source-2.5.1/}] whereas object of type
log4j:ERROR "io.confluent.log4j.redactor.RedactorAppender" was loaded by [jdk.internal.loader.ClassLoaders$AppClassLoader#251a69d7].
log4j:ERROR Could not instantiate appender named "redactor".
[2023-02-11 11:13:35,741] INFO Injecting Confluent license properties into connector '<unspecified>' (org.apache.kafka.connect.runtime.WorkerConfigDecorator:412)
[2023-02-11 11:13:44,001] INFO Injecting Confluent license properties into connector 'S3GenConnectorConnector_7' (org.apache.kafka.connect.runtime.WorkerConfigDecorator:412)
[2023-02-11 11:13:44,006] INFO S3SourceConnectorConfig values:
aws.access.key.id = <<ACCESS KEY HERE>>
aws.secret.access.key = [hidden]
behavior.on.error = fail
bucket.listing.max.objects.threshold = -1
confluent.license = [hidden]
confluent.topic = _confluent-command
confluent.topic.bootstrap.servers = [172.27.157.66:9092]
confluent.topic.replication.factor = 3
directory.delim = /
file.discovery.starting.timestamp = 0
filename.regex = (.+)\+(\d+)\+.+$
folders = []
format.bytearray.extension = .bin
format.bytearray.separator =
format.class = class io.confluent.connect.s3.format.string.StringFormat
format.json.schema.enable = false
mode = RESTORE_BACKUP
parse.error.topic.prefix = error
partition.field.name = []
partitioner.class = class io.confluent.connect.storage.partitioner.DefaultPartitioner
path.format =
record.batch.max.size = 200
s3.bucket.name = mytestbucketamtk
s3.credentials.provider.class = class com.amazonaws.auth.DefaultAWSCredentialsProviderChain
s3.http.send.expect.continue = true
s3.part.retries = 3
s3.path.style.access = true
s3.poll.interval.ms = 60000
s3.proxy.password = null
s3.proxy.url =
s3.proxy.username = null
s3.region = us-east-1
s3.retry.backoff.ms = 200
s3.sse.customer.key = null
s3.ssea.name =
s3.wan.mode = false
schema.cache.size = 50
store.url = null
task.batch.size = 10
topic.regex.list = [first_topic:.*]
topics.dir = topics
(io.confluent.connect.s3.source.S3SourceConnectorConfig:376)
[2023-02-11 11:13:44,029] INFO Using configured AWS access key credentials instead of configured credentials provider class. (io.confluent.connect.s3.source.S3Storage:500)
Connector config file below:
{
  "name": "S3GenConnectorConnector_7",
  "config": {
    "connector.class": "io.confluent.connect.s3.source.S3SourceConnector",
    "tasks.max": "1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "mode": "RESTORE_BACKUP",
    "format.class": "io.confluent.connect.s3.format.string.StringFormat",
    "s3.bucket.name": "mytestbucketamtk",
    "s3.region": "us-east-1",
    "aws.access.key.id": <<ACCESS KEY HERE>>,
    "aws.secret.access.key": <<SECRET KEY>>,
    "topic.regex.list": "first_topic:.*"
  }
}
The tasks are not getting created, and there are no other errors in the Connect console. The Connect cluster is running on Confluent Platform. Any pointers in the right direction would be appreciated. Did I miss any required configuration?
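Not from the original post, but since the worker log goes quiet after the config dump, one way to confirm whether any tasks were created (and to see the stack trace if a task failed silently) is the Kafka Connect REST status endpoint. A minimal Java sketch, assuming the worker's REST listener is on the default localhost:8083:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectorStatusCheck {
    public static void main(String[] args) throws Exception {
        // Worker host/port are assumptions; adjust to your Connect REST listener.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors/S3GenConnectorConnector_7/status"))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The response lists the connector state and one entry per task; a failed
        // task includes a "trace" field with the underlying stack trace.
        System.out.println(response.body());
    }
}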

log4j2.properties syslog is not working

My problem is that my Java application's log4j2 syslog output is written to 'messages' rather than 'local1.log'.
My /etc/rsyslog.conf is configured with 'local1.* /var/log/local1.log'.
The weird thing is that when I remove 'appender.syslog.layout.type' and 'appender.syslog.layout.pattern' from log4j2.properties, the syslog output starts being written to /var/log/local1.log correctly.
Is my configuration incorrect?
Are layout properties not applied in syslog?
[/etc/rsyslog.conf]
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none;local1.none /var/log/messages
# The authpriv file has restricted access.
authpriv.* /var/log/secure
...
local1.* /var/log/local1.log
[Used log4j2 library]
log4j-api-2.17.2.jar
log4j-core-2.17.2.jar
[log4j2.properties]
status = warn
name = Test
# Console appender configuration
appender.console.type = Console
appender.console.name = consoleLogger
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{HH:mm:ss} %5p (%c{1} - %M:%L) - %m%n
appender.syslog.type = Syslog
appender.syslog.name = sysLogger
appender.syslog.host = localhost
appender.syslog.port = 514
appender.syslog.protocol = UDP
appender.syslog.facility = LOCAL1
appender.syslog.layout.type = PatternLayout
appender.syslog.layout.pattern = %c{1} (%M:%L) %m\n
# Root logger level
rootLogger.level = debug
rootLogger.appenderRefs = consoleLogger, sysLogger
rootLogger.appenderRef.stdout.ref = consoleLogger
rootLogger.appenderRef.syslog.ref = sysLogger
Log4j2's syslog layout is used to format the entire syslog message and must therefore be one of SyslogLayout (traditional BSD syslog format) or Rfc5424Layout (modern syslog layout). Using any other layout will result in invalid messages and RSyslog will have to guess the message's metadata. Most notably the facility will be set to USER.
If you want to send additional data to syslog, beyond %m, you should use the RFC5424 format and send the additional information as structured data. For example you can use (in XML format):
<Syslog name="sysLogger" host="localhost" port="514" protocol="UDP">
<Rfc5424Layout appName="MyApp" facility="LOCAL1">
<LoggerFields enterpriseId="32473" sdId="location">
<KeyValuePair key="logger" value="%c" />
<KeyValuePair key="class" value="%C" />
<KeyValuePair key="method" value="%M" />
<KeyValuePair key="line" value="%L" />
</LoggerFields>
</Rfc5424Layout>
</Syslog>
which translates to the properties format as:
appender.syslog.type = Syslog
appender.syslog.name = sysLogger
appender.syslog.host = localhost
appender.syslog.port = 514
appender.syslog.protocol = UDP
appender.syslog.layout.type = Rfc5424Layout
appender.syslog.layout.facility = LOCAL1
appender.syslog.layout.appName = MyApp
appender.syslog.layout.fields.type = LoggerFields
appender.syslog.layout.fields.enterpriseId = 32473
appender.syslog.layout.fields.sdId = location
appender.syslog.layout.fields.0.type = KeyValuePair
appender.syslog.layout.fields.0.key = logger
appender.syslog.layout.fields.0.value = %c
...
Virtually all modern syslog servers can interpret structured data. For RSyslog you need to:
Enable structured data parsing:
module(load="mmpstrucdata")
action(type="mmpstrucdata")
Create a template to format your message:
template(name="MyAppFormat" type="list") {
property(name="timereported" dateFormat="rfc3339")
constant(value=" ")
property(name="hostname")
constant(value=" ")
property(name="syslogtag")
constant(value=" ")
property(name="$!rfc5424-sd!location#32473!class")
constant(value=" (")
property(name="$!rfc5424-sd!location#32473!method")
constant(value=":")
property(name="$!rfc5424-sd!location#32473!line")
constant(value=") ")
property(name="msg" droplastlf="on")
constant(value="\n")
}
Use the template for messages coming from "MyApp":
:app-name, isequal, "MyApp" {
/var/log/myapp.log;MyAppFormat
stop
}
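For completeness, a minimal sketch of the application side (the class name MyApp is only an illustration, chosen here to match the appName used above): the logger, class, method and line values referenced by the LoggerFields are taken from the call site of an ordinary Log4j2 call, so nothing special is required in the code:
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class MyApp {
    private static final Logger LOG = LogManager.getLogger(MyApp.class);

    public static void main(String[] args) {
        // Location information (%C, %M, %L) is captured from this call site by Log4j2
        // and forwarded to the syslog appender as RFC 5424 structured data.
        LOG.info("application started");
    }
}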

spring-cloud-stream 3.0 (batch-mode: true): "Can't convert value of class" when consuming data

Config info:
spring:
  kafka:
    consumer:
      max-poll-records: 5
  cloud:
    stream:
      instance-count: 5
      instance-index: 0
      kafka:
        binder:
          brokers: 127.0.0.1:9092
          auto-create-topics: true
          auto-add-partitions: true
          min-partition-count: 5
      bindings:
        log-data-in:
          destination: log-data1
          group: log-data-group
          contenttype: text/plain
          consumer:
            partitioned: true
            batch-mode: true
        log-data-out:
          destination: log-data1
          contentType: text/plain
          producer:
            use-native-encoding: true
            partitionCount: 5
          # configuration:
          #   key.serializer: org.apache.kafka.common.serialization.StringSerializer
          #   value.serializer: org.apache.kafka.common.serialization.ByteArraySerializer
Code that sends to Kafka:
LogData logData = new LogData();
logData.setId(1);
logData.setVer("22");

MessageChannel messageChannel = logDataStreams.outboundLogDataStreams();
boolean sent = messageChannel.send(MessageBuilder.withPayload(logData).setHeader("partitionKey", key).build());
The consumer listens to the Kafka topic.
I think the problem is in how the data type is handled in Kafka. I don't see anything wrong or missing in my settings, so why is there a type-conversion error when consuming from Kafka? The bytes apparently cannot be converted to my entity class.
The code below consumes the data and is where the error is reported:
@StreamListener(LogDataStreams.INPUT_LOG_DATA)
public void handleLogData(List<LogData> messages) {
    System.out.println(messages);
    messages.parallelStream().forEach(item -> {
        System.out.println(item);
    });
}
The topic identifiers are defined as follows:
public interface LogDataStreams {

    String INPUT_LOG_DATA = "log-data-in";
    String OUTPUT_LOG_DATA = "log-data-out";
    String INPUT_LOG_SC = "log-sc-in";
    String OUTPUT_LOG_SC = "log-sc-out";
    String INPUT_LOG_BEHAVIOR = "log-behavior-in";
    String OUTPUT_LOG_BEHAVIOR = "log-behavior-out";

    @Input(INPUT_LOG_DATA)
    SubscribableChannel inboundLogDataStreams();

    @Output(OUTPUT_LOG_DATA)
    MessageChannel outboundLogDataStreams();
}
Error in last run:
Caused by: org.apache.kafka.common.errors.SerializationException: Can't convert value of class com.xx.core.data.model.LogData to class org.apache.kafka.common.serialization.ByteArraySerializer specified in value.serializer
Caused by: java.lang.ClassCastException: com.xx.core.data.model.LogData cannot be cast to [B
Please help me figure out how to solve this problem.
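Not part of the original question, but to illustrate what the error means: with use-native-encoding: true and ByteArraySerializer configured as the value serializer, the payload handed to the binder must already be a byte[] (the "[B" in the ClassCastException). A minimal sketch of the sending side under that assumption, reusing LogData and LogDataStreams from the question and serializing with Jackson (an assumption; any byte[] representation would do):
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.support.MessageBuilder;

public class LogDataSender {

    private final ObjectMapper mapper = new ObjectMapper();

    // Converts the entity to bytes so the ByteArraySerializer configured on the
    // producer binding can handle the payload directly.
    public boolean send(LogDataStreams logDataStreams, LogData logData, String key) throws Exception {
        byte[] payload = mapper.writeValueAsBytes(logData);
        MessageChannel channel = logDataStreams.outboundLogDataStreams();
        return channel.send(MessageBuilder.withPayload(payload)
                .setHeader("partitionKey", key)
                .build());
    }
}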

What is the correct use of consumer groups in Spring Cloud Data Flow and RabbitMQ?

A follow-up to this:
one SCDF source, 2 processors but only 1 processes each item
The two processors (del-1 and del-2) in the picture are receiving the same data within milliseconds of each other. I'm trying to set this up so that del-2 never receives the same item as del-1 and vice versa. So obviously I've got something configured incorrectly, but I'm not sure where.
My processor has the following application.properties
spring.application.name=${vcap.application.name:sample-processor}
info.app.name=#project.artifactId#
info.app.description=#project.description#
info.app.version=#project.version#
management.endpoints.web.exposure.include=health,info,bindings
spring.autoconfigure.exclude=org.springframework.boot.autoconfigure.security.servlet.SecurityAutoConfiguration
spring.cloud.stream.bindings.input.group=input
Is "spring.cloud.stream.bindings.input.group" specified correctly?
Here's the processor code:
@Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
public Object transform(String inputStr) throws InterruptedException {
    ApplicationLog log = new ApplicationLog(this, "timerMessageSource");
    String message = " I AM [" + inputStr + "] AND I HAVE BEEN PROCESSED!!!!!!!";
    log.info("SampleProcessor.transform() incoming inputStr=" + inputStr);
    return message;
}
Is the @Transformer annotation the proper way to link this bit of code with "spring.cloud.stream.bindings.input.group" from application.properties? Are there any other annotations necessary?
Here's my source:
private String format = "EEEEE dd MMMMM yyyy HH:mm:ss.SSSZ";

@Bean
@InboundChannelAdapter(value = Source.OUTPUT, poller = @Poller(fixedDelay = "1000", maxMessagesPerPoll = "1"))
public MessageSource<String> timerMessageSource() {
    ApplicationLog log = new ApplicationLog(this, "timerMessageSource");
    String message = new SimpleDateFormat(format).format(new Date());
    log.info("SampleSource.timeMessageSource() message=[" + message + "]");
    return () -> new GenericMessage<>(new SimpleDateFormat(format).format(new Date()));
}
I'm confused about the "value = Source.OUTPUT". Does this mean my processor needs to be named differently?
Is the inclusion of @Poller causing me a problem somehow?
This is how I define the 2 processor streams (del-1 and del-2) in SCDF shell:
stream create del-1 --definition ":split > processor-that-does-everything-sleeps5 --spring.cloud.stream.bindings.applicationMetrics.destination=metrics > :merge"
stream create del-2 --definition ":split > processor-that-does-everything-sleeps5 --spring.cloud.stream.bindings.applicationMetrics.destination=metrics > :merge"
Do I need to do anything differently there?
All of this is running in Docker/K8s.
RabbitMQ is provided by the bitnami/rabbitmq:3.7.2-r1 image and is configured with the following properties:
RABBITMQ_USERNAME: user
RABBITMQ_PASSWORD: <redacted>
RABBITMQ_ERL_COOKIE: <redacted>
RABBITMQ_NODE_PORT_NUMBER: 5672
RABBITMQ_NODE_TYPE: stats
RABBITMQ_NODE_NAME: rabbit@localhost
RABBITMQ_CLUSTER_NODE_NAME:
RABBITMQ_DEFAULT_VHOST: /
RABBITMQ_MANAGER_PORT_NUMBER: 15672
RABBITMQ_DISK_FREE_LIMIT: "6GiB"
Are any other environment variables necessary?

How do I create a TCP receiver that only consumes messages using Akka Streams?

We are on: akka-stream-experimental_2.11 1.0.
Inspired by the example, we wrote a TCP receiver as follows:
def bind(address: String, port: Int, target: ActorRef)
        (implicit system: ActorSystem, actorMaterializer: ActorMaterializer): Future[ServerBinding] = {
  val sink = Sink.foreach[Tcp.IncomingConnection] { conn =>
    val serverFlow = Flow[ByteString]
      .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 256, allowTruncation = true))
      .map(message => {
        target ? new Message(message); ByteString.empty
      })
    conn handleWith serverFlow
  }
  val connections = Tcp().bind(address, port)
  connections.to(sink).run()
}
However, our intention was to have the receiver not respond at all and only sink the messages. (The TCP message publisher does not care about a response.)
Is it even possible not to respond at all, given that akka.stream.scaladsl.Tcp.IncomingConnection takes a flow of type Flow[ByteString, ByteString, Unit]?
If yes, some guidance would be much appreciated. Thanks in advance.
The following attempt passes my unit tests, but I am not sure it is the best idea:
def bind(address: String, port: Int, target: ActorRef)
        (implicit system: ActorSystem, actorMaterializer: ActorMaterializer): Future[ServerBinding] = {
  val sink = Sink.foreach[Tcp.IncomingConnection] { conn =>
    val targetSubscriber = ActorSubscriber[Message](system.actorOf(Props(new TargetSubscriber(target))))
    val targetSink = Flow[ByteString]
      .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 256, allowTruncation = true))
      .map(Message(_))
      .to(Sink(targetSubscriber))
    conn.flow.to(targetSink).runWith(Source(Promise().future))
  }
  val connections = Tcp().bind(address, port)
  connections.to(sink).run()
}
You are on the right track. To keep the possibility of closing the connection at some point, you may want to keep the promise and complete it later on. Once it is completed with an element, that element is published by the source. However, since you don't want any element to be published on the connection, you can use drop(1) to make sure the source never emits any element.
Here's an updated version of your example (untested):
val promise = Promise[ByteString]()
// this source will complete when the promise is fulfilled
// or it will complete with an error if the promise is completed with an error
val completionSource = Source(promise.future).drop(1)
completionSource // only used to complete later
.via(conn.flow) // I reordered the flow for better readability (arguably)
.runWith(targetSink)
// to close the connection later complete the promise:
def closeConnection() = promise.success(ByteString.empty) // dummy element, will be dropped
// alternatively to fail the connection later, complete with an error
def failConnection() = promise.failure(new RuntimeException)