KTable-KTable FK join: can't select foreign key serde - Kotlin

summary
I'm trying to do a KTable-KTable foreign-key join, but I get an error because Kafka Streams is
trying to use a String serde for the foreign key.
I want it to use a Kotlinx Serialization serde. How can I specify this?
details
I want to join the data of two KTables together, using an FK selector and remapping the values into
an aggregating object.
tilesGroupedByChunk
    .join<ChunkTilesAndProtos, SurfaceIndex, SurfacePrototypesData>(
        tilePrototypesTable, // join the prototypes KTable
        { cd: MapChunkData -> cd.chunkPosition.surfaceIndex }, // FK join on SurfaceIndex
        { chunkTiles: MapChunkData, protos: SurfacePrototypesData ->
            ChunkTilesAndProtos(chunkTiles, protos) // remap value
        },
        namedAs("joining-chunks-tiles-prototypes"),
        materializedAs(
            "joined-chunked-tiles-with-prototypes",
            // `.serde()` - a helper function that makes a Serde from a Kotlinx Serialization JSON module,
            // see https://github.com/adamko-dev/kotka-streams/blob/38388e74b16f3626a2733df1faea2037b89dee7c/modules/kotka-streams-kotlinx-serialization/src/main/kotlin/dev/adamko/kotka/kxs/jsonSerdes.kt#L48
            jsonMapper.serde(),
            jsonMapper.serde(),
        ),
    )
However, I get an error, because Kafka Streams is using Serdes.String() (my default serde)
for deserializing the foreign key. But the foreign key is a JSON object, and I want it to be handled by a Kotlinx Serialization serde.
org.apache.kafka.streams.errors.StreamsException: ClassCastException invoking Processor. Do the Processor's input types match the deserialized types? Check the Serde setup and change the default Serdes in StreamConfig or provide correct Serdes via method parameters. Make sure the Processor can accept the deserialized input of type key: myproject.MyTopology$MapChunkDataPosition, and value: org.apache.kafka.streams.kstream.internals.Change.
Note that although incorrect Serdes are a common cause of error, the cast exception might have another cause (in user code, for example). For example, if a processor wires in a store, but casts the generics incorrectly, a class cast exception could be raised during processing, but the cause would not be wrong Serdes.
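For reference, my default serdes come from the Streams config, which looks something like this (the application id and bootstrap server below are placeholders):

val props = Properties().apply {
    put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app") // placeholder
    put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder
    // these are the serdes Kafka Streams falls back to when an operation
    // is not given explicit serdes - hence the String serde in the error above
    put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde::class.java.name)
    put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.StringSerde::class.java.name)
}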
background
The data I'm working with is from a computer game. The game has a map, called a surface. Each
surface is uniquely identified by a surface index. Each surface has tiles, on an x/y plane. The
tiles have a 'prototype name', which is the ID of a TilePrototype. Each TilePrototype has
information about what that tile does, or looks like. I need it for the colour.
topology
group tiles by chunk
First I group the tiles into chunks of 32x32, and then collect those chunks into a KTable.
/** Each chunk is identified by the surface, and an x/y coordinate */
@Serializable
data class MapChunkDataPosition(
    val position: MapChunkPosition,
    val surfaceIndex: SurfaceIndex,
)

/** Each chunk has 32x32 tiles */
@Serializable
data class MapChunkData(
    val chunkPosition: MapChunkDataPosition,
    val tiles: Set<MapTile>,
)

// get all incoming tiles and group them by chunk,
// this works successfully
val tilesGroupedByChunk: KTable<MapChunkDataPosition, MapChunkData> =
    buildChunkedTilesTable(tilesTable)
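(buildChunkedTilesTable is elided for brevity. Purely as a rough sketch - the input table shape, toChunkPosition(), and the store names below are stand-ins, not my real code - the chunking aggregation looks something like this:)

fun buildChunkedTilesTable(
    tilesTable: KTable<MapTilePosition, MapTile>, // assumed input shape
): KTable<MapChunkDataPosition, MapChunkData> =
    tilesTable
        // re-key each tile by the chunk that contains it
        .groupBy(
            { pos: MapTilePosition, tile: MapTile ->
                KeyValue.pair(
                    MapChunkDataPosition(pos.toChunkPosition(), pos.surfaceIndex), // toChunkPosition() is a stand-in
                    tile,
                )
            },
            Grouped.with(jsonMapper.serde(), jsonMapper.serde()),
        )
        // collect each chunk's tiles into a set (adder + subtractor keep the table consistent)
        .aggregate(
            { setOf<MapTile>() },
            { _, tile, tiles -> tiles + tile },
            { _, tile, tiles -> tiles - tile },
            materializedAs("tiles-per-chunk", jsonMapper.serde(), jsonMapper.serde()),
        )
        // wrap the key and tiles into the value type, with explicit serdes
        .mapValues(
            "wrap-chunk-tiles",
            materializedAs("chunked-tiles", jsonMapper.serde(), jsonMapper.serde()),
        ) { chunkPos, tiles -> MapChunkData(chunkPos, tiles) }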
group prototypes by surface index
Then I collect all prototypes by surface index, and aggregate them into a list
/** Identifier for a surface (a simple wrapper, so I can use a Kotlinx Serialization serde everywhere) */
@Serializable
data class SurfaceIndex(
    val surfaceIndex: Int
)

/** Each surface has some 'prototypes' - I want this because each tile has a colour */
@Serializable
data class SurfacePrototypesData(
    val surfaceIndex: SurfaceIndex,
    val mapTilePrototypes: Set<MapTilePrototype>,
)

// get all incoming prototypes and group them by surface index,
// this works successfully
val tilePrototypesTable: KTable<SurfaceIndex, SurfacePrototypesData> =
    tilePrototypesTable()
ktable-ktable fk join
This is the code that causes the error:
/** For each chunk, get all tiles in that chunk, and all prototypes */
@Serializable
data class ChunkTilesAndProtos(
    val chunkTiles: MapChunkData,
    val protos: SurfacePrototypesData
)
tilesGroupedByChunk
    .join<ChunkTilesAndProtos, SurfaceIndex, SurfacePrototypesData>(
        tilePrototypesTable, // join the prototypes
        { cd: MapChunkData -> cd.chunkPosition.surfaceIndex }, // FK join on SurfaceIndex
        { chunkTiles: MapChunkData, protos: SurfacePrototypesData ->
            ChunkTilesAndProtos(chunkTiles, protos) // remap value
        },
        namedAs("joining-chunks-tiles-prototypes"),
        materializedAs(
            "joined-chunked-tiles-with-prototypes",
            // `.serde()` - a helper function that makes a Serde from a Kotlinx Serialization JSON module,
            // see https://github.com/adamko-dev/kotka-streams/blob/38388e74b16f3626a2733df1faea2037b89dee7c/modules/kotka-streams-kotlinx-serialization/src/main/kotlin/dev/adamko/kotka/kxs/jsonSerdes.kt#L48
            jsonMapper.serde(),
            jsonMapper.serde(),
        ),
    )
full stack trace
org.apache.kafka.streams.errors.StreamsException: ClassCastException invoking Processor. Do the Processor's input types match the deserialized types? Check the Serde setup and change the default Serdes in StreamConfig or provide correct Serdes via method parameters. Make sure the Processor can accept the deserialized input of type key: MyProject.processor.Topology$MapChunkDataPosition, and value: org.apache.kafka.streams.kstream.internals.Change.
Note that although incorrect Serdes are a common cause of error, the cast exception might have another cause (in user code, for example). For example, if a processor wires in a store, but casts the generics incorrectly, a class cast exception could be raised during processing, but the cause would not be wrong Serdes.
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:150)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forwardInternal(ProcessorContextImpl.java:253)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:232)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:191)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:172)
at org.apache.kafka.streams.kstream.internals.KTableMapValues$KTableMapValuesProcessor.process(KTableMapValues.java:131)
at org.apache.kafka.streams.kstream.internals.KTableMapValues$KTableMapValuesProcessor.process(KTableMapValues.java:105)
at org.apache.kafka.streams.processor.internals.ProcessorAdapter.process(ProcessorAdapter.java:71)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:146)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forwardInternal(ProcessorContextImpl.java:253)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:232)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:186)
at org.apache.kafka.streams.kstream.internals.TimestampedCacheFlushListener.apply(TimestampedCacheFlushListener.java:54)
at org.apache.kafka.streams.kstream.internals.TimestampedCacheFlushListener.apply(TimestampedCacheFlushListener.java:29)
at org.apache.kafka.streams.state.internals.MeteredKeyValueStore$1.apply(MeteredKeyValueStore.java:182)
at org.apache.kafka.streams.state.internals.MeteredKeyValueStore$1.apply(MeteredKeyValueStore.java:179)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.putAndMaybeForward(CachingKeyValueStore.java:107)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.lambda$initInternal$0(CachingKeyValueStore.java:87)
at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:151)
at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:109)
at org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:136)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.flushCache(CachingKeyValueStore.java:345)
at org.apache.kafka.streams.state.internals.WrappedStateStore.flushCache(WrappedStateStore.java:71)
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flushCache(ProcessorStateManager.java:487)
at org.apache.kafka.streams.processor.internals.StreamTask.prepareCommit(StreamTask.java:402)
at org.apache.kafka.streams.processor.internals.TaskManager.commitAndFillInConsumedOffsetsAndMetadataPerTaskMap(TaskManager.java:1043)
at org.apache.kafka.streams.processor.internals.TaskManager.commit(TaskManager.java:1016)
at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:1017)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:786)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:583)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:555)
Caused by: java.lang.ClassCastException: class MyProjectTopology$MapChunkData cannot be cast to class java.lang.String (MyProject.processor.MyProject$MapChunkData is in unnamed module of loader 'app'; java.lang.String is in module java.base of loader 'bootstrap')
at org.apache.kafka.common.serialization.StringSerializer.serialize(StringSerializer.java:29)
at org.apache.kafka.streams.kstream.internals.foreignkeyjoin.ForeignJoinSubscriptionSendProcessorSupplier$UnbindChangeProcessor.process(ForeignJoinSubscriptionSendProcessorSupplier.java:99)
at org.apache.kafka.streams.kstream.internals.foreignkeyjoin.ForeignJoinSubscriptionSendProcessorSupplier$UnbindChangeProcessor.process(ForeignJoinSubscriptionSendProcessorSupplier.java:69)
at org.apache.kafka.streams.processor.internals.ProcessorAdapter.process(ProcessorAdapter.java:71)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:146)
... 30 common frames omitted
versions
Kotlin 1.6.10
Kafka Streams 3.0.0
Kotlinx Serialization 1.3.2

Somewhat expectedly, I had made a mistake further up in the topology definition.
In the final stage of creating one of the tables, I mapped the values - but I did not specify the serdes.
.mapValues { _, v ->
    ChunkTilesAndProtos(v.tiles, v.protos)
}
So I changed it to specify the serdes.
.mapValues(
    "finalise-web-map-tile-chunk-aggregation",
    materializedAs("web-map-tile-chunks", jsonMapper.serde(), jsonMapper.serde())
) { _, v ->
    ChunkTilesAndProtos(v.tiles, v.protos)
}
// note: this uses extension functions from github.com/adamko-dev/kotka-streams
It was not easy to find this. I found it by putting a breakpoint in the constructor of AbstractStream.java (among other constructors) to see when the keySerde and valueSerde fields were not being set.
Sometimes a null serde is expected (I think some KTables/KStreams are 'virtual' and do not encode/decode to/from Kafka topics). However, I was able to find the operation that caused my problem, and to define serdes there, since I was changing the value type.
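For reference, without the kotka-streams extension functions the same fix looks roughly like this with the plain Kafka Streams API (a sketch; I'm assuming the table's key type is MapChunkDataPosition):

.mapValues(
    { _, v -> ChunkTilesAndProtos(v.tiles, v.protos) },
    Named.`as`("finalise-web-map-tile-chunk-aggregation"),
    Materialized.`as`<MapChunkDataPosition, ChunkTilesAndProtos, KeyValueStore<Bytes, ByteArray>>("web-map-tile-chunks")
        .withKeySerde(jsonMapper.serde())
        .withValueSerde(jsonMapper.serde()),
)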

Related

Jooq: JsonConverter not converting jsonb into list of class when fetching data

This is a continuation of a first question I asked here: Jooq: How can I map a JSONB column to a Kotlin data class field?
Although I'm able to create new records just fine with the changes mentioned there, I'm not able to fetch data like so:
fun findAllTrackedEvents(): List<TrackedEvent> {
    return dslContext.select(*TRACKED_EVENT.fields())
        .from(TRACKED_EVENT)
        .fetchInto(TrackedEvent::class.java)
}
It seems that Jackson is mapping the rows into LinkedHashMaps instead of mapping them into the fields of the Metadata data class. This is the error I'm getting:
Resolved [org.springframework.http.converter.HttpMessageNotWritableException: Could not write JSON:
object is not an instance of declaring class; nested exception is com.fasterxml.jackson.databind.JsonMappingException:
object is not an instance of declaring class (through reference chain: java.util.ArrayList[0]->com.my.project.tracked_event.TrackedEvent["metadata"]->java.util.ArrayList[0]->java.util.LinkedHashMap["tableRef"])]
data class TrackedEvent(
    val id: UUID,
    // other fields
    val metadata: List<Metadata> // this metadata field in the database is of type jsonb
)

data class Metadata(
    val tableRef: String,
    val value: UUID
)
So it can convert the field properly when inserting but not when fetching?
In my previous answer, I suggested you use arrays instead of lists, and there was a reason for that. Consider this:
fun main() {
    val a: Array<Int?> = arrayOf(1)
    println(a::class.java)

    val b: List<Int?> = listOf(1)
    println(b::class.java)
}
It prints:
class [Ljava.lang.Integer;
class java.util.Collections$SingletonList
As you can see, while arrays are reified on the JVM, other generic types are not, and the T type variable of List<T> is erased. It is possible that Jackson cannot figure out the correct type to unmarshal at runtime using reflection, despite all the type information being available at compile time.
I would just use Array<Metadata> instead. Alternatively, of course, you can attach a custom converter to the column, instead of using the out-of-the-box <jsonConverter>. That way, you're in full control of the mapping.
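Applied to the data class above, the change is just (a sketch):

data class TrackedEvent(
    val id: UUID,
    // other fields
    val metadata: Array<Metadata> // Array is reified, so Jackson can see the Metadata element type
)

One caveat: an Array property in a data class does not get structural equals()/hashCode() from the generated implementations; override them if you need value equality.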

How can I configure the jOOQ code generator to generate Set<T> instead of Array<T>?

I am working with a database where values are stored in an ARRAY column which has the semantics of a Java Set, most importantly that ordering does not matter.
Currently, the jOOQ generator generates POJOs and Records with the Array<T> type for those columns. This is problematic, because two arrays are not equal if their ordering differs.
I tried creating a custom converter; however, defining the toType as Set<String>::class.java obviously won't compile, because of type erasure(?)
class ArrayToSetConverter() :
    AbstractConverter<Array<String>, Set<String>>(Array<String>::class.java, Set<String>::class.java) { ... }
Compilation error:
Only classes are allowed on the left hand side of a class literal
Is there another way of achieving my goal?
Similar (unfortunately unanswered) question: jOOQ converter from String to List<MyType> in Kotlin
With type erasure, I don't think you can formally create a Class<Set<String>> type reference in either Java or Kotlin. But you don't have to. Just do this:
class ArrayToSetConverter() : AbstractConverter<Array<String>, Set<String>>(
    Array<String>::class.java,
    Set::class.java as Class<Set<String>> // Unsafe cast here
) { ... }
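Fleshed out with jOOQ's from/to converter pair, the whole thing might look like this (a sketch):

class ArrayToSetConverter : AbstractConverter<Array<String>, Set<String>>(
    Array<String>::class.java,
    @Suppress("UNCHECKED_CAST")
    (Set::class.java as Class<Set<String>>)
) {
    // database type -> user type
    override fun from(databaseObject: Array<String>?): Set<String>? = databaseObject?.toSet()

    // user type -> database type
    override fun to(userObject: Set<String>?): Array<String>? = userObject?.toTypedArray()
}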

How can I encode a typed class with Kotlinx Serialization?

I'd like to encode a given class of type T: EventData with Kotlinx Serialization encodeToString.
This is my code:
class EventDispatcher<T : EventData>(
    val pubSubTemplate: PubSubTemplate
) {
    /**
     * Dispatch an event to the game engine event manager pipeline
     */
    fun dispatchEvent(event: T, initiator: String) {
        val eventData: String = Json.encodeToString(event)
    }
}
The compiler tells me:
Cannot use `T` as reified type parameter. Use a class instead
Is there a way to make this still work?
For Json.encodeToString(event) to work, it needs the type information for T. But this type information is lost at runtime, due to the way generics work in Kotlin/Java.
One way to retain the type information would be by making dispatchEvent an inline function with T as a reified type parameter.
However, this also raises the question of how you want to serialize event. You could also use polymorphic serialization of EventData, rather than trying to serialize T. This will include an additional class discriminator in your serialized output (it necessarily has to, for polymorphic serialization/deserialization to work).
If you serialize the concrete type T, this class discriminator wouldn't be included, which is questionable: how would whoever deserializes this know what type it is?
In short, I think you need polymorphic serialization.
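A sketch of the polymorphic route, assuming EventData can be made a sealed @Serializable hierarchy (the subtype below is invented for illustration):

@Serializable
sealed class EventData

@Serializable
data class PlayerJoined(val playerName: String) : EventData() // invented example subtype

class EventDispatcher<T : EventData>(
    val pubSubTemplate: PubSubTemplate
) {
    fun dispatchEvent(event: T, initiator: String) {
        // serializing via the sealed base class serializer includes a class
        // discriminator ("type" by default), so consumers can tell which
        // concrete EventData subtype they received
        val eventData: String = Json.encodeToString(EventData.serializer(), event)
        // ... publish eventData ...
    }
}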

Map of generic interfaces in Kotlin

I'm stuck on something simple. Let's say I have the following:
interface IMessagePayload // marker interface

data class IdPayload(
    val id: Long
) : IMessagePayload

data class StringPayload(
    val id: String,
) : IMessagePayload
Then I have a class:
data class Message<T : IMessagePayload>(
    val id: String,
    val payload: T,
)
Also I have some interface describing processor of this message:
interface IMessageProcessor<T : IMessagePayload> {
    fun process(message: Message<T>)
}
And some implementation:
class ProcessorImpl : IMessageProcessor<IdPayload> {
    override fun process(message: Message<IdPayload>) {
    }
}
Now I want to have a map of such processors. Let's use an enum type as the keys of this map:
enum class ActionType {
    UPDATE,
    DELETE,
    ADD
}

private var map = mutableMapOf<ActionType, IMessageProcessor<IMessagePayload>>()
map[ActionType.ADD] = ProcessorImpl() // <-- error here
And that's where the problem occurs. I cannot put my ProcessorImpl into this map. The compiler says there is an error: Type mismatch. Required: IMessageProcessor<IMessagePayload>. Found: ProcessorImpl.
I could declare the map in the following way (using star projection):
private var map = mutableMapOf<ActionType, IMessageProcessor<*>>()
But in this case I cannot call a processor's process method after fetching it from the map by key:
map[ActionType.ADD]?.process(Message("message-id", IdPayload(1))) // <-- error here
The compiler complains: Type mismatch. Required: Nothing. Found: Message<IdPayload>.
What am I doing wrong? Any help is appreciated.
This is about variance.
IMessageProcessor is defined as interface IMessageProcessor<T : IMessagePayload>; it has one type parameter, which must be IMessagePayload or a subtype.
But it is invariant in that type parameter; an IMessageProcessor<IdPayload> is not related to an IMessageProcessor<IMessagePayload>.  In particular, it's not a subtype.
And your map is defined with a value type of IMessageProcessor<IMessagePayload>.  So its value cannot be an IMessageProcessor<IdPayload>, because that's neither the value type nor a subtype.  Hence the compile error.
In this case, the simplest way to get it to compile is to change your map:
private var map = mutableMapOf<ActionType, IMessageProcessor<out IMessagePayload>>()
The only difference there is the out; that tells the compiler that the value IMessageProcessor is covariant in its type parameter.  (It may help to think of out as meaning ‘…or any subtype’.  Similarly, you could make it contravariant by using in, which you might think of as ‘…or any supertype’.)
This lets you store in the map an IMessageProcessor for any subtype of IMessagePayload.
However, if you do that, you'll find that you can't use any value you pull out of your map — because it can't tell which messages the processor can handle, i.e. which subtype of IMessagePayload it works for!  (The compiler expresses this as expecting a type parameter of Nothing.)
In general, it's often better to specify variance on the interface or superclass itself (declaration-site variance) rather than the use-site variance shown above.  But I can't see a good way to do that here, because you have multiple generic classes, and they interact in a complicated way.
Think for a moment what IMessageProcessor's type parameter means: it's the type of message that the processor can consume. So an IMessageProcessor<A> can handle messages of type Message<A>.
Now, a subtype must be able to do everything its supertype can do (and usually more) — otherwise you can't drop that subtype anywhere that's expecting to use the supertype.  (That has the grand name of the Liskov substitution principle — but it's really just common sense.)
So an IMessageProcessor<B> is a subtype of IMessageProcessor<A> only if it can handle at least all the messages that an IMessageProcessor<A> can.  This means it must accept all messages of type Message<A>.
But Message is invariant in its type parameter: a Message<B> is not directly related to a Message<A>.  So you can't write a processor that handles them both.
The most natural solution I can find is to specify variance on both Message and IMessageProcessor:
data class Message<out T : IMessagePayload>( /*…*/ )
interface IMessageProcessor<in T : IMessagePayload> { /*…*/ }
And then use a wildcard in your map to make it explicit that you don't know anything about the type parameters of its values:
private var map = mutableMapOf<ActionType, IMessageProcessor<*>>()
That lets you safely store a ProcessorImpl() in the map.
But you still have to use an (unchecked) cast on the values you pull out of the map before you can use them:
(map[ActionType.ADD] as IMessageProcessor<IdPayload>)
    .process(Message("4", IdPayload(4L)))
I don't think there's any easy way around that, because the problem is inherent in having values which are processors that can handle only some (unknown) types of message.
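One way to at least contain that unchecked cast in a single place is a small registry wrapper (a sketch; correctness still relies on register and process being called with matching payload types for the same ActionType):

class ProcessorRegistry {
    private val map = mutableMapOf<ActionType, IMessageProcessor<*>>()

    fun <T : IMessagePayload> register(action: ActionType, processor: IMessageProcessor<T>) {
        map[action] = processor
    }

    @Suppress("UNCHECKED_CAST")
    fun <T : IMessagePayload> process(action: ActionType, message: Message<T>) {
        (map[action] as? IMessageProcessor<T>)?.process(message)
    }
}

// usage:
// val registry = ProcessorRegistry()
// registry.register(ActionType.ADD, ProcessorImpl())
// registry.process(ActionType.ADD, Message("4", IdPayload(4L)))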
I'm afraid the best thing would be to have a rethink about what these classes mean and how they should interact, and redesign accordingly.

Queries on schema and JSON data conversion

We already have the FlatBuffers library embedded in our software, handling simple schemas with JSON output data generation.
Update: we generate the header files by running the flatc compiler against the schema, and integrate those files into our code along with the FlatBuffers library for serialization/deserialization.
Now we also need the following schema tree to be supported.
namespace SampleNS;

/// user defined key value pairs to add custom metadata
/// key namespacing is the responsibility of the user
table KeyValue {
    key:string (key, required);
    value:string (required);
}

enum SchemaVersion:byte {
    V1,
    V2
}

table Sometable {
    value1:ubyte;
    value2:ushort (key);
}

table ComponentData {
    inputs: [Sometable];
    outputs: [Sometable];
}

table Node {
    name:string (key);
    /// IO definition
    data:ComponentData;
    /// nested child
    child:[Components];
}

table Components {
    type:ubyte;
    index:ubyte;
    nodes:[Node];
}

table GroupMasterData {
    schemaversion:SchemaVersion = SampleNS::SchemaVersion::V1;
    metainfo:[KeyValue];
    /// List of expected components in the system
    components:[Components];
}

root_type GroupMasterData;
As shown above, table Components is nested recursively: the intention is that components may have children with the same fields. I have a few queries:
1. Flatc didn't give me any error during schema compilation for such recursive nested tables. But is this supported during field access for such tables?
2. I tried to generate a sample JSON data file based on the above schema, but I could not see the field for schemaversion. I learned FB doesn't serialize default values, so I removed the default value I had assigned in the schema. But it still doesn't get written into the JSON data file. On this I also learned we can forcefully write it using the force_defaults option. I don't know where this is to be put: in the attribute or elsewhere?
3. Can I create a struct of an enum field?
4. Is there any API to set FlatBuffers options that we otherwise pass as compiler arguments? If not, maybe we have to tinker with the FB library code. Please suggest.
Method 1:
In our serialization method, we do this:
flatbuffers::Parser* parser = new flatbuffers::Parser();
parser->opts.output_default_scalars_in_json = true;
Is this the right method or should I use any other API?
1. Yes, tree (and even DAG) structures are fully supported. The type definition is recursive, but the data will eventually have leaf nodes with an empty vector of children, presumably.
2. The integer value for V1 is 0, and that is also the default value for all fields with no explicit default assigned. Use --defaults-json to see this field when converting. Note that explicit versions in a schema are an anti-pattern, since schemas are naturally evolvable without breaking backwards compatibility.
3. You can put enum fields in structs, yes. Is that what you mean?