what is correct use of consumer groups in spring cloud stream dataflow and rabbitmq? - rabbitmq

A follow up to this:
one SCDF source, 2 processors but only 1 processes each item
The 2 processors (del-1 and del-2) in the picture are receiving the same data within milliseconds of each other. I'm trying to rig this so del-2 never receives the same thing as del-1 and vice versa. So obviously I've got something configured incorrectly but I'm not sure where.
My processor has the following application.properties
spring.application.name=${vcap.application.name:sample-processor}
info.app.name=#project.artifactId#
info.app.description=#project.description#
info.app.version=#project.version#
management.endpoints.web.exposure.include=health,info,bindings
spring.autoconfigure.exclude=org.springframework.boot.autoconfigure.security.servlet.SecurityAutoConfiguration
spring.cloud.stream.bindings.input.group=input
Is "spring.cloud.stream.bindings.input.group" specified correctly?
Here's the processor code:
#Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
public Object transform(String inputStr) throws InterruptedException{
ApplicationLog log = new ApplicationLog(this, "timerMessageSource");
String message = " I AM [" + inputStr + "] AND I HAVE BEEN PROCESSED!!!!!!!";
log.info("SampleProcessor.transform() incoming inputStr="+inputStr);
return message;
}
Is the #Transformer annotation the proper way to link this bit of code with "spring.cloud.stream.bindings.input.group" from application.properties? Are there any other annotations necessary?
Here's my source:
private String format = "EEEEE dd MMMMM yyyy HH:mm:ss.SSSZ";
#Bean
#InboundChannelAdapter(value = Source.OUTPUT, poller = #Poller(fixedDelay = "1000", maxMessagesPerPoll = "1"))
public MessageSource<String> timerMessageSource() {
ApplicationLog log = new ApplicationLog(this, "timerMessageSource");
String message = new SimpleDateFormat(format).format(new Date());
log.info("SampleSource.timeMessageSource() message=["+message+"]");
return () -> new GenericMessage<>(new SimpleDateFormat(format).format(new Date()));
}
I'm confused about the "value = Source.OUTPUT". Does this mean my processor needs to be named differently?
Is the inclusion of #Poller causing me a problem somehow?
This is how I define the 2 processor streams (del-1 and del-2) in SCDF shell:
stream create del-1 --definition ":split > processor-that-does-everything-sleeps5 --spring.cloud.stream.bindings.applicationMetrics.destination=metrics > :merge"
stream create del-2 --definition ":split > processor-that-does-everything-sleeps5 --spring.cloud.stream.bindings.applicationMetrics.destination=metrics > :merge"
Do I need to do anything differently there?
All of this is running in Docker/K8s.
RabbitMQ is given by bitnami/rabbitmq:3.7.2-r1 and is configured with the following props:
RABBITMQ_USERNAME: user
RABBITMQ_PASSWORD <redacted>:
RABBITMQ_ERL_COOKIE <redacted>:
RABBITMQ_NODE_PORT_NUMBER: 5672
RABBITMQ_NODE_TYPE: stats
RABBITMQ_NODE_NAME: rabbit#localhost
RABBITMQ_CLUSTER_NODE_NAME:
RABBITMQ_DEFAULT_VHOST: /
RABBITMQ_MANAGER_PORT_NUMBER: 15672
RABBITMQ_DISK_FREE_LIMIT: "6GiB"
Are any other environment variables necessary?

Related

Akka - Unable to send Discriminated Unions as messages in F#

Akka - Discriminated Unions as messages in F#
I am unable to use discriminated unions as messages to akka actors. If anyone can point me at an example that does this, it would be much appreciated.
My own attempt at this is at git#github.com:Tweega/AkkaMessageIssue.git. (snippets below). It is a cutdown version of a sample found at https://github.com/rikace/AkkaActorModel.git (Chat project)
Problem
The DU message never finds its target on the server actor, but is sent to the deadletter box. If I send Objects, instead, they do arrive.
If I send a DU, but set my server actor to listen for generic Objects, the message does arrive, but its type is
seq [seq [seq []]
and I can't get at underlying DU.
The DU I am trying to send as message
type PrinterJob =
| PrintThis of string
| Teardown
The client code
let system = System.create "MyClient" config
let chatClientActor =
spawn system "ChatClient" <| fun mailbox ->
let server = mailbox.Context.ActorSelection("akka.tcp://MyServer#localhost:8081/user/ChatServer")
let rec loop nick = actor {
let! (msg:PrinterJob) = mailbox.Receive()
server.Tell(msg)
return! loop nick
}
loop ""
while true do
let input = Console.ReadLine()
chatClientActor.Tell(PrintThis(input))
Messages are forwarded to the client from console input
while true do
let input = Console.ReadLine()
chatClientActor.Tell(PrintThis(input))
The server code
let system = System.create "MyServer" config
let chatServerActor =
spawn system "ChatServer" <| fun (mailbox:Actor<_>) ->
let rec loop (clients:Akka.Actor.IActorRef list) = actor {
let! (msg:PrinterJob) = mailbox.Receive()
printfn "Received %A" msg //Received seq [seq [seq []]; seq [seq [seq []]]] ???
match msg with
| PrintThis str ->
Console.WriteLine("Printing: {0} Do we get this?", str)
return! loop clients
| Teardown ->
Console.WriteLine("Tearing down now")
return! loop clients
}
loop []
Dependencies
(I am not using paket here) - PM commands below:
Install-Package Akka -Version 1.4.23
Install-Package Akka.Remote -Version 1.4.23
Install-Package Akka.FSharp -Version 1.4.23
I am hosting the application in net5.0
Constructor argument names - oddity?
When passing in class instances as objects, akka seems to be sensitive to the name of constructor parameters. The message gets handled, but the data is not copied across from client to server. If you have a property called Username, the constructor parameter cannot be, for example, uName, otherwise its value is null when it reaches the server. Code for this is in branch params.
type DoesWork(montelimar: string) =
member x.Montelimar = montelimar
type DoesNotWork(montelimaro: string) =
member x.Montelimar = montelimaro
I opened an issue in the Akka.NET repository: https://github.com/akkadotnet/akka.net/issues/5194
And added a detailed reproduction for this: https://github.com/akkadotnet/akka.net/pull/5196
But it looks like Newtonsoft.Json really can't perform this deserialization without being given a type hint, which Akka.NET's network serialization does not do by default for JSON:
type TestUnion =
| A of string
| B of int * string
type TestUnion2 =
| C of string * TestUnion
| D of int
[<Fact(Skip="JSON.NET really does not support even basic DU serialization")>]
member _.``JSON.NET must serialize DUs`` () =
let du = C("a-11", B(11, "a-12"))
let settings = new JsonSerializerSettings()
settings.Converters.Add(new DiscriminatedUnionConverter())
let serialized = JsonConvert.SerializeObject(du, settings)
let deserialized = JsonConvert.DeserializeObject(serialized, settings)
Assert.Equal(du :> obj, deserialized)
That test will not pass and it doesn't use any of Akka.NET's infrastructure at all - so the default JSON serializer simply won't work for real-world F# use cases.
We can try changing the defaults of our serialization system to include a type hint, but that will take a lot of validation testing (for old Akka.Persistence data serialized without one).
A better solution, which my pull request validates, is to use Hyperion for polymorphic serialization instead - it will be similarly transparent to you but it has much more robust handling for complex types than Newtonsoft.Json and is actually faster: https://getakka.net/articles/networking/serialization.html#how-to-setup-hyperion-as-default-serializer

PLC4X:Exception during scraping of Job

I'm actually developing a project that read data from 19 PLCs Siemens S1500 and 1 modicon. I have used the scraper tool following this tutorial:
PLC4x scraper tutorial
but when the scraper is working for a little amount of time I get the following exception:
I have changed the scheduled time between 1 to 100 and I always get the same exception when the scraper reach the same number of received messages.
I have tested if using PlcDriverManager instead of PooledPlcDriverManager could be a solution but the same problem persists.
In my pom.xml I use the following dependency:
<dependency>
<groupId>org.apache.plc4x</groupId>
<artifactId>plc4j-scraper</artifactId>
<version>0.7.0</version>
</dependency>
I have tried to change the version to an older one like 0.6.0 or 0.5.0 but the problem still persists.
If I use the modicon (Modbus TCP) I also get this exception after a little amount of time.
Anyone knows why is happening this error? Thanks in advance.
Edit: With the scraper version 0.8.0-SNAPSHOT I continue having this problem.
Edit2: This is my code, I think the problem can be that in my scraper I am opening a lot of connections and when it reaches 65526 messages it fails. But since all the processing is happenning inside the lambda function and I'm using a PooledPlcDriverManager, I think the scraper is using only one connection so I dont know where is the mistake.
try {
// Create a new PooledPlcDriverManager
PlcDriverManager S7_plcDriverManager = new PooledPlcDriverManager();
// Trigger Collector
TriggerCollector S7_triggerCollector = new TriggerCollectorImpl(S7_plcDriverManager);
// Messages counter
AtomicInteger messagesCounter = new AtomicInteger();
// Configure the scraper, by binding a Scraper Configuration, a ResultHandler and a TriggerCollector together
TriggeredScraperImpl S7_scraper = new TriggeredScraperImpl(S7_scraperConfig, (jobName, sourceName, results) -> {
LinkedList<Object> S7_results = new LinkedList<>();
messagesCounter.getAndIncrement();
S7_results.add(jobName);
S7_results.add(sourceName);
S7_results.add(results);
logger.info("Array: " + String.valueOf(S7_results));
logger.info("MESSAGE number: " + messagesCounter);
// Producer topics routing
String topic = "s7" + S7_results.get(1).toString().substring(S7_results.get(1).toString().indexOf("S7_SourcePLC") + 9 , S7_results.get(1).toString().length());
String key = parseKey_S7("s7");
String value = parseValue_S7(S7_results.getLast().toString(),S7_results.get(1).toString());
logger.info("------- PARSED VALUE -------------------------------- " + value);
// Create my own Kafka Producer
ProducerRecord<String, String> record = new ProducerRecord<String, String>(topic, key, value);
// Send Data to Kafka - asynchronous
producer.send(record, new Callback() {
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
// executes every time a record is successfully sent or an exception is thrown
if (e == null) {
// the record was successfully sent
logger.info("Received new metadata. \n" +
"Topic:" + recordMetadata.topic() + "\n" +
"Partition: " + recordMetadata.partition() + "\n" +
"Offset: " + recordMetadata.offset() + "\n" +
"Timestamp: " + recordMetadata.timestamp());
} else {
logger.error("Error while producing", e);
}
}
});
}, S7_triggerCollector);
S7_scraper.start();
S7_triggerCollector.start();
} catch (ScraperException e) {
logger.error("Error starting the scraper (S7_scrapper)", e);
}
So in the end indeed it was the PLC that was simply hanging up the connection randomly. However the NiFi integration should have handled this situation more gracefully. I implemented a fix for this particular error ... could you please give version 0.8.0-SNAPSHOT a try (or use 0.8.0 if we happen to have released it already)

NPE while deserializing avro messages in kafka streams

I wrote a small java class to test the consumption of Avro encoded Kafka topic.
Properties appProps = new Properties();
appProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "http://***kfk14bro1.lc:9092");
appProps.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://***kfk14str1.lc:8081");
appProps.put(StreamsConfig.APPLICATION_ID_CONFIG, "consumer");
appProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
appProps.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,LogAndContinueExceptionHandler.class);
StreamsBuilder streamsBuilder = new StreamsBuilder();
streamsBuilder.stream(
"coordinates", Consumed.with(Serdes.String(), new GenericAvroSerde()))
.peek((key, value) -> System.out.println("key=" + key + ", value=" + value));
new KafkaStreams(streamsBuilder.build(), appProps).start();
When I run this class, SerdeConfigs are being logged alright which can be seen in the below log:
[consumer-56b0e0ca-d336-45cc-b388-46a68dbfab8b-StreamThread-1] INFO io.confluent.kafka.serializers.KafkaAvroSerializerConfig - KafkaAvroSerializerConfig values:
schema.registry.url = [http://***kfk14str1.lc:8081]
basic.auth.user.info = [hidden]
auto.register.schemas = true
max.schemas.per.subject = 1000
basic.auth.credentials.source = URL
schema.registry.basic.auth.user.info = [hidden]
value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
[normal-consumer-56b0e0ca-d336-45cc-b388-46a68dbfab8b-StreamThread-1] INFO io.confluent.kafka.serializers.KafkaAvroDeserializerConfig - KafkaAvroDeserializerConfig values:
schema.registry.url = [http://***kfk14str1.lc:8081]
basic.auth.user.info = [hidden]
auto.register.schemas = true
max.schemas.per.subject = 1000
basic.auth.credentials.source = URL
schema.registry.basic.auth.user.info = [hidden]
specific.avro.reader = false
value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
but messages are not being consumed and generates the below log for every message:
[normal-consumer-56b0e0ca-d336-45cc-b388-46a68dbfab8b-StreamThread-1] WARN org.apache.kafka.streams.errors.LogAndContinueExceptionHandler - Exception caught during Deserialization, taskId: 0_0, topic: coordinates, partition: 0, offset: 782205986
org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 83
Caused by: java.lang.NullPointerException
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:116)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:88)
at io.confluent.kafka.serializers.KafkaAvroDeserializer.deserialize(KafkaAvroDeserializer.java:55)
at io.confluent.kafka.streams.serdes.avro.GenericAvroDeserializer.deserialize(GenericAvroDeserializer.java:63)
at io.confluent.kafka.streams.serdes.avro.GenericAvroDeserializer.deserialize(GenericAvroDeserializer.java:39)
at org.apache.kafka.common.serialization.Deserializer.deserialize(Deserializer.java:58)
at org.apache.kafka.streams.processor.internals.SourceNode.deserializeValue(SourceNode.java:60)
But I am able to read just fine from the avro console consumer, so I know there is nothing wrong with the data written to the topic. Below command prints logs alright:
~/kafka/confluent-5.1.2/bin/kafka-avro-console-consumer --bootstrap-server http://***kfk14bro1.lc:9092 --topic coordinates --property schema.registry.url=http://***kfk14str1.lc:8081 --property auto.offset.reset=latest
When you instantiate an Avro Serde yourself it is not configured automatically with the schema-registry URL.
So either you have to configure it yourself or you define default serdes by adding:
appProps.setProperty(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
appProps.setProperty(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class.getName());
And by removing
Consumed.with(Serdes.String(), new GenericAvroSerde())
To configure Serde use following code (adapt it to your situation):
GenericAvroSerde genericAvroSerde = new GenericAvroSerde();
boolean isKeySerde = false;
genericAvroSerde.configure(
Collections.singletonMap(
AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
"http://confluent-schema-registry-server:8081/"),
isKeySerde);

Apache-ignite: Persistent Storage

My understanding for Ignite Persistent Storage is that the data is not only saved in memory, but also written to disk.
When the node is restarted, it should read the data from disk to memory.
So, I am using this example to test it out. But I update it a little bit because I don't want to use xml.
This is my slightly updated code.
public class PersistentIgniteExpr {
/**
* Organizations cache name.
*/
private static final String ORG_CACHE = "CacheQueryExample_Organizations";
/** */
private static final boolean UPDATE = true;
public void test(String nodeId) {
// Apache Ignite node configuration.
IgniteConfiguration cfg = new IgniteConfiguration();
// Ignite persistence configuration.
DataStorageConfiguration storageCfg = new DataStorageConfiguration();
// Enabling the persistence.
storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
// Applying settings.
cfg.setDataStorageConfiguration(storageCfg);
List<String> addresses = new ArrayList<>();
addresses.add("127.0.0.1:47500..47502");
TcpDiscoverySpi tcpDiscoverySpi = new TcpDiscoverySpi();
tcpDiscoverySpi.setIpFinder(new TcpDiscoveryMulticastIpFinder().setAddresses(addresses));
cfg.setDiscoverySpi(tcpDiscoverySpi);
try (Ignite ignite = Ignition.getOrStart(cfg.setIgniteInstanceName(nodeId))) {
// Activate the cluster. Required to do if the persistent store is enabled because you might need
// to wait while all the nodes, that store a subset of data on disk, join the cluster.
ignite.active(true);
CacheConfiguration<Long, Organization> cacheCfg = new CacheConfiguration<>(ORG_CACHE);
cacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
cacheCfg.setBackups(1);
cacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
cacheCfg.setIndexedTypes(Long.class, Organization.class);
IgniteCache<Long, Organization> cache = ignite.getOrCreateCache(cacheCfg);
if (UPDATE) {
System.out.println("Populating the cache...");
try (IgniteDataStreamer<Long, Organization> streamer = ignite.dataStreamer(ORG_CACHE)) {
streamer.allowOverwrite(true);
for (long i = 0; i < 100_000; i++) {
streamer.addData(i, new Organization(i, "organization-" + i));
if (i > 0 && i % 10_000 == 0)
System.out.println("Done: " + i);
}
}
}
// Run SQL without explicitly calling to loadCache().
QueryCursor<List<?>> cur = cache.query(
new SqlFieldsQuery("select id, name from Organization where name like ?")
.setArgs("organization-54321"));
System.out.println("SQL Result: " + cur.getAll());
// Run get() without explicitly calling to loadCache().
Organization org = cache.get(54321l);
System.out.println("GET Result: " + org);
}
}
}
When I run the first time, it works as intended.
After running it one time, I am assuming that data is written to disk since the code is about persistent storage.
When I run the second time, I commented out this part.
if (UPDATE) {
System.out.println("Populating the cache...");
try (IgniteDataStreamer<Long, Organization> streamer = ignite.dataStreamer(ORG_CACHE)) {
streamer.allowOverwrite(true);
for (long i = 0; i < 100_000; i++) {
streamer.addData(i, new Organization(i, "organization-" + i));
if (i > 0 && i % 10_000 == 0)
System.out.println("Done: " + i);
}
}
}
That is the part where data is written. When the sql query is executed, it is returning null. That means data is not written to disk?
Another question is I am not very clear about TcpDiscoverySpi. Can someone explain about it as well?
Thanks in advance.
Do you have any exceptions at node startup?
Very probably, you don't have IGNITE_HOME env variable configured. And the Work Directory for persistence is chosen somehow differently each time you run a node.
You can either setup IGNITE_HOME env variable or add a code line to setup workDirectory explicitly: cfg.setWorkDirectory("C:\\workDirectory");
TcpDiscoverySpi provides a way to discover remote nodes in a grid, so the starting node can join a cluster. It is better to use TcpDiscoveryVmIpFinder if you know the list of IPs. TcpDiscoveryMulticastIpFinder broadcasts UDP messages to a network to discover other nodes. It does not require IPs list at all.
Please see https://apacheignite.readme.io/docs/cluster-config for more details.

How do i create a TCP receiver that only consumes messages using akka streams?

We are on: akka-stream-experimental_2.11 1.0.
Inspired by the example
We wrote a TCP receiver as follows:
def bind(address: String, port: Int, target: ActorRef)
(implicit system: ActorSystem, actorMaterializer: ActorMaterializer): Future[ServerBinding] = {
val sink = Sink.foreach[Tcp.IncomingConnection] { conn =>
val serverFlow = Flow[ByteString]
.via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 256, allowTruncation = true))
.map(message => {
target ? new Message(message); ByteString.empty
})
conn handleWith serverFlow
}
val connections = Tcp().bind(address, port)
connections.to(sink).run()
}
However, our intention was to have the receiver not respond at all and only sink the message. (The TCP message publisher does not care about response ).
Is it even possible? to not respond at all since akka.stream.scaladsl.Tcp.IncomingConnection takes a flow of type: Flow[ByteString, ByteString, Unit]
If yes, some guidance will be much appreciated. Thanks in advance.
One attempt as follows passes my unit tests but not sure if its the best idea:
def bind(address: String, port: Int, target: ActorRef)
(implicit system: ActorSystem, actorMaterializer: ActorMaterializer): Future[ServerBinding] = {
val sink = Sink.foreach[Tcp.IncomingConnection] { conn =>
val targetSubscriber = ActorSubscriber[Message](system.actorOf(Props(new TargetSubscriber(target))))
val targetSink = Flow[ByteString]
.via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 256, allowTruncation = true))
.map(Message(_))
.to(Sink(targetSubscriber))
conn.flow.to(targetSink).runWith(Source(Promise().future))
}
val connections = Tcp().bind(address, port)
connections.to(sink).run()
}
You are on the right track. To keep the possibility to close the connection at some point you may want to keep the promise and complete it later on. Once completed with an element this element published by the source. However, as you don't want any element to be published on the connection, you can use drop(1) to make sure the source will never emit any element.
Here's an updated version of your example (untested):
val promise = Promise[ByteString]()
// this source will complete when the promise is fulfilled
// or it will complete with an error if the promise is completed with an error
val completionSource = Source(promise.future).drop(1)
completionSource // only used to complete later
.via(conn.flow) // I reordered the flow for better readability (arguably)
.runWith(targetSink)
// to close the connection later complete the promise:
def closeConnection() = promise.success(ByteString.empty) // dummy element, will be dropped
// alternatively to fail the connection later, complete with an error
def failConnection() = promise.failure(new RuntimeException)