I wanted to set up a basic producer-consumer pipeline with Flink on Kafka, but I am having trouble producing data to an existing consumer via Java.
CLI solution
I set up a Kafka broker using the kafka_2.11-2.4.0 zip from https://kafka.apache.org/downloads with the commands
bin/zookeeper-server-start.sh config/zookeeper.properties
and bin/kafka-server-start.sh config/server.properties
I create a topic called transactions1 using
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic transactions1
Now I can use a producer and consumer on the command line to see that the topic has been created and works.
To set up the consumer I run
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic transactions1 --from-beginning
Now if any producer sends data to the topic transactions1 I will see it in the consumer console.
I test that the consumer is working by running
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic transactions1
and enter the following lines in the producer CLI, which also show up in the consumer CLI.
{"txnID":1,"amt":100.0,"account":"AC1"}
{"txnID":2,"amt":10.0,"account":"AC2"}
{"txnID":3,"amt":20.0,"account":"AC3"}
Now I want to replicate step 3, i.e. the producer and consumer, in Java code, which is the core problem of this question.
So I set up a Gradle Java 8 project with this build.gradle
...
dependencies {
testCompile group: 'junit', name: 'junit', version: '4.12'
compile group: 'org.apache.flink', name: 'flink-connector-kafka_2.11', version: '1.9.0'
compile group: 'org.apache.flink', name: 'flink-core', version: '1.9.0'
// https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java
compile group: 'org.apache.flink', name: 'flink-streaming-java_2.12', version: '1.9.2'
compile group: 'com.google.code.gson', name: 'gson', version: '2.8.5'
compile group: 'com.twitter', name: 'chill-thrift', version: '0.7.6'
compile group: 'org.apache.thrift', name: 'libthrift', version: '0.11.0'
compile group: 'com.twitter', name: 'chill-protobuf', version: '0.7.6'
compile group: 'org.apache.thrift', name: 'protobuf-java', version: '3.7.0'
}
...
I set up a custom class Transaction, where you can suggest changes to the serialization logic using Kryo, Protobuf or TBaseSerializer by extending the relevant Flink classes.
import com.google.gson.Gson;
import org.apache.flink.api.common.functions.MapFunction;
public class Transaction {
public final int txnID;
public final float amt;
public final String account;
public Transaction(int txnID, float amt, String account) {
this.txnID = txnID;
this.amt = amt;
this.account = account;
}
public String toJSONString() {
Gson gson = new Gson();
return gson.toJson(this);
}
public static Transaction fromJSONString(String some) {
Gson gson = new Gson();
return gson.fromJson(some, Transaction.class);
}
public static MapFunction<String, String> mapTransactions() {
MapFunction<String, String> map = new MapFunction<String, String>() {
@Override
public String map(String value) throws Exception {
if (value != null && value.trim().length() > 0) {
try {
return fromJSONString(value).toJSONString();
} catch (Exception e) {
return "";
}
}
return "";
}
};
return map;
}
@Override
public String toString() {
return "Transaction{" +
"txnID=" + txnID +
", amt=" + amt +
", account='" + account + '\'' +
'}';
}
}
Now it is time to use Flink to produce and consume streams on the topic transactions1.
public class SetupSpike {
public static void main(String[] args) throws Exception {
System.out.println("begin");
List<Transaction> txns = new ArrayList<Transaction>(){{
add(new Transaction(1, 100, "AC1"));
add(new Transaction(2, 10, "AC2"));
add(new Transaction(3, 20, "AC3"));
}};
// This list txns needs to be serialized in Flink as Transaction.class->String->ByteArray
//via producer and then to the topic in Kafka broker
//and deserialized as ByteArray->String->Transaction.class from the Consumer in Flink reading Kafka broker.
String topic = "transactions1";
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("zookeeper.connect", "localhost:2181");
properties.setProperty("group.id", topic);
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
//env.getConfig().addDefaultKryoSerializer(Transaction.class, TBaseSerializer.class);
// working Consumer logic below which needs edit if you change serialization
FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<String>(topic, new SimpleStringSchema(), properties);
myConsumer.setStartFromEarliest(); // start from the earliest record possible
DataStream<String> stream = env.addSource(myConsumer).map(Transaction::toJSONString);
//working Producer logic below which works if you are sinking a pre-existing DataStream
//but needs editing to work with Java List<Transaction> datatype.
System.out.println("sinking expanded stream");
MapFunction<String, String> etl = new MapFunction<String, String>() {
@Override
public String map(String value) throws Exception {
if (value != null && value.trim().length() > 0) {
try {
return fromJSONString(value).toJSONString();
} catch (Exception e) {
return "";
}
}
return "";
}
};
FlinkKafkaProducer<String> myProducer = new FlinkKafkaProducer<String>(topic,
new KafkaSerializationSchema<String>() {
@Override
public ProducerRecord<byte[], byte[]> serialize(String element, @Nullable Long timestamp) {
try {
System.out.println(element);
return new ProducerRecord<byte[], byte[]>(topic, stringToBytes(etl.map(element)));
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
}, properties, Semantic.EXACTLY_ONCE);
// stream.timeWindowAll(Time.minutes(1));
stream.addSink(myProducer);
JobExecutionResult execute = env.execute();
}
}
As you can see, I am not able to do this with the List txns provided. The above is working code I gathered from the Flink documentation for redirecting topic stream data and sending data manually via the CLI producer. The problem is writing KafkaProducer code in Java that sends this data to the topic, which is further compounded by issues like:
Adding Timestamps, Watermarks
KeyBy operations
GroupBy/WindowBy operations
Adding custom ETL logic before Sinking.
Serialization/Deserialization logic in Flink
Can someone who has worked with Flink please help me produce the txns list to the transactions1 topic in Flink and then verify that it works with the consumer?
Also, any help with adding timestamps or some processing before sinking would be greatly appreciated. You can find the source code at https://github.com/devssh/kafkaFlinkSpike; the intent is to generate Flink boilerplate that adds details of "AC1" from an in-memory store, joins them with the Transaction events coming in in real time, and sends the expanded output to the user.
Several points, in no particular order:
It would be better not to mix Flink version 1.9.2 with version 1.9.0, as you've done here:
compile group: 'org.apache.flink', name: 'flink-connector-kafka_2.11', version: '1.9.0'
compile group: 'org.apache.flink', name: 'flink-core', version: '1.9.0'
compile group: 'org.apache.flink', name: 'flink-streaming-java_2.12', version: '1.9.2'
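For example, an aligned set (a sketch assuming you standardize on Flink 1.9.2 and stick to the Scala 2.11 builds; adjust the Scala suffix to match your environment) would be:
compile group: 'org.apache.flink', name: 'flink-core', version: '1.9.2'
compile group: 'org.apache.flink', name: 'flink-streaming-java_2.11', version: '1.9.2'
compile group: 'org.apache.flink', name: 'flink-connector-kafka_2.11', version: '1.9.2'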
For tutorials on how to work with timestamps, watermarks, keyBy, windows, etc., see the online training materials from Ververica.
To use List<Transaction> txns as an input stream, you can do this (docs):
DataStream<Transaction> transactions = env.fromCollection(txns);
For an example of how to handle serialization / deserialization when working with Flink and Kafka, see the Flink Operations Playground, in particular look at ClickEventDeserializationSchema and ClickEventStatisticsSerializationSchema, which are used in ClickEventCount.java and defined here. (Note: this playground has not yet been updated for Flink 1.10.)
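Putting those pieces together, here is a minimal sketch of producing the txns list to the topic and reading it back (assuming the Transaction class above, the same topic and properties as in SetupSpike, and an extra java.nio.charset.StandardCharsets import; it sticks to plain JSON strings rather than Kryo/Protobuf, and the names txnProducer/fromKafka are just illustrative):
// Produce: List<Transaction> -> JSON String -> bytes -> Kafka
DataStream<Transaction> transactions = env.fromCollection(txns);
FlinkKafkaProducer<Transaction> txnProducer = new FlinkKafkaProducer<>(
        topic,
        new KafkaSerializationSchema<Transaction>() {
            @Override
            public ProducerRecord<byte[], byte[]> serialize(Transaction element, @Nullable Long timestamp) {
                byte[] value = element.toJSONString().getBytes(StandardCharsets.UTF_8);
                return new ProducerRecord<>(topic, value);
            }
        },
        properties,
        // AT_LEAST_ONCE avoids the Kafka transaction-timeout tuning that EXACTLY_ONCE requires
        FlinkKafkaProducer.Semantic.AT_LEAST_ONCE);
transactions.addSink(txnProducer);

// Consume: bytes -> String -> Transaction, to verify the round trip
DataStream<Transaction> fromKafka = env
        .addSource(new FlinkKafkaConsumer<>(topic, new SimpleStringSchema(), properties))
        .map(Transaction::fromJSONString)
        .returns(Transaction.class); // hint for Flink's type extraction
fromKafka.print();

env.execute("transactions-produce-consume");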
I noticed that Schedulers.enableMetrics() got deprecated, but I don't know what I should do to get all my schedulers metered in a typical use case (a Spring Boot application).
The Javadoc suggests using timedScheduler, but how should that be achieved with Spring Boot?
First off, here are my thoughts on why the Schedulers.enableMetrics() approach was deprecated:
The previous approach was flawed in several ways:
intrinsic dependency on the MeterRegistry#globalRegistry() without any way of using a different registry.
wrong level of abstraction and limited instrumentation:
it was not the schedulers themselves that were instrumented, but individual ExecutorService instances assumed to back the schedulers.
schedulers NOT backed by any ExecutorService couldn't be instrumented.
schedulers backed by MULTIPLE ExecutorService instances (e.g. a pool of workers) would produce multiple levels of metrics that are difficult to aggregate.
instrumentation was all-or-nothing, potentially polluting metrics backend with metrics from global or irrelevant schedulers.
A deliberate constraint of the new approach is that each Scheduler must be explicitly wrapped, which ensures that the correct MeterRegistry is used and that metrics are recognizable and aggregated for that particular Scheduler (thanks to the mandatory metricsPrefix).
I'm not a Spring Boot expert, but if you really want to instrument all the schedulers, including the global ones, here is a naive approach that will aggregate data from all the schedulers of the same category, demonstrated in a Spring Boot app:
@SpringBootApplication
public class DemoApplication {
public static void main(String[] args) {
SpringApplication.run(DemoApplication.class, args);
}
@Configuration
static class SchedulersConfiguration {
@Bean
@Order(1)
public Scheduler originalScheduler() {
// For comparison, we can capture a new original Scheduler (which won't be disposed by setFactory, unlike the global ones)
return Schedulers.newBoundedElastic(4, 100, "compare");
}
@Bean
public SimpleMeterRegistry registry() {
return new SimpleMeterRegistry();
}
@Bean
public Schedulers.Factory instrumentedSchedulers(SimpleMeterRegistry registry) {
// Let's create a Factory that does the same as the default Schedulers factory in Reactor-Core, but with instrumentation
return new Schedulers.Factory() {
@Override
public Scheduler newBoundedElastic(int threadCap, int queuedTaskCap, ThreadFactory threadFactory, int ttlSeconds) {
// The default implementation maps to the vanilla Schedulers so we can delegate to that
Scheduler original = Schedulers.Factory.super.newBoundedElastic(threadCap, queuedTaskCap, threadFactory, ttlSeconds);
// IMPORTANT NOTE: in this example _all_ the schedulers of the same type will share the same prefix/name
// this would especially be problematic if gauges were involved as they replace old gauges of the same name.
// Fortunately, for now, TimedScheduler only uses counters, timers and longTaskTimers.
String prefix = "my.instrumented.boundedElastic"; // TimedScheduler will add `.scheduler.xxx` to that prefix
return Micrometer.timedScheduler(original, registry, prefix);
}
@Override
public Scheduler newParallel(int parallelism, ThreadFactory threadFactory) {
Scheduler original = Schedulers.Factory.super.newParallel(parallelism, threadFactory);
String prefix = "my.instrumented.parallel"; // TimedScheduler will add `.scheduler.xxx` to that prefix
return Micrometer.timedScheduler(original, registry, prefix);
}
@Override
public Scheduler newSingle(ThreadFactory threadFactory) {
Scheduler original = Schedulers.Factory.super.newSingle(threadFactory);
String prefix = "my.instrumented.single"; // TimedScheduler will add `.scheduler.xxx` to that prefix
return Micrometer.timedScheduler(original, registry, prefix);
}
};
}
@PreDestroy
void resetFactories() {
System.err.println("Resetting Schedulers Factory to default");
// Later on if we want to disable instrumentation we can reset the Factory to defaults (closing all instrumented schedulers)
Schedulers.resetFactory();
}
}
@Service
public static class Demo implements ApplicationRunner {
final Scheduler forComparison;
final SimpleMeterRegistry registry;
final Schedulers.Factory factory;
Demo(Scheduler forComparison, SimpleMeterRegistry registry, Schedulers.Factory factory) {
this.forComparison = forComparison;
this.registry = registry;
this.factory = factory;
Schedulers.setFactory(factory);
}
public void generateMetrics() {
Schedulers.boundedElastic().schedule(() -> {});
Schedulers.newBoundedElastic(4, 100, "bounded1").schedule(() -> {});
Schedulers.newBoundedElastic(4, 100, "bounded2").schedule(() -> {});
Micrometer.timedScheduler(
forComparison,
registry,
"my.custom.instrumented.bounded"
).schedule(() -> {});
Schedulers.newBoundedElastic(4, 100, "bounded3").schedule(() -> {});
}
public String getCompletedSummary() {
return Search.in(registry)
.name(n -> n.endsWith(".scheduler.tasks.completed"))
.timers()
.stream()
.map(c -> c.getId().getName() + "=" + c.count())
.collect(Collectors.joining("\n"));
}
@Override
public void run(ApplicationArguments args) throws Exception {
generateMetrics();
System.err.println(getCompletedSummary());
}
}
}
Which prints:
my.instrumented.boundedElastic.scheduler.tasks.completed=4
my.custom.instrumented.bounded.scheduler.tasks.completed=1
Notice how the metrics for the four instrumentedFactory-produced Schedulers are aggregated together.
There's a bit of a hacky workaround for this: by default Schedulers uses ReactorThreadFactory, an internal private class which happens to be a Supplier<String>, supplying the "simplified name" (i.e. the toString, but without the configuration options) of the Scheduler.
One could use the following method to tentatively extract that name:
static String inferSimpleSchedulerName(ThreadFactory threadFactory, String defaultName) {
if (!(threadFactory instanceof Supplier)) {
return defaultName;
}
Object supplied = ((Supplier<?>) threadFactory).get();
if (!(supplied instanceof String)) {
return defaultName;
}
return (String) supplied;
}
This can be applied to, e.g., the newParallel method in the factory:
String simplifiedName = inferSimpleSchedulerName(threadFactory, "para???");
String prefix = "my.instrumented." + simplifiedName; // TimedScheduler will add `.scheduler.xxx` to that prefix
This can then be demonstrated by submitting a few tasks to different parallel schedulers in the Demo#generateMetrics() part:
Schedulers.parallel().schedule(() -> {});
Schedulers.newParallel("paraOne").schedule(() -> {});
Schedulers.newParallel("paraTwo").schedule(() -> {});
And now it prints (blank line added for emphasis):
my.instrumented.paraOne.scheduler.tasks.completed=1
my.instrumented.paraTwo.scheduler.tasks.completed=1
my.instrumented.parallel.scheduler.tasks.completed=1

my.custom.instrumented.bounded.scheduler.tasks.completed=1
my.instrumented.boundedElastic.scheduler.tasks.completed=4
private final ReactiveStringRedisTemplate redisTem;
@Override
public GatewayFilter apply(List<Role> roles) {
return (exchange, chain) -> {
ReactiveValueOperations<String, String> ops = redisTem.opsForValue();
ops
.get("aa")
.map(t -> {
log.info("[Redis] get value: {}", t);
if(t != null) throw new GlobalException(ResCode.F_FOBID);
return t;
})
.subscribe();
return chain.filter(exchange);
};
}
Hello! I want to throw a GlobalException when a value is retrieved from Redis (GlobalException is my custom global exception). But this code does not return the exception and proceeds to the next filter, and I get this message in the terminal:
[Redis] get value: bbbbbbbbbbbbbbbbbbbbbb
[Operators:324] - Operator called default onErrorDropped
Caused by: com.api.sendgateway.errors.globals.GlobalException: Forbidden
at com.api.sendgateway.filters.part.AuthTokenFilter.lambda$apply$0(AuthTokenFilter.java:59)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:106)
at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:82)
at reactor.core.publisher.FluxUsingWhen$UsingWhenSubscriber.onNext(FluxUsingWhen.java:345)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:122)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:122)
at reactor.core.publisher.FluxFilter$FilterSubscriber.onNext(FluxFilter.java:113)
at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:82)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onNext(MonoFlatMapMany.java:250)
at reactor.core.publisher.FluxDefaultIfEmpty$DefaultIfEmptySubscriber.onNext(FluxDefaultIfEmpty.java:101)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:122)
at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:82)
at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:82)
at io.lettuce.core.RedisPublisher$ImmediateSubscriber.onNext(RedisPublisher.java:886)
at io.lettuce.core.RedisPublisher$RedisSubscription.onNext(RedisPublisher.java:291)
at io.lettuce.core.RedisPublisher$SubscriptionCommand.doOnComplete(RedisPublisher.java:773)
at io.lettuce.core.protocol.CommandWrapper.complete(CommandWrapper.java:65)
at io.lettuce.core.protocol.CommandWrapper.complete(CommandWrapper.java:63)
at io.lettuce.core.protocol.CommandHandler.complete(CommandHandler.java:747)
at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:682)
at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:599)
The filter process that I want to build is (see the sketch after this list):
1. Get the key from the user.
2. Search Redis for data with that key.
3. If data is found in Redis, throw an exception (return an error to the user without moving on to the next filter).
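For reference, that flow only short-circuits the chain if the Redis lookup is part of the Mono the filter returns, rather than a detached .subscribe(). A minimal sketch of that wiring (assuming the same redisTem, GlobalException and ResCode from above, plus a reactor.core.publisher.Mono import) could look like:
@Override
public GatewayFilter apply(List<Role> roles) {
    return (exchange, chain) -> redisTem.opsForValue()
            .get("aa")
            .flatMap(t -> {
                log.info("[Redis] get value: {}", t);
                // a value exists in Redis: signal the error inside the returned chain instead of throwing from map()
                return Mono.<Void>error(new GlobalException(ResCode.F_FOBID));
            })
            // no value in Redis: carry on with the remaining filters
            .switchIfEmpty(chain.filter(exchange));
}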
My dependencies are below:
dependencies {
// redis
implementation 'org.springframework.boot:spring-boot-starter-data-redis-reactive'
// actuator
implementation 'org.springframework.boot:spring-boot-starter-actuator'
// gateway
implementation 'org.springframework.cloud:spring-cloud-starter-gateway'
// lombok
compileOnly 'org.projectlombok:lombok'
annotationProcessor 'org.projectlombok:lombok'
// jwt
implementation group: 'io.jsonwebtoken', name: 'jjwt', version: '0.9.1'
implementation group: 'javax.xml.bind', name: 'jaxb-api', version: '2.3.1'
// test
testImplementation 'org.springframework.boot:spring-boot-starter-test'
}
I'm working on a project where we poll files from an SFTP server and stream them out as objects onto a RabbitMQ queue. When RabbitMQ is down, the flow still polls and deletes the file from the remote server, so the file is lost when the send to the queue fails. I'm using ExpressionEvaluatingRequestHandlerAdvice to remove the file on successful transformation. My code looks like this:
@Bean
public SessionFactory<ChannelSftp.LsEntry> sftpSessionFactory() {
DefaultSftpSessionFactory factory = new DefaultSftpSessionFactory(true);
factory.setHost(sftpProperties.getSftpHost());
factory.setPort(sftpProperties.getSftpPort());
factory.setUser(sftpProperties.getSftpPathUser());
factory.setPassword(sftpProperties.getSftpPathPassword());
factory.setAllowUnknownKeys(true);
return new CachingSessionFactory<>(factory);
}
@Bean
public SftpRemoteFileTemplate sftpRemoteFileTemplate() {
return new SftpRemoteFileTemplate(sftpSessionFactory());
}
@Bean
@InboundChannelAdapter(channel = TransformerChannel.TRANSFORMER_OUTPUT, autoStartup = "false",
poller = @Poller(value = "customPoller"))
public MessageSource<InputStream> sftpMessageSource() {
SftpStreamingMessageSource messageSource = new SftpStreamingMessageSource(sftpRemoteFileTemplate,
null);
messageSource.setRemoteDirectory(sftpProperties.getSftpDirPath());
messageSource.setFilter(new SftpPersistentAcceptOnceFileListFilter(new SimpleMetadataStore(),
"streaming"));
messageSource.setFilter(new SftpSimplePatternFileListFilter("*.txt"));
return messageSource;
}
@Bean
@Transformer(inputChannel = TransformerChannel.TRANSFORMER_OUTPUT,
outputChannel = SFTPOutputChannel.SFTP_OUTPUT,
adviceChain = "deleteAdvice")
public org.springframework.integration.transformer.Transformer transformer() {
return new SFTPTransformerService("UTF-8");
}
@Bean
public ExpressionEvaluatingRequestHandlerAdvice deleteAdvice() {
ExpressionEvaluatingRequestHandlerAdvice advice = new ExpressionEvaluatingRequestHandlerAdvice();
advice.setOnSuccessExpressionString(
"#sftpRemoteFileTemplate.remove(headers['file_remoteDirectory'] + headers['file_remoteFile'])");
advice.setPropagateEvaluationFailures(false);
return advice;
}
I don't want the files to get polled/removed from the remote SFTP server when the RabbitMQ server is down. How can I achieve this?
UPDATE
Apologies for not mentioning that I'm using the Spring Cloud Stream Rabbit binder. Here is the transformer service:
public class SFTPTransformerService extends StreamTransformer {
public SFTPTransformerService(String charset) {
super(charset);
}
@Override
protected Object doTransform(Message<?> message) throws Exception {
String fileName = message.getHeaders().get("file_remoteFile", String.class);
Object fileContents = super.doTransform(message);
return new customFileDTO(fileName, (String) fileContents);
}
}
UPDATE-2
I added a TransactionSynchronizationFactory on the customPoller as suggested. Now it doesn't poll the file when the Rabbit server is down, but when the server is up, it keeps polling the same file over and over again, and I cannot figure out why. I guess I cannot use PollerSpec because I'm on version 4.3.2.
@Bean(name = "customPoller")
public PollerMetadata pollerMetadataDTX(StartStopTrigger startStopTrigger,
CustomTriggerAdvice customTriggerAdvice) {
PollerMetadata pollerMetadata = new PollerMetadata();
pollerMetadata.setAdviceChain(Collections.singletonList(customTriggerAdvice));
pollerMetadata.setTrigger(startStopTrigger);
pollerMetadata.setMaxMessagesPerPoll(Long.valueOf(sftpProperties.getMaxMessagePoll()));
ExpressionEvaluatingTransactionSynchronizationProcessor syncProcessor =
new ExpressionEvaluatingTransactionSynchronizationProcessor();
syncProcessor.setBeanFactory(applicationContext.getAutowireCapableBeanFactory());
syncProcessor.setBeforeCommitChannel(
applicationContext.getBean(TransformerChannel.TRANSFORMER_OUTPUT, MessageChannel.class));
syncProcessor
.setAfterCommitChannel(
applicationContext.getBean(SFTPOutputChannel.SFTP_OUTPUT, MessageChannel.class));
syncProcessor.setAfterCommitExpression(new SpelExpressionParser().parseExpression(
"#sftpRemoteFileTemplate.remove(headers['file_remoteDirectory'] + headers['file_remoteFile'])"));
DefaultTransactionSynchronizationFactory defaultTransactionSynchronizationFactory =
new DefaultTransactionSynchronizationFactory(syncProcessor);
pollerMetadata.setTransactionSynchronizationFactory(defaultTransactionSynchronizationFactory);
return pollerMetadata;
}
I don't know if you need this info, but my CustomTriggerAdvice and StartStopTrigger look like this:
@Component
public class CustomTriggerAdvice extends AbstractMessageSourceAdvice {
@Autowired private StartStopTrigger startStopTrigger;
@Override
public boolean beforeReceive(MessageSource<?> source) {
return true;
}
@Override
public Message<?> afterReceive(Message<?> result, MessageSource<?> source) {
if (result == null) {
if (startStopTrigger.getStart()) {
startStopTrigger.stop();
}
} else {
if (!startStopTrigger.getStart()) {
startStopTrigger.stop();
}
}
return result;
}
}
public class StartStopTrigger implements Trigger {
private PeriodicTrigger startTrigger;
private boolean start;
public StartStopTrigger(PeriodicTrigger startTrigger, boolean start) {
this.startTrigger = startTrigger;
this.start = start;
}
@Override
public Date nextExecutionTime(TriggerContext triggerContext) {
if (!start) {
return null;
}
start = true;
return startTrigger.nextExecutionTime(triggerContext);
}
public void stop() {
start = false;
}
public void start() {
start = true;
}
public boolean getStart() {
return this.start;
}
}
Well, it would be great to see your SFTPTransformerService to determine how it is possible for the onSuccessExpression to be evaluated when there should be an exception in the case of a down broker.
You should also not only throw an exception so that the delete is not performed, but also consider adding a RequestHandlerRetryAdvice to re-send the file to RabbitMQ (see the sketch below): https://docs.spring.io/spring-integration/docs/5.0.6.RELEASE/reference/html/messaging-endpoints-chapter.html#retry-advice
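A sketch of such an advice bean (assuming Spring Retry's RetryTemplate, SimpleRetryPolicy and ExponentialBackOffPolicy are on the classpath; the retry and back-off settings are only illustrative):
@Bean
public RequestHandlerRetryAdvice retryAdvice() {
    RequestHandlerRetryAdvice advice = new RequestHandlerRetryAdvice();
    RetryTemplate retryTemplate = new RetryTemplate();
    retryTemplate.setRetryPolicy(new SimpleRetryPolicy(5));          // retry the send a few times
    retryTemplate.setBackOffPolicy(new ExponentialBackOffPolicy()); // back off between attempts
    advice.setRetryTemplate(retryTemplate);
    return advice;
}
It would then be referenced in the adviceChain of the producing endpoint alongside (or instead of) deleteAdvice.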
UPDATE
So, since Gary guessed that you use Spring Cloud Stream to send the message to the Rabbit Binder after your internal process (a pity you didn't share that information originally), you need to take a look at the Binder error handling on the matter: https://docs.spring.io/spring-cloud-stream/docs/Elmhurst.RELEASE/reference/htmlsingle/#_retry_with_the_rabbitmq_binder
And it is true that the ExpressionEvaluatingRequestHandlerAdvice is applied only to the SFTPTransformerService and nothing more. The downstream error (in the Binder) is not included in that process.
UPDATE 2
Yeah... I think Gary is right, and we have no choice but to configure a TransactionSynchronizationFactory at the customPoller level instead of that ExpressionEvaluatingRequestHandlerAdvice.
The DefaultTransactionSynchronizationFactory can be configured with an ExpressionEvaluatingTransactionSynchronizationProcessor, which has a similar goal to the mentioned ExpressionEvaluatingRequestHandlerAdvice but works at the transaction level, so it will include your whole process, starting with the SFTP Channel Adapter and ending at the Rabbit Binder level with the send-to-AMQP attempts.
See the Reference Manual for more information: https://docs.spring.io/spring-integration/reference/html/transactions.html#transaction-synchronization.
The point with the ExpressionEvaluatingRequestHandlerAdvice (and any AbstractRequestHandlerAdvice) is that it has a boundary only around the handleRequestMessage() method, and therefore only covers the component on which it is declared.
I have been trying to get my Ignite continuous query code to work without enabling peer class loading, but I find that the code does not work. I tried debugging and realised that the call to cache.query(qry) errors out with a "Failed to marshal custom event" error. When I enable peer class loading, the code works as expected. Could someone provide guidance on how I can make this work without peer class loading?
Following is the code snippet that calls the continuous query.
public void subscribeEvent(IgniteCache<String,String> cache,String inKeyStr,ServerWebSocket websocket ){
System.out.println("in thread "+Thread.currentThread().getId()+"-->"+"subscribe event");
//ArrayList<String> inKeys = new ArrayList<String>(Arrays.asList(inKeyStr.split(",")));
ContinuousQuery<String, String> qry = new ContinuousQuery<>();
/****
* Continuous Query Impl
*/
inKeys = ","+inKeyStr+",";
qry.setInitialQuery(new ScanQuery<String, String>((k, v) -> inKeys.contains(","+k+",")));
qry.setTimeInterval(1000);
qry.setPageSize(1);
// Callback that is called locally when update notifications are received.
// Factory<CacheEntryEventFilter<String, String>> rmtFilterFactory = new com.ccx.ignite.cqfilter.FilterFactory().init(inKeyStr);
qry.setLocalListener(new CacheEntryUpdatedListener<String, String>() {
@Override public void onUpdated(Iterable<CacheEntryEvent<? extends String, ? extends String>> evts) {
for (CacheEntryEvent<? extends String, ? extends String> e : evts)
{
System.out.println("websocket locallsnr data in thread "+Thread.currentThread().getId()+"-->"+"key=" + e.getKey() + ", val=" + e.getValue());
try{
websocket.writeTextMessage("key=" + e.getKey() + ", val=" + e.getValue());
}
catch (Exception e1){
System.out.println("exception local listener "+e1.getMessage());
qry.setLocalListener(null) ; }
}
}
} );
qry.setRemoteFilterFactory( new com.ccx.ignite.cqfilter.FilterFactory().init(inKeys));
try{
cur = cache.query(qry);
for (Cache.Entry<String, String> e : cur)
{
System.out.println("websocket initialqry data in thread "+Thread.currentThread().getId()+"-->"+"key=" + e.getKey() + ", val=" + e.getValue());
websocket.writeTextMessage("key=" + e.getKey() + ", val=" + e.getValue());
}
}
catch (Exception e){
System.out.println("exception cache.query "+e.getMessage());
}
}
Following is the remote filter class, which I have packaged as a self-contained jar and pushed into the libs folder of Ignite so that it can be picked up by the server nodes.
public class FilterFactory
{
public Factory<CacheEntryEventFilter<String, String>> init(String inKeyStr ){
System.out.println("factory init called jun22 ");
return new Factory <CacheEntryEventFilter<String, String>>() {
private static final long serialVersionUID = 5906783589263492617L;
@Override public CacheEntryEventFilter<String, String> create() {
return new CacheEntryEventFilter<String, String>() {
@Override public boolean evaluate(CacheEntryEvent<? extends String, ? extends String> e) {
//List inKeys = new ArrayList<String>(Arrays.asList(inKeyStr.split(",")));
System.out.println("inside remote filter factory ");
String inKeys = ","+inKeyStr+",";
return inKeys.contains(","+e.getKey()+",");
}
};
}
};
}
}
Overall logic that I'm trying to implement is to have a websocket client subscribe to an event by specifying a cache name and key(s) of interest.
The subscribe event code is called which creates a continuous query and registers a local listener callback for any update event on the key(s) of interest.
The remote filter is expected to filter the update event based on the key(s) passed to it as a string and the local listener is invoked if the filter event succeeds. The local listener writes the updated key value to the web socket reference passed to the subscribe event code.
The version of Ignite I'm using is 1.8.0; however, the behaviour is the same in 2.0 as well.
Any help is greatly appreciated!
Here is the log snippet containing the relevant error
factory init called jun22
exception cache.query class org.apache.ignite.spi.IgniteSpiException: Failed to marshal custom event: StartRoutineDiscoveryMessage [startReqData=StartRequestData [prjPred=org.apache.ignite.configuration.CacheConfiguration$IgniteAllNodesPredicate@269707de, clsName=null, depInfo=null, hnd=CacheContinuousQueryHandlerV2 [rmtFilterFactory=com.ccx.ignite.cqfilter.FilterFactory$1@5dc301ed, rmtFilterFactoryDep=null, types=0], bufSize=1, interval=1000, autoUnsubscribe=true], keepBinary=false, routineId=b40ada9f-552d-41eb-90b5-3384526eb7b9]
From FilterFactory you are returning an instance of an anonymous class, which in turn refers to the enclosing FilterFactory, which is not serializable.
Just replace the returned anonymous CacheEntryEventFilter-based class with a corresponding static nested class (see the sketch below).
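A sketch of that restructuring (keeping the same key-list semantics as the original factory; the nested class names KeyListFilterFactory and KeyListFilter are just illustrative, and the javax.cache imports are the same as in the original code):
public class FilterFactory {
    public Factory<CacheEntryEventFilter<String, String>> init(String inKeyStr) {
        return new KeyListFilterFactory(inKeyStr);
    }

    // Static nested factory: serializable and does not capture the enclosing FilterFactory
    private static class KeyListFilterFactory implements Factory<CacheEntryEventFilter<String, String>> {
        private static final long serialVersionUID = 1L;
        private final String inKeys;

        KeyListFilterFactory(String inKeyStr) {
            this.inKeys = "," + inKeyStr + ",";
        }

        @Override
        public CacheEntryEventFilter<String, String> create() {
            return new KeyListFilter(inKeys);
        }
    }

    // Static nested filter: also free of references to the outer class
    private static class KeyListFilter implements CacheEntryEventFilter<String, String>, Serializable {
        private static final long serialVersionUID = 1L;
        private final String inKeys;

        KeyListFilter(String inKeys) {
            this.inKeys = inKeys;
        }

        @Override
        public boolean evaluate(CacheEntryEvent<? extends String, ? extends String> e) {
            return inKeys.contains("," + e.getKey() + ",");
        }
    }
}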
You need to explicitly deploy your CQ classes (remote filters specifically) on all nodes in the topology. Just create a JAR file with them and put it into the libs folder prior to starting the nodes.
Hi, I am new to Storm and Kafka.
I am using Storm 1.0.1 and Kafka 0.10.0.
We have a KafkaSpout that receives a Java bean from a Kafka topic.
I have spent several hours digging to find the right approach for this.
I found a few articles which are useful, but none of the approaches has worked for me so far.
Following is my code:
StormTopology:
public class StormTopology {
public static void main(String[] args) throws Exception {
//Topo test /zkroot test
if (args.length == 4) {
System.out.println("started");
BrokerHosts hosts = new ZkHosts("localhost:2181");
SpoutConfig kafkaConf1 = new SpoutConfig(hosts, args[1], args[2],
args[3]);
kafkaConf1.zkRoot = args[2];
kafkaConf1.useStartOffsetTimeIfOffsetOutOfRange = true;
kafkaConf1.startOffsetTime = kafka.api.OffsetRequest.LatestTime();
kafkaConf1.scheme = new SchemeAsMultiScheme(new KryoScheme());
KafkaSpout kafkaSpout1 = new KafkaSpout(kafkaConf1);
System.out.println("started");
ShuffleBolt shuffleBolt = new ShuffleBolt(args[1]);
AnalysisBolt analysisBolt = new AnalysisBolt(args[1]);
TopologyBuilder topologyBuilder = new TopologyBuilder();
topologyBuilder.setSpout("kafkaspout", kafkaSpout1, 1);
//builder.setBolt("counterbolt2", countbolt2, 3).shuffleGrouping("kafkaspout");
//This is for field grouping in bolt we need two bolt for field grouping or it wont work
topologyBuilder.setBolt("shuffleBolt", shuffleBolt, 3).shuffleGrouping("kafkaspout");
topologyBuilder.setBolt("analysisBolt", analysisBolt, 5).fieldsGrouping("shuffleBolt", new Fields("trip"));
Config config = new Config();
config.registerSerialization(VehicleTrip.class, VehicleTripKyroSerializer.class);
config.setDebug(true);
config.setNumWorkers(1);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(args[0], config, topologyBuilder.createTopology());
// StormSubmitter.submitTopology(args[0], config,
// builder.createTopology());
} else {
System.out
.println("Insufficent Arguements - topologyName kafkaTopic ZKRoot ID");
}
}
}
I am serializing the data to Kafka using Kryo.
KafkaProducer:
public class StreamKafkaProducer {
private static Producer producer;
private final Properties props = new Properties();
private static final StreamKafkaProducer KAFKA_PRODUCER = new StreamKafkaProducer();
private StreamKafkaProducer(){
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "com.abc.serializer.MySerializer");
producer = new org.apache.kafka.clients.producer.KafkaProducer(props);
}
public static StreamKafkaProducer getStreamKafkaProducer(){
return KAFKA_PRODUCER;
}
public void produce(String topic, VehicleTrip vehicleTrip){
ProducerRecord<String,VehicleTrip> producerRecord = new ProducerRecord<>(topic,vehicleTrip);
producer.send(producerRecord);
//producer.close();
}
public static void closeProducer(){
producer.close();
}
}
Kryo serializer:
public class DataKyroSerializer extends Serializer<Data> implements Serializable {
@Override
public void write(Kryo kryo, Output output, Data data) {
output.writeLong(data.getStartedOn().getTime());
output.writeLong(data.getEndedOn().getTime());
}
@Override
public Data read(Kryo kryo, Input input, Class<Data> aClass) {
Data data = new Data();
data.setStartedOn(new Date(input.readLong()));
data.setEndedOn(new Date(input.readLong()));
return data;
}
}
I need to get the data back to the Data bean.
As per a few articles, I need to provide a custom scheme and make it part of the topology, but so far I have had no luck.
Code for Bolt and Scheme
Scheme:
public class KryoScheme implements Scheme {
private ThreadLocal<Kryo> kryos = new ThreadLocal<Kryo>() {
protected Kryo initialValue() {
Kryo kryo = new Kryo();
kryo.addDefaultSerializer(Data.class, new DataKyroSerializer());
return kryo;
};
};
@Override
public List<Object> deserialize(ByteBuffer ser) {
return Utils.tuple(kryos.get().readObject(new ByteBufferInput(ser.array()), Data.class));
}
@Override
public Fields getOutputFields( ) {
return new Fields( "data" );
}
}
and bolt:
public class AnalysisBolt implements IBasicBolt {
/**
*
*/
private static final long serialVersionUID = 1L;
private String topicname = null;
public AnalysisBolt(String topicname) {
this.topicname = topicname;
}
public void prepare(Map stormConf, TopologyContext topologyContext) {
System.out.println("prepare");
}
public void execute(Tuple input, BasicOutputCollector collector) {
System.out.println("execute");
Fields fields = input.getFields();
try {
JSONObject eventJson = (JSONObject) JSONSerializer.toJSON((String) input
.getValueByField(fields.get(1)));
String StartTime = (String) eventJson.get("startedOn");
String EndTime = (String) eventJson.get("endedOn");
String Oid = (String) eventJson.get("_id");
int V_id = (Integer) eventJson.get("vehicleId");
//call method getEventForVehicleWithinTime(Long vehicleId, Date startTime, Date endTime)
System.out.println("==========="+Oid+"| "+V_id+"| "+StartTime+"| "+EndTime);
} catch (Exception e) {
e.printStackTrace();
}
}
But if I submit the Storm topology, I get this error:
java.lang.IllegalStateException: Spout 'kafkaspout' contains a
non-serializable field of type com.abc.topology.KryoScheme$1, which
was instantiated prior to topology creation.
com.minda.iconnect.topology.KryoScheme$1 should be instantiated within
the prepare method of 'kafkaspout at the earliest.
I would appreciate help debugging the issue and guidance toward the right path.
Thanks
Your ThreadLocal is not Serializable. The preferable solution would be to make your serializer both Serializable and thread-safe. If this is not possible, then I see two alternatives, since there is no prepare method as you would get in a bolt (see the sketch after this list):
1. Declare it as static, which is inherently transient.
2. Declare it transient and access it via a private get method; then you can initialize the variable on first access.
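A sketch of the second alternative, applied to the KryoScheme from the question (assuming the same DataKyroSerializer, Data class, and Storm/Kryo imports as above):
public class KryoScheme implements Scheme {

    // transient: not serialized with the topology, rebuilt lazily on each worker
    private transient ThreadLocal<Kryo> kryos;

    private ThreadLocal<Kryo> getKryos() {
        if (kryos == null) {
            kryos = ThreadLocal.withInitial(() -> {
                Kryo kryo = new Kryo();
                kryo.addDefaultSerializer(Data.class, new DataKyroSerializer());
                return kryo;
            });
        }
        return kryos;
    }

    @Override
    public List<Object> deserialize(ByteBuffer ser) {
        return Utils.tuple(getKryos().get().readObject(new ByteBufferInput(ser.array()), Data.class));
    }

    @Override
    public Fields getOutputFields() {
        return new Fields("data");
    }
}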
Within the Storm lifecycle, the topology is instantiated and then serialized to byte format to be stored in ZooKeeper, prior to the topology being executed. Within this step, if a spout or bolt within the topology has an initialized unserializable property, serialization will fail.
If there is a need for a field that is unserializable, initialize it within the bolt or spout's prepare method, which is run after the topology is delivered to the worker.
Source: Best Practices for implementing Apache Storm