Flink StateFun and Confluent Schema Registry compatibility

I'm trying to egress to Confluent Kafka from Flink StateFun. In the Confluent examples, all you need to do to schema-check and write data to a Kafka topic is hand the Kafka client a ProducerRecord wrapping the Avro object.
In StateFun, however, the Kafka egress requires overriding the "ProducerRecord<byte[], byte[]> serialize" method, which leads to the following error:
Caused by: org.apache.kafka.common.errors.SerializationException: Error registering Avro schema: "bytes"
Schema Registry and the StateFun Kafka egress seem to be incompatible. Is there any workaround?

It is possible to use Confluent Schema Registry with a StateFun egress.
To do so, first register your schema manually with the Schema Registry, then have your KafkaEgressSerializer return a byte[] produced by a KafkaAvroSerializer instance.
The code below is the gist of it and follows the first of Igal's workaround suggestions (second answer below):
public class SpecificRecordFromAvroSchemaSerializer implements KafkaEgressSerializer<SpecificRecordGeneratedFromAvroSchema> {

    private static String KAFKA_TOPIC = "kafka_topic";

    private static CachedSchemaRegistryClient schemaRegistryClient = new CachedSchemaRegistryClient(
            "http://schema-registry:8081",
            1_000
    );

    private static KafkaAvroSerializer kafkaAvroSerializer = new KafkaAvroSerializer(schemaRegistryClient);

    static {
        try {
            schemaRegistryClient.register(
                    KAFKA_TOPIC + "-value", // assuming subject name strategy is TopicNameStrategy (default)
                    SpecificRecordGeneratedFromAvroSchema.getClassSchema()
            );
        } catch (IOException e) {
            e.printStackTrace();
        } catch (RestClientException e) {
            e.printStackTrace();
        }
    }

    @Override
    public ProducerRecord<byte[], byte[]> serialize(SpecificRecordGeneratedFromAvroSchema specificRecordGeneratedFromAvroSchema) {
        byte[] valueData = kafkaAvroSerializer.serialize(
                KAFKA_TOPIC,
                specificRecordGeneratedFromAvroSchema
        );
        return new ProducerRecord<>(
                KAFKA_TOPIC,
                String.valueOf(System.currentTimeMillis()).getBytes(),
                valueData
        );
    }
}
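For completeness, this serializer class is then referenced when defining the egress. A minimal sketch, assuming the embedded Java SDK's KafkaEgressBuilder; the module class, egress identifier, and broker address below are placeholders and not part of the question:

public class ExampleEgressModule implements StatefulFunctionModule {

    // Placeholder identifier; use your own namespace and name.
    static final EgressIdentifier<SpecificRecordGeneratedFromAvroSchema> EGRESS_ID =
            new EgressIdentifier<>("example", "confluent-avro-egress", SpecificRecordGeneratedFromAvroSchema.class);

    @Override
    public void configure(Map<String, String> globalConfiguration, Binder binder) {
        binder.bindEgress(
                KafkaEgressBuilder.forIdentifier(EGRESS_ID)
                        .withKafkaAddress("kafka-broker:9092") // placeholder broker address
                        .withSerializer(SpecificRecordFromAvroSchemaSerializer.class)
                        .build());
    }
}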

Schema Registry is not directly supported at this version of Stateful Functions, but a few workarounds are possible:
1. Connect to the Schema Registry yourself from your KafkaEgressSerializer class. In your linked example, that would need to happen here.
2. Provide your own instance of a FlinkKafkaProducer that is based on an Avro-aware serialization schema (see AvroDeserializationSchema).
3. Manage the schemas outside of Stateful Functions, but serialize your Avro record to bytes yourself. Make sure to remove the Schema Registry settings from the properties being passed to the KafkaProducer. A sketch of this approach follows below.
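To make the third option concrete, the egress serializer can write the plain Avro binary encoding itself, with no registry involved. A minimal, hypothetical sketch (the class name and topic are placeholders):

public class RegistryFreeAvroSerializer implements KafkaEgressSerializer<SpecificRecordGeneratedFromAvroSchema> {

    private static final String KAFKA_TOPIC = "kafka_topic";

    @Override
    public ProducerRecord<byte[], byte[]> serialize(SpecificRecordGeneratedFromAvroSchema record) {
        try {
            // Plain Avro binary encoding: no magic byte, no schema id, no Schema Registry call.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            DatumWriter<SpecificRecordGeneratedFromAvroSchema> writer =
                    new SpecificDatumWriter<>(SpecificRecordGeneratedFromAvroSchema.getClassSchema());
            writer.write(record, encoder);
            encoder.flush();
            return new ProducerRecord<>(KAFKA_TOPIC, out.toByteArray());
        } catch (IOException e) {
            throw new RuntimeException("Failed to Avro-encode record", e);
        }
    }
}

Note that this payload is raw Avro rather than the Confluent wire format, so consumers have to obtain the writer schema out of band; that is the trade-off this workaround makes.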

Related

The way to use Redis in Apache Flink

I am using Flink and want to insert result values into Redis.
When I googled for Redis, I found the redis-connector included in Apache Bahir.
So I am able to insert result values into Redis using the redis-connector from Apache Bahir.
However, I think I can also connect to Redis using Jedis directly.
In an experiment I was able to connect to Redis with Jedis and see the values inserted into Redis, as shown in the code below.
DataStream<String> messageStream = env
        .addSource(new FlinkKafkaConsumer<>(flinkParams.getRequired("topic"), new SimpleStringSchema(), flinkParams.getProperties()))
        .setParallelism(Math.min(hosts * cores, kafkaPartitions));

messageStream.keyBy(new KeySelector<String, String>() {
    @Override
    public String getKey(String s) throws Exception {
        return s;
    }
}).flatMap(new RedisConnector());
In the RedisConnector module, without the redis-connector from Apache Bahir, I also successfully connected to Redis and saw the messages processed by Flink.
The example code is shown below:
public class ProcessorCommon {

    private static final Logger logger = LoggerFactory.getLogger(ProcessorCommon.class);

    private Jedis jedis;
    private Set<DummyPair> dummy;

    public ProcessorCommon(String redisServerHostName) {
        this.jedis = new Jedis(redisServerHostName);
    }

    public void writeToRedis(String key, String value) {
        this.jedis.set(key, value);
    }

    public String getFromRedis(String key) {
        return this.jedis.get(key);
    }

    public void close() {
        this.jedis.close();
    }
}
So I am wondering whether there is any real difference between using the redis-connector from Bahir and using Jedis directly.
There is currently no real Redis connector maintained by the Flink community. The Redis connector in Bahir is rather outdated. There is a new Redis Streams connector in the works, which can be found at https://github.com/apache/flink-connector-redis-streams
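Until that connector is released, using Jedis directly is a reasonable approach; what the Bahir connector mainly adds on top is connection lifecycle handling, which you can do yourself in a rich function. A minimal sketch of a hand-rolled sink (the class name, host, and key/value scheme are placeholders):

public class JedisSink extends RichSinkFunction<String> {

    private final String redisHost;
    private transient Jedis jedis;

    public JedisSink(String redisHost) {
        this.redisHost = redisHost;
    }

    @Override
    public void open(Configuration parameters) {
        // One connection per parallel sink instance, opened on the task manager.
        jedis = new Jedis(redisHost);
    }

    @Override
    public void invoke(String value, Context context) {
        jedis.set(value, value); // placeholder key/value scheme
    }

    @Override
    public void close() {
        if (jedis != null) {
            jedis.close();
        }
    }
}

It would be attached with messageStream.addSink(new JedisSink("redis-host")).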

No physical database known of type cassandra - AWS Keyspaces

We are connecting our microservices to AWS Keyspaces (Cassandra) through DBaaS.
We are getting this error:
cloud.dbaas.client.exceptions.CreateDbException: MicroserviceRestClientResponseException{message=404 Not Found: "No physical database known of type cassandra
We even get the same error in the DBaaS pod logs.
I have already configured the parameters below:
spring.data.cassandra.ssl
spring.data.cassandra.contact-points
spring.data.cassandra.local-datacenter
spring.data.cassandra.port
spring.data.cassandra.password
spring.data.cassandra.username
You will want to reference the external configuration. See the following Amazon Keyspaces Spring example:
https://github.com/aws-samples/amazon-keyspaces-spring-app-example/
@Configuration
public class AppConfig {

    private final String username = System.getenv("AWS_MCS_SPRING_APP_USERNAME");
    private final String password = System.getenv("AWS_MCS_SPRING_APP_PASSWORD");

    File driverConfig = new File(System.getProperty("user.dir") + "/application.conf");

    @Bean
    @Primary
    public CqlSession session() throws NoSuchAlgorithmException {
        return CqlSession.builder()
                .withConfigLoader(DriverConfigLoader.fromFile(driverConfig))
                .withAuthCredentials(username, password)
                .withSslContext(SSLContext.getDefault())
                .withKeyspace("keyspace_name")
                .build();
    }
}
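Whichever route you take, what has to reach the driver is the Keyspaces endpoint for your Region on port 9142, the Region name as the local datacenter, the credentials, and SSL. If you would rather not ship a separate application.conf, the same settings can be supplied programmatically; a sketch assuming the us-east-1 endpoint (Region, endpoint, and keyspace name are placeholders):

@Configuration
public class ProgrammaticAppConfig {

    private final String username = System.getenv("AWS_MCS_SPRING_APP_USERNAME");
    private final String password = System.getenv("AWS_MCS_SPRING_APP_PASSWORD");

    @Bean
    @Primary
    public CqlSession session() throws NoSuchAlgorithmException {
        return CqlSession.builder()
                // Endpoint and local datacenter are placeholders; use your own Region.
                .addContactPoint(new InetSocketAddress("cassandra.us-east-1.amazonaws.com", 9142))
                .withLocalDatacenter("us-east-1")
                .withAuthCredentials(username, password)
                .withSslContext(SSLContext.getDefault())
                .withKeyspace("keyspace_name")
                .build();
    }
}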

Register Hibernate 5 Event Listeners

I am working on a legacy non-Spring application that is being migrated from Hibernate 3 to Hibernate 5.6.0.Final (the latest at this time). I have generally never used Hibernate event listeners in my work, so this is quite new to me, and I am studying how they work in Hibernate 5.
Currently, in a test class, we have the following Hibernate 3 code:
protected static Configuration createSecuredDatabaseConfig() {
    Configuration config = createUnrestrictedDatabaseConfig();
    config.setListener("pre-insert", "com.app.server.services.db.eventlisteners.MySecurityHibernateEventListener");
    config.setListener("pre-update", "com.app.server.services.db.eventlisteners.MySecurityHibernateEventListener");
    config.setListener("pre-delete", "com.app.server.services.db.eventlisteners.MySecurityHibernateEventListener");
    config.setListener("pre-load", "com.app.server.services.db.eventlisteners.EkoSecurityHibernateEventListener");
    return config;
}
This is obviously no longer valid, and I believe I need to create a Hibernate Integrator, which I have done.
public class MyEventListenerIntegrator implements Integrator {

    @Override
    public void integrate(Metadata metadata, SessionFactoryImplementor sessionFactory,
                          SessionFactoryServiceRegistry serviceRegistry) {
        EventListenerRegistry eventListenerRegistry = serviceRegistry.getService(EventListenerRegistry.class);
        eventListenerRegistry.getEventListenerGroup(EventType.PRE_INSERT).appendListener(new MySecurityHibernateEventListener());
        eventListenerRegistry.getEventListenerGroup(EventType.PRE_UPDATE).appendListener(new MySecurityHibernateEventListener());
        eventListenerRegistry.getEventListenerGroup(EventType.PRE_DELETE).appendListener(new MySecurityHibernateEventListener());
        eventListenerRegistry.getEventListenerGroup(EventType.PRE_LOAD).appendListener(new MySecurityHibernateEventListener());
    }

    @Override
    public void disintegrate(SessionFactoryImplementor sessionFactory, SessionFactoryServiceRegistry serviceRegistry) {
        // nothing to clean up
    }
}
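For context, the class being appended has to implement the listener interfaces for the event types it is registered against. The question does not show that class, so the following is only a hypothetical sketch of its shape (the security check is a placeholder):

public class MySecurityHibernateEventListener
        implements PreInsertEventListener, PreUpdateEventListener, PreDeleteEventListener, PreLoadEventListener {

    @Override
    public boolean onPreInsert(PreInsertEvent event) {
        return !isAllowed(event.getEntity()); // returning true vetoes the insert
    }

    @Override
    public boolean onPreUpdate(PreUpdateEvent event) {
        return !isAllowed(event.getEntity());
    }

    @Override
    public boolean onPreDelete(PreDeleteEvent event) {
        return !isAllowed(event.getEntity());
    }

    @Override
    public void onPreLoad(PreLoadEvent event) {
        // pre-load cannot veto the operation; inspect or audit here
    }

    private boolean isAllowed(Object entity) {
        return true; // placeholder security check
    }
}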
So, now I believe the next step is to add the integrator to the session factory via the registry builder. I am using this website to help me:
https://www.boraji.com/hibernate-5-event-listener-example
Because we were using older Hibernate 3, we had code to create our session factory as follows:
protected static SessionFactory buildSessionFactory(Database db) {
    if (db == null) {
        throw new NullPointerException("Database specifier cannot be null");
    }
    try {
        Configuration config = createSessionFactoryConfiguration(db);
        String url = config.getProperty("connection.url");
        String user = config.getProperty("connection.username");
        String password = config.getProperty("connection.password");
        try {
            String dbDriver = config.getProperty("hibernate.connection.driver_class");
            Class.forName(dbDriver);
            Connection conn = DriverManager.getConnection(url, user, password);
        } catch (SQLException error) {
            logger.info("Didn't find driver, on QA or production, so it's okay to assume we have DB connection");
            error.printStackTrace();
        }
        SessionFactory sessionFactory = config.buildSessionFactory();
        sessionFactoryConfigs.put(sessionFactory, config); // Cannot recover config from factory instance, must be stored.
        return sessionFactory;
    } catch (Throwable ex) {
        // Make sure you log the exception, as it might be swallowed
        logger.error("Initial SessionFactory creation failed.", ex);
        throw new ExceptionInInitializerError(ex);
    }
}
The link I referred to above creates the SessionFactory in a much different way, so I'll be testing that out to see if it works in our app.
Without Spring handling our sessions and transactions, this app wires them by hand the way it was done before Spring, and I haven't seen that kind of code in years.
I solved this issue with help from the link above. I didn't copy exactly what they did, but some of it helped. My solution is as follows:
protected static SessionFactory createSecuredDatabaseConfig() {
    Configuration config = createUnrestrictedDatabaseConfig();
    BootstrapServiceRegistry bootstrapRegistry =
            new BootstrapServiceRegistryBuilder()
                    .applyIntegrator(new EkoEventListenerIntegrator())
                    .build();
    ServiceRegistry serviceRegistry = new StandardServiceRegistryBuilder(bootstrapRegistry)
            .applySettings(config.getProperties())
            .build();
    SessionFactory sessionFactory = config.buildSessionFactory(serviceRegistry);
    return sessionFactory;
}
This was it. I tried multiple different ways to register the events without the BootstrapServiceRegistry, but none of those worked. I did have to create the integrator. What I did NOT include was the following:
MetadataSources sources = new MetadataSources(serviceRegistry)
        .addPackage("com.myproject.server.model");
Metadata metadata = sources.getMetadataBuilder().build();
// did not create the sessionFactory this way
sessionFactory = metadata.getSessionFactoryBuilder().build();
If I had gone further and used this method to create the SessionFactory, then all of my queries would have complained about not being able to find the parameterName, which is a separate issue.
The Hibernate Integrator and this way of creating the SessionFactory are only used for the unit tests. Without registering these events, one unit test would fail, and now it doesn't. So this solves my problem for now.

Reading HDFS extended attributes in HiveQL

I am working on a use case where we would like to add metadata (e.g. load time, data source...) to raw files as HDFS extended attributes (xattrs).
I was wondering if there is a way for HiveQL to expose such metadata in the query result set.
This would avoid storing the metadata in each record within the raw files.
Would a custom Hive SerDe be a way to make such xattrs available? Or do you see another way to make this possible?
I am still a relative novice with this, so bear with me if I have misused any terms.
Thanks
There may be other ways to implement it, but after I discovered the Hive virtual column INPUT__FILE__NAME, which contains the URL of the source HDFS file, I created a user-defined function (UDF) in Java to read its extended attributes. The function can be used in a Hive query as:
XAttrSimpleUDF(INPUT__FILE__NAME,'user.my_key')
The (quick and dirty) Java source code of the UDF looks like:
public class XAttrSimpleUDF extends UDF {

    public Text evaluate(Text uri, Text attr) {
        if (uri == null || attr == null) return null;
        Text xAttrTxt = null;
        try {
            Configuration myConf = new Configuration();
            // Creating filesystem using uri
            URI myURI = URI.create(uri.toString());
            FileSystem fs = FileSystem.get(myURI, myConf);
            // Retrieve value of extended attribute
            xAttrTxt = new Text(fs.getXAttr(new Path(myURI), attr.toString()));
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return xAttrTxt;
    }
}
I didn't test the performance of this when querying very large data sets.
I wish extended attributes could be retrieved directly as a virtual column, in a way similar to the virtual column INPUT__FILE__NAME.

How to list JBoss AS 7 datasource properties in Java code?

I'm running JBoss AS 7.1.0.CR1b. I've got several datasources defined in my standalone.xml e.g.
<subsystem xmlns="urn:jboss:domain:datasources:1.0">
    <datasources>
        <datasource jndi-name="java:/MyDS" pool-name="MyDS_Pool" enabled="true" use-java-context="true" use-ccm="true">
            <connection-url>some-url</connection-url>
            <driver>the-driver</driver>
            [etc]
Everything works fine.
I'm trying to access the information contained here within my code - specifically the connection-url and driver properties.
I've tried getting the Datasource from JNDI, as normal, but it doesn't appear to provide access to these properties:
// catches removed
InitialContext context;
DataSource dataSource = null;
context = new InitialContext();
dataSource = (DataSource) context.lookup(jndi);
The ClientInfo and DatabaseMetaData from a Connection obtained from this DataSource don't contain these granular JBoss properties either.
My code will be running inside the container with the datasource specified, so everything should be available. I've looked at the IronJacamar interface org.jboss.jca.common.api.metadata.ds.DataSource and its implementing class, and these seem to have accessible hooks to the information I require, but I can't find any information on how to obtain such objects for resources already deployed within the container (the only constructor on the impl involves supplying all the properties manually).
JBoss AS 7's Command-Line Interface allows you to navigate and list the datasources as a directory system. http://www.paykin.info/java/add-datasource-programaticaly-cli-jboss-7/ provides an excellent post on how to use what I believe is the Java Management API to interact with the subsystem, but this appears to involve connecting to the target JBoss server. My code is already running within that server, so surely there must be an easier way to do this?
Hope somebody can help. Many thanks.
What you're really trying to do is a management action. The best way is to use the management APIs that are available.
Here is a simple standalone example:
// Imports below assume the JBoss AS 7 controller client library (and jboss-dmr) are on the classpath.
import java.io.Closeable;
import java.io.IOException;
import java.net.InetAddress;
import java.util.List;

import org.jboss.as.controller.client.ModelControllerClient;
import org.jboss.as.controller.client.OperationBuilder;
import org.jboss.as.controller.client.helpers.ClientConstants;
import org.jboss.dmr.ModelNode;

public class Main {

    public static void main(final String[] args) throws Exception {
        final List<ModelNode> dataSources = getDataSources();
        for (ModelNode dataSource : dataSources) {
            System.out.printf("Datasource: %s%n", dataSource.asString());
        }
    }

    public static List<ModelNode> getDataSources() throws IOException {
        // Build a recursive read-resource operation against /subsystem=datasources
        final ModelNode request = new ModelNode();
        request.get(ClientConstants.OP).set("read-resource");
        request.get("recursive").set(true);
        request.get(ClientConstants.OP_ADDR).add("subsystem", "datasources");

        ModelControllerClient client = null;
        try {
            client = ModelControllerClient.Factory.create(InetAddress.getByName("127.0.0.1"), 9999);
            final ModelNode response = client.execute(new OperationBuilder(request).build());
            reportFailure(response);
            return response.get(ClientConstants.RESULT).get("data-source").asList();
        } finally {
            safeClose(client);
        }
    }

    public static void safeClose(final Closeable closeable) {
        if (closeable != null) try {
            closeable.close();
        } catch (Exception e) {
            // no-op
        }
    }

    private static void reportFailure(final ModelNode node) {
        if (!node.get(ClientConstants.OUTCOME).asString().equals(ClientConstants.SUCCESS)) {
            final String msg;
            if (node.hasDefined(ClientConstants.FAILURE_DESCRIPTION)) {
                if (node.hasDefined(ClientConstants.OP)) {
                    msg = String.format("Operation '%s' at address '%s' failed: %s", node.get(ClientConstants.OP),
                            node.get(ClientConstants.OP_ADDR), node.get(ClientConstants.FAILURE_DESCRIPTION));
                } else {
                    msg = String.format("Operation failed: %s", node.get(ClientConstants.FAILURE_DESCRIPTION));
                }
            } else {
                msg = String.format("Operation failed: %s", node);
            }
            throw new RuntimeException(msg);
        }
    }
}
The only other way I can think of is to add a module that relies on the server's internals. It could be done, but I would probably use the management API first.