How to push schemas to the Apicurio schema registry without any CDC events/records using Debezium

Use case: I need the schemas for all tables in a SQL database to be present in the Apicurio registry without having any CDC records in the CDC tables, using Debezium.
Currently, a schema is pushed to Apicurio only for tables on which CDC is enabled, and only once one of their records is updated.
Requirement: Is there any way to push the schemas of all CDC-enabled tables to Apicurio without CDC events, i.e. without modifying any record of any table?
TIA

Apicurio has a REST API you can use to create/read/update/delete schemas. Hope it helps. Note that the schemas you register must be compatible with the schemas generated by Debezium if you want Debezium to use them; otherwise it will fail.
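For illustration, a minimal sketch of registering a table schema up front through the Apicurio Registry v2 REST API using Python's requests library; the registry URL, group ID, artifact ID, and Avro schema below are placeholders and would need to match the subject names your Debezium converter is configured to look up.

    import json
    import requests

    # Hypothetical registry location and naming; adjust to your environment.
    REGISTRY_URL = "http://localhost:8080/apis/registry/v2"
    GROUP_ID = "default"
    ARTIFACT_ID = "server1.dbo.customers-value"  # subject-style name the converter would resolve

    # Hand-written Avro schema; it must stay compatible with what Debezium generates.
    schema = {
        "type": "record",
        "name": "Value",
        "namespace": "server1.dbo.customers",
        "fields": [
            {"name": "id", "type": "int"},
            {"name": "name", "type": ["null", "string"], "default": None},
        ],
    }

    resp = requests.post(
        f"{REGISTRY_URL}/groups/{GROUP_ID}/artifacts",
        headers={
            "Content-Type": "application/json",
            "X-Registry-ArtifactId": ARTIFACT_ID,
            "X-Registry-ArtifactType": "AVRO",
        },
        data=json.dumps(schema),
    )
    resp.raise_for_status()
    print(resp.json())

You could loop over all CDC-enabled tables and register one artifact per table this way, but the compatibility caveat above still applies.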

Related

Kafka Connect JDBC vs Debezium CDC

What are the differences between the JDBC Connector and the Debezium SQL Server CDC Connector (or any other relational database connector), and when should I choose one over the other when looking for a solution to sync two relational databases?
I'm not sure whether this discussion should be about CDC vs. the JDBC Connector rather than the Debezium SQL Server CDC Connector specifically, or even just Debezium in general; I'm open to editing it later, depending on the answers (though my case is about a SQL Server sink).
Sharing my research on this topic, which led me to the question, as an answer:
This explanation focuses on the differences between the Debezium SQL Server CDC Connector and the JDBC Connector, with a more general interpretation of Debezium and CDC.
tl;dr- scroll down :)
Debezium
Debezium is used only as a source connector; it records all row-level changes.
Debezium Documentation says:
Debezium is a set of distributed services to capture changes in your
databases so that your applications can see those changes and respond
to them. Debezium records all row-level changes within each database
table in a change event stream, and applications simply read these
streams to see the change events in the same order in which they
occurred.
The Debezium Connector for SQL Server first records a snapshot of the database and then sends records of row-level changes to Kafka, with each table going to a different Kafka topic.
Debezium Connector for SQL Server Documentation says:
Debezium’s SQL Server Connector can monitor and record the row-level
changes in the schemas of a SQL Server database.
The first time it connects to a SQL Server database/cluster, it reads
a consistent snapshot of all of the schemas. When that snapshot is
complete, the connector continuously streams the changes that were
committed to SQL Server and generates corresponding insert, update and
delete events. All of the events for each table are recorded in a
separate Kafka topic, where they can be easily consumed by
applications and services.
Kafka Connect JDBC
Kafka Connect JDBC can be used either as a source or as a sink connector to Kafka; it supports any database with a JDBC driver.
JDBC Connector Documentation says:
You can use the Kafka Connect JDBC source connector to import data
from any relational database with a JDBC driver into Apache Kafka®
topics. You can use the JDBC sink connector to export data from Kafka
topics to any relational database with a JDBC driver. The JDBC
connector supports a wide variety of databases without requiring
custom code for each one.
They have some notes about installing it on Microsoft SQL Server, which I find not relevant to this discussion.
So, since the JDBC Connector supports both source and sink and Debezium supports only source (not sink), we can conclude that in order to write data from Kafka to a database with a JDBC driver (sink), the JDBC Connector is the way to go (including for SQL Server).
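For completeness, a hedged sketch of what such a JDBC sink registration might look like for SQL Server; the JDBC URL, topics, and credentials are made up.

    import requests

    # Hypothetical sink writing Kafka topics back into SQL Server via JDBC.
    sink = {
        "name": "sqlserver-jdbc-sink",
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
            "connection.url": "jdbc:sqlserver://sqlserver:1433;databaseName=reporting",
            "connection.user": "connect",
            "connection.password": "********",
            "topics": "customers,orders",
            "insert.mode": "upsert",     # convert each message into an insert/upsert
            "pk.mode": "record_key",
            "pk.fields": "id",
            "auto.create": "true",
        },
    }

    requests.post("http://connect:8083/connectors", json=sink).raise_for_status()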
Now the comparison can be narrowed to the source side only.
JDBC Source Connector Documentation doesn't say much more at first sight:
Data is loaded by periodically executing a SQL query and creating an
output record for each row in the result set. By default, all tables
in a database are copied, each to its own output topic. The database
is monitored for new or deleted tables and adapts automatically. When
copying data from a table, the connector can load only new or modified
rows by specifying which columns should be used to detect new or
modified data.
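To make the quoted behaviour concrete, here is an assumed configuration for the JDBC source connector that detects new or modified rows via a timestamp plus an incrementing column; the column names and polling interval are illustrative only.

    import requests

    # Hypothetical JDBC source polling SQL Server for new/changed rows.
    source = {
        "name": "sqlserver-jdbc-source",
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": "jdbc:sqlserver://sqlserver:1433;databaseName=inventory",
            "connection.user": "connect",
            "connection.password": "********",
            "mode": "timestamp+incrementing",   # spot changes via these two columns
            "incrementing.column.name": "id",
            "timestamp.column.name": "last_modified",
            "poll.interval.ms": "5000",         # query the database every 5 seconds
            "topic.prefix": "jdbc-",            # topics are named jdbc-<table>
            "table.whitelist": "customers,orders",
        },
    }

    requests.post("http://connect:8083/connectors", json=source).raise_for_status()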
Searching a little further to understand their differences: this Debezium blog, which uses the Debezium MySQL Connector as a source and the JDBC Connector as a sink, explains the differences between the two. In general, it tells us that Debezium provides records with more information about the database changes, while the JDBC Connector provides records that are focused on converting the database changes into simple insert/upsert commands:
The Debezium MySQL Connector was designed to specifically capture
database changes and provide as much information as possible about
those events beyond just the new state of each row. Meanwhile, the
Confluent JDBC Sink Connector was designed to simply convert each
message into a database insert/upsert based upon the structure of the
message. So, the two connectors have different structures for the
messages, but they also use different topic naming conventions and
behavior of representing deleted records.
Moreover, they use different topic naming conventions and handle deletes differently:
Debezium uses fully qualified naming for target topics representing
each table it manages. The naming follows the pattern
[logical-name].[database-name].[table-name]. Kafka Connect JDBC
Connector works with simple names [table-name].
...
When the Debezium connector detects a row is deleted, it creates two
event messages: a delete event and a tombstone message. The delete
message has an envelope with the state of the deleted row in the
before field, and an after field that is null. The tombstone message
contains same key as the delete message, but the entire message value
is null, and Kafka’s log compaction utilizes this to know that it can
remove any earlier messages with the same key. A number of sink
connectors, including the Confluent’s JDBC Sink Connector, are not
expecting these messages and will instead fail if they see either kind
of message.
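As an illustrative sketch of the two messages described above (the envelope fields follow Debezium's format, but the key and row values are made up):

    # Hypothetical key shared by both messages (the row's primary key).
    key = {"id": 42}

    # 1) The delete event: the old row appears in "before", "after" is null.
    delete_event_value = {
        "before": {"id": 42, "name": "Alice"},
        "after": None,
        "op": "d",                          # Debezium operation code for delete
        "source": {"table": "customers"},   # source metadata, trimmed down here
    }

    # 2) The tombstone: same key, but the entire message value is null,
    #    which lets Kafka log compaction drop earlier messages with this key.
    tombstone_value = None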
This Confluent blog explains in more detail how CDC and the JDBC Connector work: the JDBC Connector executes queries against the source database at a fixed interval, which is not a very scalable solution, while CDC operates at a higher frequency, streaming from the database's transaction log:
The connector works by executing a query, over JDBC, against the
source database. It does this to pull in all rows (bulk) or those that
changed since previously (incremental). This query is executed at the
interval defined in poll.interval.ms. Depending on the volumes of data
involved, the physical database design (indexing, etc.), and other
workload on the database, this may not prove to be the most scalable
option.
...
Done properly, CDC basically enables you to stream every single event
from a database into Kafka. Broadly put, relational databases use a
transaction log (also called a binlog or redo log depending on DB
flavour), to which every event in the database is written. Update a
row, insert a row, delete a row – it all goes to the database’s
transaction log. CDC tools generally work by utilising this
transaction log to extract at very low latency and low impact the
events that are occurring on the database (or a schema/table within
it).
The same blog also covers the differences between CDC and the JDBC Connector; mainly, it says that the JDBC Connector doesn't support syncing deleted records and therefore fits prototyping, while CDC fits more mature systems:
The JDBC Connector cannot fetch deleted rows. Because, how do you
query for data that doesn’t exist?
...
My general steer on CDC vs JDBC is that JDBC is great for prototyping,
and fine low-volume workloads. Things to consider if using the JDBC
connector:
- Doesn’t give true CDC (capture delete records, want before/after record versions)
- Latency in detecting new events
- Impact of polling the source database continually (and balancing this with the desired latency)
- Unless you’re doing a bulk pull from a table, you need to have an ID and/or timestamp that you can use to spot new records. If you don’t own the schema, this becomes a problem.
tl;dr Conclusion
The main differences between Debezium and JDBC Connector are:
Debezium can be used only as a Kafka source, while the JDBC Connector can be used as both a Kafka source and sink.
For sources:
The JDBC Connector doesn't support syncing deleted records, while Debezium does.
The JDBC Connector queries the database at a fixed interval, which is not a very scalable solution, while CDC has a higher frequency, streaming from the database transaction log.
Debezium provides records with more information about the database changes, while the JDBC Connector provides records that are focused on converting the database changes into simple insert/upsert commands.
Different topic naming conventions.
Simply put, CDC is log-based streaming, while the Kafka Connect JDBC source connector is query-based streaming. :)
With the JDBC Connector you cannot capture DDL changes such as new tables or columns. With the Debezium connector you can track data structure changes, so you can also adjust the sink connector if necessary.

What happens if I delete Meta Store of Hive from My SQL Server?

What happens if I delete the Hive metastore from My SQL Server? Please provide the details for managed tables and external tables.
You lose the metadata - that's all. The data will be unaffected. So you can then simply re-run the scripts that define your tables. This is true for both managed and external tables.
So the deletion (and re-creation) may be a good way for you to clean up a potentially corrupted metastore.
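As the answer says, re-running the table definitions restores the metadata. A minimal sketch, assuming PyHive and a reachable HiveServer2 (the table name, columns, and location are hypothetical):

    from pyhive import hive  # assumes the PyHive package is installed

    conn = hive.connect(host="hive-server", port=10000)
    cur = conn.cursor()
    # Re-running the DDL re-creates only the metadata; the files already in
    # HDFS at the given LOCATION are left untouched.
    cur.execute("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales (id INT, amount DOUBLE)
        STORED AS PARQUET
        LOCATION '/data/warehouse/sales'
    """)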

Initialize a Transactional Replication from a Backup of subscription

I had transactional replication in SQL Server 2012. People changed the data inside the subscription database, so changes have been added to the subscription from:
1- my publication
2- people who inserted their own data directly into the subscription database.
I want to rebuild my replication. Is there any way to rebuild the replication using my subscription database, which contains the users' data?
Thanks,
Babak
The short answer is in the documentation under the subheading "Databases at the Subscriber".
But the longer answer is that you should design your replication topology in such a way as to make the subscription database completely expendable. That is, if you have data that users are entering directly into that database, either put it in separate tables (and put those tables on a separate filegroup) or, better, put those tables in a separate database entirely and create views/synonyms to those tables in your subscriber database.

Connecting external SAP database to MySQL

My application relies on data that is stored in a SAP Database. In addition to that, I need to persist data that references the SAP data but cannot be stored in the SAP database. So there is another (MySQL) database.
What would be the cleanest way to connect those 2 datasources?
I think it would be better/safer to constantly fetch the data I need from SAP and also store it in the MySQL database, since I can then reference it with foreign keys. But I do not want to constantly check for changes in the SAP database and then copy all the data to MySQL, since queries to SAP are usually very expensive.

Azure Data Sync and Triggers

I use SQL Azure Data Sync to sync my remote Azure database with my local SQL database. Data Sync creates some additional tables on the client and the server, and also adds delete, insert and update triggers to existing tables.
What are these triggers for? Can I delete them? I don't think so?
The problem now is that I can't edit data on the server.
I get the error
The target table 'dbo.Corporation' of the DML statement cannot have any
enabled triggers if the statement contains an OUTPUT clause without INTO clause.
The triggers are added by the Microsoft Sync Framework, which is used by SQL Azure Data Sync. And yes, you can't delete them, because SQL Azure Data Sync would stop working. It is not that easy to modify tables after they are provisioned. If you are adding columns, check out this question. If it is something else, try searching for a solution to your problem under the Microsoft Sync Framework tag rather than SQL Azure.