Anyone using Change Data Capture for Replication?

I'm soon going to be involved in a project to replace our current replication method, transactional replication, with some other method (for various reasons). I'm considering using CDC as an alternative. I'm envisioning CDC capturing all the changes, and then another process reading those changes and applying them to a target database.
I don't know much about CDC and whether it's suitable for this task. Can anyone let me know of your experience doing this, or something closely related? Pros and cons, pitfalls etc.
Thanks very much.

We've been looking into the same thing. Our issue w/ replication has been the tight schema coupling. CDC allows for mostly the same functionality, without the schema coupling.
I think it's doable, but you may be reinventing the wheel. The typical use case for CDC is OLTP -> Data Warehouse, where schemas are completely different, and the data only moves in one direction.
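For what it's worth, the poll-and-apply loop the question envisions can be sketched roughly as follows. This assumes CDC is already enabled on a source table dbo.Orders (capture instance dbo_Orders); the table, column and connection names are all made up, and the checkpoint handling is stubbed out.
    // Rough sketch of the poll-and-apply loop described above. Assumes CDC is
    // already enabled on dbo.Orders (capture instance "dbo_Orders"); all table,
    // column and connection names are made up, and checkpoint storage is stubbed.
    using System;
    using System.Data;
    using System.Data.SqlClient;

    class CdcApplyLoop
    {
        const string SourceCs = "Server=SRC;Database=AppDb;Integrated Security=true;";
        const string TargetCs = "Server=TGT;Database=ReplicaDb;Integrated Security=true;";

        static void Main()
        {
            // On the very first run, seed this from sys.fn_cdc_get_min_lsn('dbo_Orders').
            byte[] lastLsn = LoadCheckpoint();

            using (var src = new SqlConnection(SourceCs))
            using (var tgt = new SqlConnection(TargetCs))
            {
                src.Open();
                tgt.Open();

                var cmd = new SqlCommand(@"
                    DECLARE @from binary(10) = sys.fn_cdc_increment_lsn(@last);
                    DECLARE @to   binary(10) = sys.fn_cdc_get_max_lsn();
                    SELECT __$operation, __$start_lsn, OrderId, CustomerId, Amount
                    FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from, @to, 'all');", src);
                cmd.Parameters.Add("@last", SqlDbType.Binary, 10).Value = lastLsn;

                using (var rdr = cmd.ExecuteReader())
                {
                    while (rdr.Read())
                    {
                        int op = rdr.GetInt32(0);      // 1 = delete, 2 = insert, 4 = update (after image)
                        lastLsn = (byte[])rdr[1];      // __$start_lsn of the change just read
                        ApplyToTarget(tgt, op, rdr);   // issue the matching INSERT/UPDATE/DELETE on the target
                    }
                }
            }
            SaveCheckpoint(lastLsn);                   // remember where we got to for the next run
        }

        static void ApplyToTarget(SqlConnection tgt, int op, IDataRecord row) { /* ... */ }
        static byte[] LoadCheckpoint() { return new byte[10]; }
        static void SaveCheckpoint(byte[] lsn) { /* ... */ }
    }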

Does DuckDB support triggers?

I suspect the answer is no, but I just wanted to check if anyone has a way to implement triggers in DuckDB?
I have a SQLite database that relies heavily on views with INSTEAD OF INSERT/UPDATE/DELETE triggers to mask the underlying table structure from applications. Having heard good things about DuckDB, I was hoping to try a port, but the lack of triggers (or an alternative mechanism for achieving the same goal) is a showstopper. I have no great desire to reimplement all of that functionality in the various host languages that access the database.
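For concreteness, the SQLite pattern being ported looks roughly like this (table and view names are purely illustrative), driven here through Microsoft.Data.Sqlite:
    // Minimal sketch of the view + INSTEAD OF trigger pattern, driven through
    // Microsoft.Data.Sqlite. Table and view names are purely illustrative.
    using Microsoft.Data.Sqlite;

    class InsteadOfTriggerDemo
    {
        static void Main()
        {
            using var conn = new SqliteConnection("Data Source=:memory:");
            conn.Open();

            var ddl = conn.CreateCommand();
            ddl.CommandText = @"
                CREATE TABLE person_tbl (id INTEGER PRIMARY KEY, full_name TEXT);

                -- Applications only ever see this view...
                CREATE VIEW person AS SELECT id, full_name AS name FROM person_tbl;

                -- ...and the trigger maps writes against the view onto the real table.
                CREATE TRIGGER person_insert INSTEAD OF INSERT ON person
                BEGIN
                    INSERT INTO person_tbl (full_name) VALUES (NEW.name);
                END;";
            ddl.ExecuteNonQuery();

            var ins = conn.CreateCommand();
            ins.CommandText = "INSERT INTO person (name) VALUES ('Ada Lovelace');";
            ins.ExecuteNonQuery();   // works in SQLite; DuckDB has no equivalent mechanism
        }
    }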
No.
The DuckDB documentation (as of October 2022) doesn't contain any reference to database triggers.
That said, given the nature of the project -- an analytical (OLAP) database, instead of a transactional (OLTP) one -- I would say the developers have few incentives to implement this feature, anyway.
See also this issue on the DuckDB GitHub.
SQLite and DuckDB are both fantastic tools, but very different beasts. Use the best tool for your job.

SQL: Synchronize Read/Write Databases in CQRS, ASP.NET Core

I was reading about DDD and CQRS (using ASP.NET Core and MSSQL) and their different approaches, and then I read about separating the read and write databases. I started searching the web for how to do this and how to sync those databases, but sadly (maybe I was searching wrong) I didn't find any good source explaining how.
So here is my question :
How should I separate those databases, and then how should I sync the data between them? For example, I have a table called "User" that exists in both the read and write databases. If I add a new row to the table in the write database, I have to tell the read database to sync itself with the write database so that the new data is available there to query and use later. But how? I also read about the Event Sourcing pattern and event-driven architecture, but they didn't help me figure out how to do the sync.
So does anyone know how to do this, or have any good resources on the topic that could help a dummy? :)
(Consider that you're explaining it to someone learning it for the first time!)
Thanks!
I have a related answer that may provide some background on how to approach CQRS.
The main point to keep in mind is that the "write" side is concerned with changes/transactions (OLTP) and the "read" side is concerned with queries (OLAP).
How you update your "read" side (read model) is going to depend on how you make the "write" side changes. When using an Event Store, things may be easier: each event has a global sequence number, and each projection (read model) tracks how far it has got in terms of that sequence number. When new events arrive (the projection polls for them), they can be actioned if they apply to the projection.
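As a rough illustration, a projection that tracks its checkpoint and polls the event store for newer events might look like this; none of these types come from any particular library, they are just invented to show the shape of it:
    // Illustration only: none of these types come from a particular library.
    using System.Collections.Generic;

    public sealed record StoredEvent(long GlobalSequence, string Type, string JsonPayload);

    public interface IEventStore
    {
        // Returns events strictly after the given global sequence number.
        IReadOnlyList<StoredEvent> ReadAfter(long globalSequence, int batchSize);
    }

    public sealed class UserListProjection
    {
        private long _checkpoint;   // in practice, persisted alongside the read model

        public void CatchUp(IEventStore store)
        {
            while (true)
            {
                var batch = store.ReadAfter(_checkpoint, batchSize: 100);
                if (batch.Count == 0) break;               // nothing new; poll again later

                foreach (var e in batch)
                {
                    if (e.Type == "UserRegistered")        // only events this projection cares about
                        ApplyUserRegistered(e.JsonPayload);

                    _checkpoint = e.GlobalSequence;        // advance the checkpoint event by event
                }
            }
        }

        private void ApplyUserRegistered(string json)
        {
            // Update the "read" side here, e.g. upsert a row in a denormalized Users table.
        }
    }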
If you simply update the "write" side with, say, a SQL query, then things are going to be a bit different, and possibly tricky, since you don't have any mechanism to replay those changes into the read model should you wish to change it. In that case you could use messaging (and possibly store the messages), or make the changes to the "read" side together with the "write" side, which isn't ideal unless you need 100% consistency.
As mentioned by Levi Ramsey, the read model is usually quite a bit different from the write model in that it is optimised for reading, so it may include denormalized data or simply live in a data store that is better suited to read models.
The main benefit of CQRS is being able to use different data models and/or different databases for queries vs. updates. If both sides use the same data model, there's often not much benefit to CQRS (at least not with a DB like SQL Server, which is reasonable for both at most scales).
This in turn implies that it's generally not possible to just have the two databases automatically be in sync, because there's going to be some model translation involved (e.g. from a relational DB (with a normalized schema) like SQL Server to a denormalized document DB like Mongo).
One fairly common pattern is to have the software which writes to the DB also publish events describing what was updated to some event bus. Another piece of software subscribes to those events and performs the appropriate updates to the read DB. Note that this implies the existence of a period of time where queries against the read DB and the write DB will give different results.
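A bare-bones sketch of that pattern, with an in-process bus standing in for a real message broker (RabbitMQ, Azure Service Bus, Kafka, etc.); all type and member names are hypothetical:
    // The bus here is an in-process stand-in for a real broker; all names are hypothetical.
    using System;
    using System.Collections.Generic;

    public sealed record UserCreated(Guid Id, string Name);

    public sealed class InProcessBus
    {
        private readonly List<Action<UserCreated>> _handlers = new();
        public void Subscribe(Action<UserCreated> handler) => _handlers.Add(handler);
        public void Publish(UserCreated evt) => _handlers.ForEach(h => h(evt));
    }

    public sealed class UserCommandHandler          // the "write" side
    {
        private readonly InProcessBus _bus;
        public UserCommandHandler(InProcessBus bus) => _bus = bus;

        public void CreateUser(string name)
        {
            var id = Guid.NewGuid();
            // 1. INSERT into the normalized "User" table in the write database here...
            // 2. ...then publish the event so the read side can catch up.
            _bus.Publish(new UserCreated(id, name));
        }
    }

    public sealed class UserReadModelUpdater        // the "read" side subscriber
    {
        public UserReadModelUpdater(InProcessBus bus) => bus.Subscribe(Handle);

        private void Handle(UserCreated evt)
        {
            // Upsert a denormalized row/document in the read database.
            // Until this runs, queries against the read DB lag behind the write DB.
            Console.WriteLine($"read model updated: {evt.Id} {evt.Name}");
        }
    }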

Can Infinispan act as a replacement for a conventional RDBMS

Apologies if the title makes no sense. What I would like to know is whether there is any programming model that would let us use an Infinispan cache as a real datastore, and not just a grid on top of an underlying RDBMS.
I know key-value stores have real limitations, but I couldn't stop thinking about the possibilities of an in-memory solution with all, or a subset of, RDBMS functionality. For example: retrieving a particular set of keys based on value > 34.56%, just like we would with a WHERE clause in an SQL statement.
My question is not specific to Infinispan; it applies to any IMKVS used for that purpose. I know it's a shot in the dark considering the data structures and algorithms behind IMKVS implementations.
Any help, or pointers to resources along these lines, would be greatly appreciated.
I suggest you write down all the queries that you execute against SQL DB and check if these could be translated into KVS language.
In Infinispan you can index the values and execute queries for such filtering, but you can't do any table joins.
If you are in need of a more powerful API, specifically JPA, take a look at Hibernate OGM.
And while KVSs offer some level of reliability, in practice I wouldn't trust the documentation too much. You need to perform extensive testing of your system and check that you can retrieve the data even after various types of crashes and network failures (or that you can live with throwing the data away).

How to go from full SQL querying to something like NoSQL?

In one of my processes I have a SQL query that takes 10-20% of the total execution time. This query filters my database and loads a list of PricingGrid objects.
So I want to improve its performance.
So far I have come up with 2 solutions:
Use a NoSQL solution; AFAIK these are good solutions for improving the reading process.
But the migration seems hard and needs a lot of work (like importing the data from SQL Server to NoSQL on a regular basis).
I don't have any experience with them; I don't even know which one I should use (the first I'd try is RavenDB, because I follow Ayende and it's built by the .NET community).
I might have some things to change in my model to make my objects suitable for a NoSQL database.
Load all my PricingGrid objects in memory (in a static IEnumerable; see the sketch after this question).
This might be a problem if my server doesn't have enough memory to load everything.
I might be reinventing the wheel (indexes, etc.) already invented by the NoSQL providers.
I don't think I'm the first one wondering about this, so what would be the best solution? Are there any tools that could help me?
.NET 3.5, SQL Server 2005, Windows Server 2005
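A rough sketch of option 2, with hypothetical names and kept within C# 3 / .NET 3.5 as listed above: load the grids once into a static collection and filter them in memory with LINQ.
    // Rough sketch of option 2 (all names hypothetical): load the grids once,
    // cache them in a static field, and filter in memory with LINQ.
    using System.Collections.Generic;
    using System.Linq;

    public class PricingGrid
    {
        public int Id { get; set; }
        public string Country { get; set; }
        public decimal Rate { get; set; }
    }

    public static class PricingGridCache
    {
        private static readonly object Sync = new object();
        private static List<PricingGrid> _grids;   // loaded once, reused afterwards

        public static IEnumerable<PricingGrid> All
        {
            get
            {
                lock (Sync)
                {
                    if (_grids == null)
                        _grids = LoadFromSqlServer();   // the existing (slow) query, run only once
                    return _grids;
                }
            }
        }

        // Replace the filtering SQL query with an in-memory LINQ filter.
        public static IEnumerable<PricingGrid> ForCountry(string country)
        {
            return All.Where(g => g.Country == country).ToList();
        }

        private static List<PricingGrid> LoadFromSqlServer()
        {
            // Existing ADO.NET / ORM code goes here.
            return new List<PricingGrid>();
        }
    }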
Migrating your data from SQL is only the first step.
Moving to a document store (like RavenDB or MongoDB) also means that you need to:
Denormalize your data
Perform schema validation in your code
Handle concurrency of complex operations in your code since you no longer have transactions (at least not the same way)
Perform rollbacks in the event of partial commits (changes)
Depending on your updates, reads and network model you might also need to handle conflicts
You provided very limited information but it sounds like your needs include a single database server and that your data fits well in the relational model.
In such a case I would vote against a NoSQL solution; it is more likely that you can speed up your queries with database optimizations and still retain all the added value of an RDBMS.
Non-relational databases are tools for a specific job (no matter how they sell them), if you need them it is usually because your data doesn't fit well in the relational model or if you have a need to distribute your data over multiple machines (size or availability). For instance, I use MongoDB for a write-intensive high throughput job management application. It is centralized and the data is very transient so the "cost" of having low durability is acceptable. This doesn't sound like the case for you.
If you prefer to use a NoSQL-style solution, perhaps you should try Memcached+MySQL (InnoDB). This will allow you to get the speed benefits of an in-memory cache (in the form of the memcached daemon plugin) with the underlying protection and capabilities of an RDBMS (MySQL). It should also ease data migration and somewhat reduce the amount of changes required in your code.
I myself have never used it, I find that I either need NoSQL for the reasons I stated above or that I can optimize the RDBMS using stored procedures, indexes and table views in a way which is sufficient for my needs.
Asaf has provided great information in regards to the usage of NoSQL and when it is most appropriate. Given that your main concern was performance, I would tend to agree with his opinion - it would take you much more time and effort to adopt a completely new (and very different) data persistence platform than it would to trick out your SQL Server cluster. That said, my answer is mainly to address the "how" part of your question.
Addressing misunderstandings:
Denormalizing Data - You do not need to manually denormalize your existing data; this will be done for you when it is migrated over. More than anything you need to simply think about your data in a different fashion - root aggregates, entity and value types, etc. (a small example follows below).
Concurrency/Transactions - Transactions are possible in both Mongo and Raven, they are simply done in a different fashion. One of the inherent ways Raven does this is by using an ORM-like "unit of work" pattern with its RavenSession objects. Yes, your data validation needs to be done in code, but you already should be doing it there anyway. In my experience this is an over-hyped con.
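To make "root aggregates" concrete, this is roughly what an Order looks like when it is modelled as a single document instead of normalized rows; class and property names are hypothetical:
    // Hypothetical denormalized model: one Order document carries its Items,
    // instead of a separate Items table joined by OrderId as in the SQL schema.
    using System;
    using System.Collections.Generic;

    public class OrderItem            // a value object nested inside the document
    {
        public string Sku { get; set; }
        public int Quantity { get; set; }
        public decimal UnitPrice { get; set; }
    }

    public class Order                // the root aggregate / top-level document
    {
        public string Id { get; set; }             // e.g. "orders/1" in RavenDB
        public DateTime PlacedAt { get; set; }
        public string CustomerName { get; set; }   // copied in (denormalized) for fast reads
        public List<OrderItem> Items { get; set; } // stored inline, no join needed
    }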
How:
Install Raven or Mongo on a primary server, run it as a service.
Create or extend an existing application that uses the database you intend to port. This application needs all the model classes/libraries that your SQL database provides persistence for.
a. In your "data layer" you likely have a repository class somewhere. Extract an interface from this, and use it to build another repository class for your Raven/Mongo persistence (a rough sketch follows these steps). Both DBs have plenty of good documentation for using their APIs to push/pull/update changes in the document graphs. It's pretty damn simple.
b. Load your SQL data into C# objects in memory. Pull back your top-level objects (just the entities) and load their inner collections and related data in memory. Your repository is probably already doing this (e.g. when fetching an Order object, ensure that not only its properties but also associated collections like Items are loaded in memory).
c. Instantiate your Raven/Mongo repository and push the data to it. Primary entities become "top level documents" or "root aggregates" serialized in JSON, and their collections' data nested within. Save changes and close the repository. Note: You may break this step down into as many little pieces as your data deems necessary.
Once your data is migrated, play around with it and ensure you are satisfied. You may want to modify your application models a little to adjust the way they are persisted to Raven/Mongo - for instance you may want to make both Orders and Items top-level documents and simply use reference values (much like relationships in RDBMS systems). Watch out here though, as doing so sort of goes against the principle and performance behind NoSQL, since now you have to tap the DB twice to get the Order and the Items.
If satisfied, shard/replicate your mongo/raven servers across your remaining available server boxes.
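A bare-bones sketch of steps 2a-2c, reusing the hypothetical Order model above and using the RavenDB client purely as an example (the calls follow the modern Raven.Client.Documents API and may differ in older client versions):
    // Sketch of steps 2a-2c: a repository interface extracted from the existing data
    // layer, plus a RavenDB-backed implementation that the migration code pushes into.
    // Repository and type names are invented.
    using System.Collections.Generic;
    using Raven.Client.Documents;
    using Raven.Client.Documents.Session;

    public interface IOrderRepository            // 2a: interface extracted from the SQL repository
    {
        void Save(Order order);
    }

    public class RavenOrderRepository : IOrderRepository
    {
        private readonly IDocumentStore _store;
        public RavenOrderRepository(IDocumentStore store) { _store = store; }

        public void Save(Order order)
        {
            using (IDocumentSession session = _store.OpenSession())
            {
                session.Store(order);            // the whole aggregate is serialized as one JSON document
                session.SaveChanges();
            }
        }
    }

    public static class Migration
    {
        public static void Run(IEnumerable<Order> ordersLoadedFromSql)   // 2b: fully loaded objects from SQL
        {
            var store = new DocumentStore
            {
                Urls = new[] { "http://localhost:8080" },   // hypothetical server
                Database = "Migrated"
            };
            store.Initialize();

            var repo = new RavenOrderRepository(store);
            foreach (var order in ordersLoadedFromSql)      // 2c: push each root aggregate
                repo.Save(order);
        }
    }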
Obviously there are tons of little details I did not explain, but that is the general process, and much of it depends on the applications already consuming the database and may be tricky if more than one app/system talks to it.
Lastly, just to reiterate what Asaf said... learn as much as you can about NoSQL and its best use cases. It is an amazing tool, but not a golden solution for all data persistence. In your case, try to really find the bottlenecks in your current solution and see if they are solvable. As one of my systems guys says, "technology for technology's sake is bullshit".

Handling linked systems

We have many systems that talk to each other, and it's become a bit of a mess. For example, System B gets data from System A, System A gets data from System C, which also gets data from System B, and so on. The data is passed around using a variety of methods. Some of the data is copied across periodically using SQL, thus duplicating it. Some of the data is pulled using views, locally and remotely, in real time. We want to come up with a better solution. My plan is to create a central repository that the systems dump data into and get data from. Does this sound like a good idea? What's the best practice for handling data between remote systems?
Thanks in advance.
You mean like a data warehouse? This is pretty standard as long as you don't want to update the data, and just want to use it for reporting/driving other applications.
You have a variety of options for getting the data in there, including linked servers, SSIS packages and replication (if between Oracle servers or MS SQL servers).
You can read Microsoft's recommendation: http://technet.microsoft.com/en-us/library/dd459147(SQL.100).aspx
As Martin Booth and Dalex say, if the data is used only for reporting, a data warehouse is the obvious solution.
If you use the data in transactional systems, there are some other options.
If your system is primarily about data, I'd consider using ETL tools (http://en.wikipedia.org/wiki/Extract,_transform,_load) to manage the copying around of data.
If your system is not just about data, you should look at a service-oriented architecture; this is a brilliantly vague term, and can result in many billable consulting hours, so it's worth doing your homework. In general, the idea is to decouple the underlying implementation (views, replication, dump/restore etc.) from the conceptual "services". This might be too big a jump from where you are now - but the principles are useful when designing your solution.