Disparate database replication (between different types of RDBMS) - replication

I'm looking for a recommendation for a product that enables replication between different database systems. We're just looking for standard replication (copy this table over here and apply transactions as they happen) - nothing fancy.
We can't use built-in database replication because our source server (Tandem/NSK) doesn't support any kind of push replication at all. We've been using GoldenGate (http://www.goldengate.com/), but I'm interested in other choices out there. If it matters, our destination server is an MSSQL box.
I know it gets expensive, so I'm not just interested in free products, and I'd prefer something with professional support. If you have some experience with a product like GoldenGate, I'd love to hear it, as I imagine there have to be other products on the market that do what GoldenGate does.

SymmetricDS might be a solution you would be interested in. It synchronizes different types of RDBMSs, but it does use database triggers and is Java based. It was built more for a highly distributed and possibly disconnected environment, but it would also work for point-to-point replication.
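To make the idea of trigger-based replication a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module for both ends. It is not how SymmetricDS or GoldenGate is implemented; the orders and change_log tables and the apply_changes helper are names made up for illustration, and a real product also has to deal with schema differences, batching, conflicts and retries.

```python
import sqlite3

# Hypothetical source and destination databases (in-memory for the sketch).
source = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")

schema = "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)"
source.execute(schema)
dest.execute(schema)

# A change log on the source, filled by triggers on every insert/update/delete.
source.executescript("""
CREATE TABLE change_log (
    seq  INTEGER PRIMARY KEY AUTOINCREMENT,
    op   TEXT,
    id   INTEGER,
    item TEXT
);
CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
    INSERT INTO change_log (op, id, item) VALUES ('I', NEW.id, NEW.item);
END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
    INSERT INTO change_log (op, id, item) VALUES ('U', NEW.id, NEW.item);
END;
CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
    INSERT INTO change_log (op, id, item) VALUES ('D', OLD.id, NULL);
END;
""")


def apply_changes(src, dst, last_seq):
    """Replay logged changes newer than last_seq; return the new high-water mark."""
    rows = src.execute(
        "SELECT seq, op, id, item FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, op, row_id, item in rows:
        if op == "D":
            dst.execute("DELETE FROM orders WHERE id = ?", (row_id,))
        else:
            # INSERT OR REPLACE covers both logged inserts and updates.
            dst.execute(
                "INSERT OR REPLACE INTO orders (id, item) VALUES (?, ?)",
                (row_id, item),
            )
        last_seq = seq
    dst.commit()
    return last_seq


# Some activity on the source...
source.execute("INSERT INTO orders VALUES (1, 'widget')")
source.execute("INSERT INTO orders VALUES (2, 'gadget')")
source.execute("UPDATE orders SET item = 'gizmo' WHERE id = 2")
source.execute("DELETE FROM orders WHERE id = 1")
source.commit()

# ...is replayed on the destination in order.
last_seq = apply_changes(source, dest, 0)
replicated = dest.execute("SELECT id, item FROM orders ORDER BY id").fetchall()
```

The caller keeps last_seq between runs so that each pass only ships changes it has not seen yet. The cost of this style is the extra write the trigger adds to every transaction on the source.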

Related

If I'm using PostgreSQL, do I need a server too? Like AWS RDS?

In my CS program, I was told I should learn SQL for my databases.
If I'm using PostgreSQL, do I also need a SQL server to go along with it? Is PostgreSQL a language, a server, or both? Is there even a SQL language or is it only servers?
Background: I downloaded Postgres because hey, that has SQL in the name, it works and I'm under the impression it's a pretty good choice anyway. But I couldn't figure out through their website if it needs a companion server, so I went looking for one and found AWS RDS.
The impression I have is that Postgres is the language and AWS RDS is the server, and they serve different functions. But I'm not sure about any of that.
Seems you're learning too many new topics at the same time.
Ok. I'll try to answer.
SQL stands for 'Structured Query Language', and serves as a 'standard' that many vendors largely respect in its fundamentals. Oracle, MySQL (now owned by Oracle), MariaDB and PostgreSQL are some of those vendors.
The main thing I would recommend you identify every time you look at SQL code is whether it belongs to DML or DDL. DML stands for 'Data Manipulation Language' and refers to SQL instructions which modify data. DDL stands for 'Data Definition Language' and refers to instructions which define or alter the structure in which data will be stored.
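A small sketch of the DDL/DML split, using Python's built-in sqlite3 module so it runs without any server. The students table and its columns are invented for the example, and the same statements should look very similar in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: statements that define or alter structure.
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE students ADD COLUMN major TEXT")

# DML: statements that modify the data stored in that structure.
conn.execute("INSERT INTO students (name, major) VALUES (?, ?)", ("Ada", "CS"))
conn.execute("UPDATE students SET major = ? WHERE name = ?", ("Math", "Ada"))
conn.commit()

# Reading the data back (SELECT is usually grouped with DML, sometimes called DQL).
rows = conn.execute("SELECT id, name, major FROM students").fetchall()
```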
Another important concept is the atomicity of data manipulation. You can confirm a change or roll it back before it is persisted. This corresponds to doing a 'commit' or a 'rollback'. It's a somewhat advanced concept, but it generally happens "automatically" with standard client configurations. Later, you will need to know about it when programming a system module that interacts with databases.
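Here is a rough illustration of commit versus rollback, again with Python's built-in sqlite3 module. The accounts table is made up, and the sketch relies on the module's default behavior of opening a transaction implicitly before an UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()


def balance(connection):
    """Return alice's current balance as seen by this connection."""
    return connection.execute(
        "SELECT balance FROM accounts WHERE name = 'alice'"
    ).fetchone()[0]


# A change we decide not to keep: roll it back before it is persisted.
conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'alice'")
conn.rollback()
after_rollback = balance(conn)

# A change we do want to keep: commit it.
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
conn.commit()
after_commit = balance(conn)
```

Many GUI clients run in an "autocommit" mode, which is why this step often seems to happen on its own.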
When you think of the SQL 'server', it refers to the installed and configured software which has the responsibility of managing the persistence of data within some kind of 'instance', allocated on a system with data storage capabilities. AWS implements this service in the cloud, and RDS is the product which supports many SQL flavors to choose from (Oracle, PostgreSQL, etc.).
If you are comfortable with Docker, I recommend you learn the basics, which will help you set up and destroy databases many times; that is useful for developing and testing locally. The next command lets you start a PostgreSQL database with port 5432 open. You can see the server log through Docker and use any SQL client to connect. When you press Ctrl+C everything will be deleted. Of course there are ways to keep data persistent, but this command is an easy starting point.
$ docker run --rm -p 5432:5432 --name some-postgres-container-name -e POSTGRES_PASSWORD=mysecretpassword postgres:13.3
Side note: it's better to get used to always working with specific Docker image versions (not 'latest').
More details on its usage here: https://hub.docker.com/_/postgres/
if I'm using PostgreSQL, do I also need a SQL server to go along with
it? Is PostgreSQL a language, a server, or both? Is there even a SQL
language or is it only servers lol? I'm genuinely trying to figure
this out myself, but basically everything I read is beyond my scope of
competence and confuses me more. I'm learning the syntax of SQL well
enough, but I'm so confused about everything on the most fundamental
level.
By the way, "SQL Server" is the name of Microsoft's SQL flavor, just another one. Don't confuse it with the general concept of having a SQL server configured.
Yes, you can think of PostgreSQL as a language too, which shares most of its syntax and semantics with other SQL vendors. Yes, there is a 'basic' SQL language shared and compatible between all vendors; some share more aspects than others. In terms of Venn diagrams, you can think of many overlapping circles, one each for Microsoft's SQL Server, Oracle SQL, PostgreSQL, MySQL, etc., sharing most of their elements, where each element is a SQL instruction.
When dealing with databases in general, keep in mind that they help model 'real world' scenarios or software systems. SQL lets you 'talk' to implementations of "Relational Databases", which is one kind of database modeling, but there are others too. ER diagrams help represent the 'structure' of a database in a conceptual manner. I like DBeaver because it has an integrated ER diagram generator which helps you understand the structure of a given database instance.
I have used Postgres and it is an excellent product (and free).
I would install it standalone first. It does come with its own client tools, which you use to communicate with the database server, which runs independently as a service. However, you might be better off installing something like SqlWorkbench as a client tool (which I use). In the config you specify the machine Postgres is running on (which can be your local computer for testing purposes) and the port to connect on. Essentially, the client sends your instructions to Postgres server and the server returns the resultsets associated with your instructions. The client also formats the resultsets into a nice readable "spreadsheet" format with rows and columns.
First I'll try to answer the questions you asked. There is a SQL language, but in practice it is not strictly standardized. There are many offerings for databases and database servers. Many of these are discussed below.
Any database you pick will give you the chance to learn basics of SQL queries and this knowledge will serve you well even if you switch to a different database later.
Specifically, when it comes to PostgreSQL, it is a Relational Database Management System. It is software that operates as a server. You can install it on your personal computer running Windows, Linux, or macOS. You can also install it on a dedicated server computer where you'll get better performance and uptime. Further, there are many companies that offer PostgreSQL hosting, including Amazon RDS and Google Cloud, but they're not free.
For a CS student, PostgreSQL installed on your personal computer might be a reasonable choice. But you have lots of options. Read on....
For a CS program, your choice of database will depend on:
what degree of portability you need
how much data you have
how many users will connect to database
what kinds of jobs you might pursue after graduation
Portability
If you think you want to ship your database with your application, then your best bet is probably SQLite. By some accounts it can handle several million rows worth of data and still be performant. However, it's not great if you need multiple users to connect to the same database. Your data can get corrupted in many multi-user scenarios.
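To show what "shipping the database with your application" looks like in practice, here is a short sketch with Python's built-in sqlite3 module. The file name app.db and the notes table are placeholders; the point is only that the whole database is one ordinary file and no server process is involved.

```python
import os
import sqlite3
import tempfile

# The entire database lives in a single file (path chosen for the sketch).
db_path = os.path.join(tempfile.mkdtemp(), "app.db")

conn = sqlite3.connect(db_path)  # creates the file if it does not exist
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES ('hello')")
conn.commit()
conn.close()

# The file can now be copied or bundled with the application.
file_exists = os.path.isfile(db_path)

# Reopening the same file later gives the same data back.
conn = sqlite3.connect(db_path)
notes = conn.execute("SELECT body FROM notes").fetchall()
conn.close()
```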
How Much Data and Users
For large amounts of data and many users, you'll want to consider the client/server heavy hitters:
PostgreSQL
MySQL/MariaDB
Oracle
SQL Server
These databases will support large quantities of data and many simultaneous connections. But if you want to distribute the database with your application, it's not a good idea. Or if you want to demonstrate your app, you need to ensure that a connection to a server will be available. All of these databases come with a free version, but the last two will have the most restrictions.
After Graduation
Now you're looking to the future and possibly what kind of skills you want to put on your resume. If you think you'll end up in a corporate environment that is already well established, they will likely already have a preferred database and it could be any of the ones listed here (SQLite or the "heavy hitters"). If you want to position yourself as developing apps with low overhead cost, you'll gravitate towards SQLite/PostgreSQL/MySQL. If you think you're going to be some kind of database administrator working in a buttoned-up corporate environment, those companies tend to favor SQL Server and Oracle.
Good luck. Any choice you make will probably be fine. Knowing some flavor of SQL is useful for your future endeavors.
SQL is a language like any other language, but it works on databases. It is called SQL because it works on structured data like tables (i.e. rows and columns). After reading the PostgreSQL documentation, I think we do not need any separate server installation. You can download it from here. If you are facing any issues with it, I suggest using MySQL Workbench. Although installation may take longer, it's easy to understand.

What is the best approach to archiving operational data?

I have a SQL Server 2012 database which is the backend to an ASP.NET MVC application, storing customer and order information. This database is accessed under high load and high usage.
I now have a requirement to generate ad hoc reports from the database, accessing the same data the MVC application works with. I am concerned about the impact this would have on the database server and the database itself, around locking etc. As such, there is a distinction in how the data is used: for the app it's operational, but for the reports it's more data warehouse oriented.
Therefore I am looking at my options for the best approach to avoid such an impact.
I am considering creating another database on a different server and archiving the data to it using a SQL job at regular intervals during the day. My only concern is that it would require maintenance, and that any change to the source database would also have to be made to the target database.
What other options are open to me in this situation, and what advice can you give? What is the best approach?
You don't have to think of your own solution to keep the databases in sync. SQL Server has built-in ways to achieve this.
Database Mirroring
Replication
Always On Availability Groups
If you're using the Enterprise Edition of SQL Server 2012, then I would look into Always On Availability Groups; if not, then (Transactional) Replication. Both of these solutions can keep a second read-only and near real-time copy of the database.
As Steve McConnell suggests, you should make no assumptions about performance; measure it before making any decisions. It is not wise to make design choices without knowing the actual performance overhead. So I would suggest measuring or simulating the overhead before even considering a complex architecture, because otherwise you will not know if it's worth the trouble.
Anyway, I think that your approach is right. I would create a Windows service which periodically retrieves the data I need from my database and stores it in my warehouse (the new database). I don't think you will find a tool that keeps the two schemas consistent, unless you want one schema to be an exact copy of the other.
I don't know your exact needs and perhaps my suggestion is overkill, but I would encourage you to consider an OLAP approach in the data warehouse where your reporting data will come from. I have to warn you that these systems are oriented toward really big data and advanced reporting needs, but perhaps you can take some ideas from them. Since you are familiar with the Microsoft ecosystem, I would suggest using Business Intelligence Studio. There you could build an OLAP cube using your normal database as the data source and integrate advanced reporting.
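As a rough sketch of what such a periodic job might do, the following uses Python's built-in sqlite3 module as a stand-in for both the operational database and the warehouse. The orders and orders_archive tables and the archive_new_orders function are invented names, and the sketch assumes rows are only ever appended with increasing ids, so it would miss updates and deletes.

```python
import sqlite3

# Stand-ins for the operational database and the reporting warehouse.
operational = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

operational.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
warehouse.execute(
    "CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)


def archive_new_orders(src, dst):
    """Copy rows whose id is above the highest archived id; return rows copied."""
    high_water = dst.execute(
        "SELECT COALESCE(MAX(id), 0) FROM orders_archive"
    ).fetchone()[0]
    rows = src.execute(
        "SELECT id, customer, total FROM orders WHERE id > ? ORDER BY id",
        (high_water,),
    ).fetchall()
    dst.executemany("INSERT INTO orders_archive VALUES (?, ?, ?)", rows)
    dst.commit()
    return len(rows)


# First scheduled run picks up everything that exists so far.
operational.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("acme", 10.0), ("globex", 25.5)],
)
operational.commit()
first_run = archive_new_orders(operational, warehouse)

# A later run only copies what arrived in the meantime.
operational.execute("INSERT INTO orders (customer, total) VALUES ('initech', 7.0)")
operational.commit()
second_run = archive_new_orders(operational, warehouse)

archived = warehouse.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0]
```

Reading only rows above the high-water mark keeps each run short, which matters when the source is already under heavy load.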
Hope I helped.

Handling linked systems

We have many systems that talk to each other and it's become a bit of a mess, e.g. System B gets data from System A, and System A gets data from System C, which also gets data from System B, etc. The data is passed around using a variety of methods. Some of the data is copied across periodically using SQL, thus duplicating the data. Some of the data is pulled using views, locally and remotely, in real time. We want to come up with a better solution. My plan is to create a central repository that the systems dump data to and get data from. Does this sound like a good idea? What's the best practice for handling data between remote systems?
Thanks in advance.
You mean like a data warehouse? This is pretty standard as long as you don't want to update the data, and just want to use it for reporting/driving other applications.
You have a variety of options for getting the data in there, including linked servers, SSIS packages and replication (if between Oracle servers or MS SQL servers).
You can read Microsoft's recommendation: http://technet.microsoft.com/en-us/library/dd459147(SQL.100).aspx
As Martin Booth and Dalex say, if the data is used only for reporting, a data warehouse is the obvious solution.
If you use the data in transactional systems, there are some other options.
If your system is primarily about data, I'd consider using ETL tools (http://en.wikipedia.org/wiki/Extract,_transform,_load) to manage the copying around of data.
If your system is not just about data, you should look at a service-oriented architecture; this is a brilliantly vague term, and can result in many billable consulting hours, so it's worth doing your homework. In general, the idea is to decouple the underlying implementation (views, replication, dump/restore etc.) from the conceptual "services". This might be too big a jump from where you are now, but the principles are useful when designing your solution.

how to make a db schema such that its use is supported by all db management systems

Is there a Windows XP utility to make a database such that it's supported by SQL Server, Oracle, and other DB management systems?
The database schema is very large, so I would like to know what to use to make it portable from SQL Server to Oracle if the future demands that change.
In short, what you seek is nearly impossible to do successfully. Every database product has enough quirks that such a database would not perform well and would be too limiting in terms of the features you were able to use. I.e., you have to play the game of lowest common denominator, using only the features implemented by every product you want to support. A far better solution is to abstract the data layer into its own library accessed via interfaces so that you can swap out your data layer. ORMs, as Rafael E. Belliard suggested, make this simpler, but it can also be done manually.
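A minimal sketch of the "data layer behind an interface" idea, written in Python for brevity. All of the names here (CustomerStore, SqliteCustomerStore, InMemoryCustomerStore, register_and_greet) are hypothetical; a real project would have one implementation per supported database, and an ORM would generate much of this for you.

```python
import sqlite3
from typing import Optional, Protocol


class CustomerStore(Protocol):
    """The interface the application codes against (hypothetical)."""

    def add(self, name: str) -> int: ...

    def get(self, customer_id: int) -> Optional[str]: ...


class SqliteCustomerStore:
    """One implementation, backed by SQLite."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add(self, name: str) -> int:
        cur = self._conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))
        self._conn.commit()
        return cur.lastrowid

    def get(self, customer_id: int) -> Optional[str]:
        row = self._conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


class InMemoryCustomerStore:
    """A second implementation; an Oracle or SQL Server one would slot in the same way."""

    def __init__(self) -> None:
        self._rows = {}

    def add(self, name: str) -> int:
        new_id = len(self._rows) + 1
        self._rows[new_id] = name
        return new_id

    def get(self, customer_id: int) -> Optional[str]:
        return self._rows.get(customer_id)


def register_and_greet(store: CustomerStore, name: str) -> str:
    """Application code: knows the interface, not the database behind it."""
    customer_id = store.add(name)
    return f"Hello, {store.get(customer_id)}!"


stores = [SqliteCustomerStore(sqlite3.connect(":memory:")), InMemoryCustomerStore()]
greetings = [register_and_greet(store, "Ada") for store in stores]
```

Vendor-specific SQL stays inside each implementation, so switching databases means writing one new class rather than touching every query in the application.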
I would recommend building your database using an ORM like Hibernate for Java (or NHibernate for .NET). This would allow you to seamlessly transition from one database type to the other with little to no issues. They would allow you to logically create the database schema without a specific database in mind, which you could then move from one database to the other.
I have created applications which change from SQL Server to MySQL to Oracle to MS Access to SQLite easily (clients love that flexibility).
However, you would need to know your way around programming...

At what point should someone decide to switch database systems

When developing, whether it's Web or Desktop, at what point should a developer switch from SQLite, MySQL, MS SQL, etc.?
It depends on what you are doing. You might switch if:
You need more scalability or better performance - say from SQLite to SQL Server or Oracle.
You need access to more specific datatypes.
You need to support a customer that only runs a particular database.
You need better DBA tools.
Your application is using a different platform where your database no longer runs, or its libraries do not run.
You have the ability/time/budget to actually make the change. Depending on the situation, the migration could be a bigger project than everything in the project up to that point. Migrations like these are great places to introduce inconsistencies, or to lose data, so a lot of care is required.
There are many more reasons for switching and it all depends on your requirements and the attributes of the databases.
You should switch databases at milestone 2.3433, 3ps prior to the left branch of dendrite 8,151,215.
You should switch databases when you have a reason to do so, would be my advice. If your existing database is performing to your expectations, supports the load that is being placed on it by your production systems, has the features you require in your applications and you aren't bored with it, why change? However, if you find your application isn't scaling, or you are designing an application that has high load or scalability requirements and your research tells you your current database platform is weak in that area, or, as was already mentioned, you need some spatial analysis or feature that a particular database has, well there you go.
Another consideration might be taking up the use of a database agnostic ORM tool that can allow you to experiment freely with different database platforms with a simple configuration setting. That was the trigger for us to consider trying out something new in the DB department. If our application can handle any DB the ORM can handle, why pay licensing fees on a commercial database when an open source DB works just as well for the levels of performance we require?
The bottom line, though, is that with databases or any other technology, I think there are no "business rules" that will tell you when it is time to switch - your scenario will tell you it is time to switch because something in your solution won't be quite right, and if you aren't at that point, no need to change.
BrianLy hit the nail on the head, but I'd also add that you may end up using different databases at different levels of development. It's not uncommon for developers to use SQLite on their workstation when they're coding against their personal development server, and then have the staging and/or production sites using a different database tool.
Of course, if you're using extensions or capabilities specific to a certain database tool (say, PostGIS in PostgreSQL), then obviously that wouldn't work.