How do I create and synchronize a combined reporting-only db from two live dbs? - sql

I need to quickly implement a read-only database containing data pulled from two identically structured live databases.
The live dbs are actually company dbs from a Dynamics accounting system so I'm happy for any Dynamics specific advice but this is mostly a SQL question. It's a fairly old version of Dynamics from before Great Plains was acquired by Microsoft. This is on SQL Server 2000.
We have reports and applications which access the Dynamics data. These apps are designed to look at one company db. Now we need to add another. It's appropriate that most of these reports and apps see combined data. They don't really care which company an order or invoice exists in. They only look at a small number of the tables.
It seems to me that the simplest solution is to create a reports only db with combined data. Preferably, we need an efficient way to update this db with changes several times a day.
I'm a developer, not a db expert but here's my plan:
Create the combined reporting db with the required tables initially with the same table structure as the live dbs.
All Dynamics tables seem to have an int identity column called DEX_ROW_ID. I'm not sure what it's used for, (it's not indexed) but that seems like the obvious generic way to uniquely identify rows. On the reporting db I will change it to a normal int (not an identity). I will create a unique index on DEX_ROW_ID in all dbs.
Dynamics does not have timestamps so I will add a timestamp column to tables in the live dbs and a corresponding binary(8) column in the reporting db. I'm assuming and hoping that Dynamics won't be upset by the additional index and column.
Add an int CompanyId column to the reporting db tables and add it to the end of any unique indexes. Most data will be naturally unique even without that. ie, order and invoice numbers etc will be different for the two live dbs. We may need to make some minor changes to the applications but I'm not expecting to do much other than point them to the new reporting db.
Assuming my reporting db is called Reports, the live dbs are Live1 and Live2, the timestamp column is called TS and all dbs are on the same server ... here's my first attempt at an update script for copying the changes in one table called MyTable in Live1 to the reporting db.
USE Reports
CREATE TABLE #Changes
(
ReportId int,
LiveId int
)
/* Collect in a temp table the ids of rows which have been deleted or changed
in the live db. L.DEX_ROW_ID will be null if the row has been deleted.
The parentheses matter: AND binds tighter than OR, so without them the
TS comparison would also match rows belonging to other companies. */
INSERT INTO #Changes
SELECT R.DEX_ROW_ID, L.DEX_ROW_ID
FROM MyTable R LEFT OUTER JOIN Live1.dbo.MyTable L ON L.DEX_ROW_ID = R.DEX_ROW_ID
WHERE R.CompanyId = 1 AND (L.DEX_ROW_ID IS NULL OR L.TS <> R.TS)
/* Delete rows that have been deleted or changed on the live db
I wonder if using join syntax would run better than the subquery. */
DELETE FROM MyTable
WHERE CompanyId = 1 AND DEX_ROW_ID IN (SELECT ReportId FROM #Changes)
/* Recopy rows that have changed in the live db */
INSERT INTO MyTable
SELECT 1 AS CompanyId, * FROM Live1.dbo.MyTable L
WHERE L.DEX_ROW_ID IN (SELECT ReportId FROM #Changes WHERE LiveId IS NOT NULL)
/* Copy the rows that are new in the live db */
INSERT INTO MyTable
SELECT 1 AS CompanyId, * FROM Live1.dbo.MyTable
WHERE DEX_ROW_ID > (SELECT MAX(DEX_ROW_ID) FROM MyTable WHERE CompanyId = 1)
Then do the same for the Live2 db. Repeat for every table in Reports. I know I should use a parameter @CompanyId instead of the literal, but I can't do that for the live db name, so I might generate these scripts dynamically with a C# program or something.
I'm looking for any advice, suggestions or critique on what I'm doing here. I know it won't be atomic. Things could be happening on the live db while this script runs. I think we can live with that. We'll probably do a full copy either nightly or weekly when nothing is happening on the live dbs.
We need to favor performance over elegance or perfection. Some initial testing has the first query with the TS comparisons running at about 30 seconds for the biggest table so I'm optimistic that this is going to work but I'd also like to know if I'm missing something obvious or not seeing the forest for the trees.
We don't really want to deal with log files on the reporting db. Can we just set that to simple recovery model and forget about logs?
Thanks

I think there are a couple open questions here.
Do you need these reports to be near-real-time? Or is this the sort of reporting that could live with daily updates? But assume you need up-to-the-minute data.
Have you considered querying the databases directly and merging the data per-report on the fly? You'll have to do a lot of reporting to duplicate the effort that's going to go into designing, creating, and supporting a real-time merged replicated database.
Thirty seconds is (IMHO) unacceptable for any single query against a production database. There could be any number of tuning-related reasons for taking this long, but it at least means you're going to need serious professional SQL Server optimization resources (i.e. people). And if this is a problem for the queries for reports, it doesn't bode well for the queries to maintain a separate database for reporting.
Tuck into the back of your mind the consideration that, if you need to consolidate to a single database, it's worth considering whether you should make it an OLAP database rather than a mirror. The mirror will be quicker and easier, but the OLAP would be far more flexible and powerful in the long term; and it might be well to go the whole way from the beginning.
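As a rough illustration of the on-the-fly merge suggested above, a view in the reporting database could union the same table from both live databases. This is only a sketch: the view name MyTable_Combined is made up, MyTable, Live1 and Live2 are the placeholder names from the question, and SELECT * assumes the two tables really are identical in structure.

```sql
USE Reports
GO
-- Hypothetical view: presents rows from both company databases as one set.
-- CompanyId is a literal tag so reports can still tell the sources apart.
CREATE VIEW dbo.MyTable_Combined
AS
SELECT 1 AS CompanyId, L1.* FROM Live1.dbo.MyTable L1
UNION ALL
SELECT 2 AS CompanyId, L2.* FROM Live2.dbo.MyTable L2
GO
```

Reports would then select from MyTable_Combined instead of MyTable. Nothing is copied, so the data is always current, but every report query runs against the live databases.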

The last thing I'd want to do is write a custom update script. Try these bulletproof methods first:
1. Let's hope your production databases are backed up. Restore those backups every night to the reporting server. You can automate restores with the RESTORE command, which will work with a file on a network server.
2. Use SQL Server replication to push data from the live servers to the backend.
3. Schedule a DTS package every night to import the entire production database.
This might seem like brute force. But since you're copying a 2000-era database, brute force cannot be a problem with today's hardware. As an added advantage, these methods can be supported by a sysadmin instead of a developer.
Method 1 has the added advantage of serving as backup verification. :)
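A nightly restore along the lines of the first method might look like the sketch below. The share path, the target database name Live1_Copy and the logical file names are all assumptions; the real logical names can be read from the backup with RESTORE FILELISTONLY. The final statement switches the restored copy to the simple recovery model that the question asks about.

```sql
-- Sketch only: path, database name and logical file names are hypothetical.
-- List the real logical file names first with:
--   RESTORE FILELISTONLY FROM DISK = '\\backupserver\backups\Live1.bak'
RESTORE DATABASE Live1_Copy
FROM DISK = '\\backupserver\backups\Live1.bak'
WITH MOVE 'Live1_Data' TO 'D:\Data\Live1_Copy.mdf',
     MOVE 'Live1_Log'  TO 'D:\Data\Live1_Copy.ldf',
     REPLACE

-- A restored database keeps the recovery model of its source, so reset it.
ALTER DATABASE Live1_Copy SET RECOVERY SIMPLE
```

Scheduled as a SQL Server Agent job, this needs no custom change tracking, at the cost of the reporting copy being as old as the last backup.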

Related

How do I copy data from one Azure database table to a different Azure database table and also convert data types?

I have to copy data from one table to another; the tables are held in two different databases within Azure. I did a quick search for answers to this, and whilst a query seems fairly straightforward, i.e.
INSERT INTO table1 (make, model, type, serial)
SELECT the_make, the_model, the_type, ref_no
FROM database2.dbo.table2
I encountered issues because I'm using Azure.
Msg 40515, Level 15, State 1, Line 16 Reference to database and/or
server name in 'database2.dbo.table2' is not supported in this version of
SQL Server.
The above issue led me to the Cross-Database Queries articles. My requirements are a little more complicated than some of the scenarios provided and I need some help in making it work.
I also need to convert some columns such as reg_no which is a 'string' to an 'int' and then copy the value to the 'serial' column.
My question is: what is the best way to create a script for this that lets me reference both databases without errors, copy the data, and convert the columns at the same time? I tried the simple route of exporting the data and importing it with edited column mappings, but it did not work well and caused problems all over the place.
Any guidance is appreciated on this.
You're getting this error because there's no linked server by default. You'll need to add it, in order to access the secondary db server. Here's a link about how to do it:
https://www.sqlshack.com/create-linked-server-azure-sql-database/
As for the transformation, it depends on many factors, e.g. the number of rows, how often the copy runs, etc.
Usually the best alternative is an external ETL tool such as SSIS or Azure Data Factory, because you can schedule its execution and get the status of each run.
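Another route, in line with the Cross-Database Queries articles the question mentions, is an elastic query external table, which lets one Azure SQL database read a table in another. The sketch below is run in the database that holds table1. The credential and data source names are made up, the placeholders in angle brackets must be filled in, and the column types are guesses that have to match the real definition of table2. It uses ref_no, the column name from the question's query.

```sql
-- All names other than table1, table2, database2 and the columns are hypothetical.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';

CREATE DATABASE SCOPED CREDENTIAL Db2Credential
WITH IDENTITY = '<sql login on database2>', SECRET = '<its password>';

CREATE EXTERNAL DATA SOURCE Db2Source
WITH (
    TYPE = RDBMS,
    LOCATION = '<yourserver>.database.windows.net',
    DATABASE_NAME = 'database2',
    CREDENTIAL = Db2Credential
);

-- Local definition that points at dbo.table2 in database2.
-- Column types are assumptions and must match the remote table.
CREATE EXTERNAL TABLE dbo.table2 (
    the_make  nvarchar(50),
    the_model nvarchar(50),
    the_type  nvarchar(50),
    ref_no    nvarchar(50)
)
WITH (DATA_SOURCE = Db2Source);

-- TRY_CAST returns NULL instead of failing when a value is not a valid int.
INSERT INTO table1 (make, model, type, serial)
SELECT the_make, the_model, the_type, TRY_CAST(ref_no AS int)
FROM dbo.table2;
```

Using TRY_CAST rather than CAST means one bad string does not abort the whole copy; rows where serial ends up NULL can be checked afterwards.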

sql temp table join between servers

So I have a summary I need to return to the end user application.
It should accept 3 parameters DateType, StartDate, EndDate.
Date Type will determine the date field I use to filter the data.
The way I accomplished this was to put all the IDs of the records for a date type into a TEMP table and then join my summary to the list of IDs.
This worked fine when running the query on the SQL Server that houses the data.
However, that is a replicated server, so when I compiled it into a stored proc on the server with the rest of the application data, it slowed the query down, i.e. 2 seconds vs 50 seconds.
I think the join across servers, between the temp table that is created on the local SQL Server and the tables on the replication server, is causing the slowdown.
Are there any methods or techniques that I can use to get around this and build this all in one stored procedure?
If I create 3 stored procedures with their own date range, then they are fast again. However, this means maintaining multiple stored procs for the same thing.
First off, if you are running a version of SQL Server older than 2012 SP1, one problem is that users who aren't allowed to run DBCC SHOW_STATISTICS (which is most users who aren't sysadmins, see the "Permissions" section in the documentation) don't get access to statistics on remote tables. This can severely cripple the optimizer's ability to generate a good execution plan. Upgrading SQL Server or granting more permissions can help there.
If your query involves filtering or joining on a character column, make sure the remote server is flagged in the linked server options as "collation compatible". If this option is off, SQL Server can't assume strings can be compared across the servers and it will start pumping entire tables up and down just to make sure the data ends up where the comparison has to be made.
If the execution plan is as good as it gets and it's still not good enough, one general (lame) technique is to transfer all data locally first (SELECT * INTO #localtable FROM remote.db.schema.table), then run the query as a non-distributed query. Obviously, in order for this to work, the remote table cannot be "too big" and in some cases this actually has worse performance, depending on how many rows are involved. But it's always worth considering, because the optimizer does a better job with local tables.
Another approach that avoids pulling tables together across servers is packing up data in parameters to remote stored procedure calls. Entire tables can be passed as XML through an NVARCHAR(MAX), since neither XML columns nor table-valued parameters are supported in distributed queries. The basic idea is the same: avoid the need for the optimizer to figure out an efficient distributed query. The best approach greatly depends on your data and your query, obviously.
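The "transfer locally first" technique described above can be sketched as follows. All object names here (the linked server ReplServer, AppDb, dbo.Orders and its columns) are hypothetical stand-ins, and the date values are only examples. Selecting just the needed columns and filtering in the first statement keeps the transfer smaller than a plain SELECT *.

```sql
-- Hypothetical names: ReplServer (linked server), AppDb, dbo.Orders, OrderDate, Amount.
DECLARE @StartDate datetime, @EndDate datetime
SET @StartDate = '20240101'
SET @EndDate   = '20240201'

-- Step 1: pull only the needed columns and rows across the link, once.
SELECT OrderId, OrderDate, Amount
INTO #LocalOrders
FROM ReplServer.AppDb.dbo.Orders
WHERE OrderDate >= @StartDate AND OrderDate < @EndDate

-- Step 2: everything from here on is an ordinary local query.
SELECT COUNT(*) AS OrderCount, SUM(Amount) AS TotalAmount
FROM #LocalOrders

DROP TABLE #LocalOrders
```

Whether this beats the distributed join depends on how many rows step 1 has to move, so it is worth timing both versions.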

How to query a table to a view and publish to a different database

I have 13 SQL databases, some 2005 and others 2008, on a VPN. I'd like to take all of the data from the "Employees" table on each database and make it a view at each location. I would then like to publish these views to one database on another server, all in one table, marking which of the original databases each row came from. For example, the database where all the information goes would look like this:
User  Name    Location
bik   Bob K   1
JS    John S  2
Etc.
Any help is appreciated.
I assume you want the data on the final server to be viewable, but not modifiable, and to reflect changes made to the source databases?
This would probably not perform all that well, but one do-it-yourself-way to do it would be the following (disclaimer: I haven't tried doing this myself):
Set up all the source servers as linked servers on the final server.
Create a view in this form:
SELECT *, 1 as Location
FROM [Linked Server 1].Database1.dbo.Table1
UNION ALL
SELECT *, 2 as Location
FROM [Linked Server 2].Database2.dbo.Table2
... etc ....
You might want to read this documentation on distributed queries, if you haven't already.
I believe it's also possible to use SSIS as the source of a distributed query, but a quick scan through the documentation didn't find anything about it. I mention that because SSIS would make pulling and transforming data from disparate data sources very easy, and if you could use the final recordset as a data source, you could use an SSIS package as the backend to your view. However, again, performance would probably require considerable tuning.
If the queries don't have to be real time, you could look into using SQL Server Integration Services (SSIS) to pull the data into a local DB. You could schedule the job to run hourly, daily or weekly.

SQL Server 2008, Sybase - large select queries over low bandwidth

I need to pull a large amount of data from various tables across a line that has very low bandwidth. I need to minimize the amount of data that gets sent to and fro.
On that side is a Sybase database, on this side SQL Server 2008.
What I need is to pull all the tables from the Sybase database that have to do with this office. Let's say I have the following tables as an example:
Farm
Tree
Branch
etc.
(one farm has many trees, one tree has many branches etc.)
Let's say the "Farm" table has a field called "CountryID", and I only want the data where CountryID=12. The actual table structures I am looking at are very complex (and I am also not very familiar with them), so I want to try to keep the queries simple.
So I am thinking of setting up a series of views:
CREATE VIEW vw_Farm AS
SELECT * from Farm where CountryID=12
CREATE VIEW vw_Tree AS
SELECT * from Tree where FarmID in (SELECT FarmID FROM vw_Farm)
CREATE VIEW vw_Branch AS
SELECT * from Branch where TreeID in (SELECT TreeID FROM vw_Tree)
etc.
To then pull the actual data across I would then do:
SELECT * INTO localDb.dbo.Farm FROM vw_Farm
SELECT * INTO localDb.dbo.Tree FROM vw_Tree
SELECT * INTO localDb.dbo.Branch FROM vw_Branch
etc.
Simple enough to set up. I am wondering how this will perform though? Will it perform all the SELECT statements on the Sybase side and then just send back the result? Also, since this will be an iterative process, is it possible to index the views for subsequent calls?
Any other optimisation suggestions would also be welcome!
Thanks
Karl
EDIT: Just to clarify, the views will be set up in SQL Server. I am using a linked server to Sybase ASE to set up those views. What is worrying me in particular is whether the fact that the view is in SQL Server on this side, and not in Sybase on that side, will mean that for each iteration the data from the preceding view gets pulled across to SQL Server first, before the calculations are executed. I want Sybase to do all the calcs and just pass the results across.
It's difficult to be certain without testing, but my somewhat-relevant experience (using linked servers to platforms other than Sybase, and on SQL Server 2005) has been that using subqueries (such as your code for vw_Tree and vw_Branch) more or less guarantees that SQL Server will pull all the data for the outer table into a local temp table, then match it to the results of the inner query.
The problem is that SQL Server has no access to the linked server's table statistics, so can make no meaningful decisions about how to optimise the query.
If you want to be sure to have the work done on the Sybase server, your best bet will be to write code (could be views or stored procedures) on the Sybase side and reference them from SQL Server.
Linked server connections are, in my experience, not particularly resilient over flaky networks. If it's available, you could consider using Integration Services rather than linked-server queries - but even that may not be much better. You may need to consider falling back on moving text files with robocopy and bcp.
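One way to make sure the work happens on the Sybase side without creating objects there is a pass-through query with OPENQUERY. In the sketch below SYBASE_LINK is a hypothetical linked server name, and the table and column names follow the Farm/Tree example from the question. The quoted text is sent to Sybase as written, so it must be valid Sybase SQL, and only the result rows come back across the line.

```sql
-- SYBASE_LINK is a hypothetical linked server name.
-- The filtering subquery runs on Sybase; only matching Tree rows are transferred.
-- SELECT ... INTO creates localDb.dbo.Tree, so that table must not exist yet.
SELECT *
INTO localDb.dbo.Tree
FROM OPENQUERY(SYBASE_LINK,
    'SELECT * FROM Tree WHERE FarmID IN (SELECT FarmID FROM Farm WHERE CountryID = 12)')
```

OPENQUERY does not accept variables in the query string, so a changing CountryID would need the statement to be built as dynamic SQL.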

how to compare/validate sql schema

I'm looking for a way to validate the SQL schema on a production DB after updating an application version. If the application does not match the DB schema version, there should be a way to warn the user and list the changes needed.
Is there a tool or a framework (to use programmatically) with built-in features to do that?
Or is there some simple algorithm to run this comparison?
Update: Red gate lists "from $395". Anything free? Or more foolproof than just keeping the version number?
Try this SQL.
- Run it against each database.
- Save the output to text files.
- Diff the text files.
/* get list of objects in the database */
SELECT name,
type
FROM sysobjects
ORDER BY type, name
/* get list of columns in each table / parameters for each stored procedure */
SELECT so.name,
so.type,
sc.name,
sc.number,
sc.colid,
sc.status,
sc.type,
sc.length,
sc.usertype ,
sc.scale
FROM sysobjects so ,
syscolumns sc
WHERE so.id = sc.id
ORDER BY so.type, so.name, sc.name
/* get definition of each stored procedure */
SELECT so.name,
so.type,
sc.number,
sc.text
FROM sysobjects so ,
syscomments sc
WHERE so.id = sc.id
ORDER BY so.type, so.name, sc.number
I hope I can help - this is the article I suggest reading:
Compare SQL Server database schemas automatically
It describes how you can automate the SQL Server schema comparison and synchronization process using T-SQL, SSMS or a third party tool.
You can do it programmatically by looking in the data dictionary (sys.objects, sys.columns etc.) of both databases and comparing them. However, there are also tools like Redgate SQL Compare Pro that do this for you. I have specified this as a part of the tooling for QA on data warehouse systems on a few occasions now, including the one I am currently working on. On my current gig this was no problem at all, as the DBAs here were already using it.
The basic methodology for using these tools is to maintain a reference script that builds the database and keep this in version control. Run the script into a scratch database and compare it with your target to see the differences. It will also generate patch scripts if you feel so inclined.
As far as I know there's nothing free that does this unless you feel like writing your own. Redgate is cheap enough that it might as well be free. Even as a QA tool to prove that the production DB is not in the configuration it was meant to be it will save you its purchase price after one incident.
You can now use my SQL Admin Studio for free to run a Schema Compare, Data Compare and Sync the Changes. It no longer requires a license key. Download it from here: http://www.simego.com/Products/SQL-Admin-Studio
Also works against SQL Azure.
[UPDATE: Yes I am the Author of the above program, as it's now Free I just wanted to Share it with the community]
If you are looking for a tool that can compare two databases and show you the differences, Red Gate makes SQL Compare.
You didn't mention which RDBMS you're using: if the INFORMATION_SCHEMA views are available in your RDBMS, and if you can reference both schemas from the same host, you can query the INFORMATION_SCHEMA views to identify differences in:
-tables
-columns
-column types
-constraints (e.g. primary keys, unique constraints, foreign keys, etc)
I've written a set of queries for exactly this purpose on SQL Server for a past job - it worked well to identify differences. Many of the queries were using LEFT JOINs with IS NULL to check for the absence of expected items, others were comparing things like column types or constraint names.
It's a little tedious, but it's possible.
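The answer's own queries are not shown, but the LEFT JOIN with IS NULL pattern it describes might look like this on SQL Server. DbExpected and DbActual are hypothetical names for a reference database and the database being checked, both on the same server.

```sql
-- DbExpected and DbActual are hypothetical database names.
-- Columns present in the reference schema but missing from the target:
SELECT e.TABLE_SCHEMA, e.TABLE_NAME, e.COLUMN_NAME
FROM DbExpected.INFORMATION_SCHEMA.COLUMNS e
LEFT JOIN DbActual.INFORMATION_SCHEMA.COLUMNS a
       ON a.TABLE_SCHEMA = e.TABLE_SCHEMA
      AND a.TABLE_NAME   = e.TABLE_NAME
      AND a.COLUMN_NAME  = e.COLUMN_NAME
WHERE a.COLUMN_NAME IS NULL

-- Columns present in both but with a different type or length:
SELECT e.TABLE_SCHEMA, e.TABLE_NAME, e.COLUMN_NAME,
       e.DATA_TYPE AS expected_type, a.DATA_TYPE AS actual_type
FROM DbExpected.INFORMATION_SCHEMA.COLUMNS e
JOIN DbActual.INFORMATION_SCHEMA.COLUMNS a
       ON a.TABLE_SCHEMA = e.TABLE_SCHEMA
      AND a.TABLE_NAME   = e.TABLE_NAME
      AND a.COLUMN_NAME  = e.COLUMN_NAME
WHERE e.DATA_TYPE <> a.DATA_TYPE
   OR ISNULL(e.CHARACTER_MAXIMUM_LENGTH, -1) <> ISNULL(a.CHARACTER_MAXIMUM_LENGTH, -1)
```

Swapping the two database names in the first query finds unexpected extra columns, and the same pattern applies to INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.TABLE_CONSTRAINTS.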
I found this small and free tool that fits most of my needs.
http://www.wintestgear.com/products/MSSQLSchemaDiff/MSSQLSchemaDiff.html
It's very basic but it shows you the schema differences of two databases.
It doesn't have any fancy stuff like auto generated scripts to make the differences to go away and it doesn't compare any data.
It's just a small, free utility that shows you schema differences :)
Make a table and store your version number in there. Just make sure you update it as necessary.
CREATE TABLE version (
version VARCHAR(255) NOT NULL
)
INSERT INTO version VALUES ('v1.0');
You can then check the version number stored in the database matches the application code during your app's setup or wherever is convenient.
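A minimal sketch of that check, assuming the version table above and with 'v1.0' standing in for whatever version string the application build expects:

```sql
-- 'v1.0' is a placeholder for the version the application was built against.
IF NOT EXISTS (SELECT 1 FROM version WHERE version = 'v1.0')
    RAISERROR('Database schema version does not match this application build.', 16, 1)
```

The application can run this at startup and warn the user when the error is raised. It only detects a mismatch in the recorded number and will not notice schema changes made without updating the table.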
SQL Compare by Red Gate.
Which RDBMS is this, and how complex are the potential changes?
Maybe this is just a matter of comparing row counts and index counts for each table -- if you have trigger and stored procedure versions to worry about also then you need something more industrial
Try dbForge Data Compare for SQL Server. It can compare and sync any databases, even very large ones. Quick, easy, always delivers a correct result.
Try it on your database and comment upon the product.
We can recommend a reliable SQL comparison tool that offers 3 times faster comparison and synchronization of table data in your SQL Server databases. It's dbForge Data Compare for SQL Server.
Main advantages:
Speedier comparison and synchronization of large databases
Support of native SQL Server backups
Custom mapping of tables, columns, and schemas
Multiple options to tune your comparison and synchronization
Generating comparison and synchronization reports
Plus free 30-day trial and risk-free purchase with 30-day money back guarantee.