How do I keep a table synchronized with a query in SQL Server - ETL? - sql

I wan't sure how to word this question so I'll try and explain. I have a third-party database on SQL Server 2005. I have another SQL Server 2008, which I want to "publish" some of the data in the third-party database too. This database I shall then use as the back-end for a portal and reporting services - it shall be the data warehouse.
On the destination server I want store the data in different table structures to that in the third-party db. Some tables I want to denormalize and there are lots of columns that aren't necessary. I'll also need to add additional fields to some of the tables which I'll need to update based on data stored in the same rows. For example, there are varchar fields that contain info I'll want to populate other columns with. All of this should cleanse the data and make it easier to report on.
I can write the query(s) to get all the info I want in a particular destination table. However, I want to be able to keep it up-to-date with the source on the other server. It doesn't have to be updated immediately (although that would be good) but I'd like for it be updated perhaps every 10 minutes. There are 100's of thousands of rows of data but the changes to the data and addition of new rows etc. isn't huge.
I've had a look around but I'm still not sure the best way to achieve this. As far as I can tell replication won't do what I need. I could manually write the t-sql to do the updates perhaps using the Merge statement and then schedule it as a job with sql server agent. I've also been having a look at SSIS and that looks to be geared at the ETL kind of thing.
I'm just not sure what to use to achieve this and I was hoping to get some advice on how one should go about doing this kind-of thing? Any suggestions would be greatly appreciated.

For that tables whose schemas/realtions are not changing, I would still strongly recommend Replication.
For the tables whose data and/or relations are changing significantly, then I would recommend that you develop a Service Broker implementation to handle that. The hi-level approach with service broker (SB) is:
Table-->Trigger-->SB.Service >====> SB.Queue-->StoredProc(activated)-->Table(s)
I would not recommend SSIS for this, unless you wanted to go to something like dialy exports/imports. It's fine for that kind of thing, but IMHO far too kludgey and cumbersome for either continuous or short-period incremental data distribution.

Nick, I have gone the SSIS route myself. I have jobs that run every 15 minutes that are based in SSIS and do the exact thing you are trying to do. We have a huge relational database and then we wanted to do complicated reporting on top of it using a product called Tableau. We quickly discovered that our relational model wasn't really so hot for that so I built a cube over it with SSAS and that cube is updated and processed every 15 minutes.
Yes SSIS does give the aura of being mainly for straight ETL jobs but I have found that it can be used for simple quick jobs like this as well.

I think, staging and partitioning will be too much for your case. I am implementing the same thing in SSIS now but with a frequency of 1 hour as I need to give some time for support activities. I am sure that using SSIS is a good way of doing it.
During the design, I had thought of another way to achieve custom replication, by customizing the Change Data Capture (CDC) process. This way you can get near real time replication, but is a tricky thing.

Related

Which is the best database for accessing big time-based data?

we need some help, choosing the best database for the following upcoming situation:
Write:
We'll get a huge number of data each minute (round about one million entries) and need to save them in a database. One of the unique identifiers of the entries are timestamps.
Read:
The software should be able to load the stored data for an user-defined time range. The loading process should be as fast as possible.
Currently, we are using the Microsoft SQL Server in our application. I'm not sure, if this is the right technology for our new requirements.
Which database should we use? Do we need to replace MSSQL with something or is another database able to run simultaneously with it?
Thank you!

How to model data flows with a SQL backend?

My question is not about a specific code. I am trying to automate a business data governance data flow using a SQL backend. I have put a lot of time searching the internet or reaching out people for the right direction, but unfortunately I have not yet found something promising so I have a lot of hope I would find some people here to save from a big headache.
Assume that we have a flow (semi static/dynamic flow) for our business process. We have different departments owning portions of data. we need to take different actions during the flow such as data entry, data validation, data exportation, approvals, rejections, notes etc and also automatically define deadlines, create reports of overdue tasks and people accountable for them etc.
I guess the data management part would not be extremely difficult, but how to write an application (codes) to run the flow (workflow engine) is where I struggle. Should I use triggers or should I choose to write codes to frequently run queries to push the completed steps to next step, how I can use SQL tables to keep the track of flow etc
If one could give me some hints on this matter, I would be greatly appreciated
I would suggest using the sql server integration services SSIS, you can easily mange the scripts and workflow based on some lookup selections, and also you can schedule SSIS package on timely bases to trigger and do the job.
It's hard task to implement application server on sql server. Also it's will be very vendor depended solution. Best way i think to use sql server as data storage and some application server for business logic over data storage.

Warehouse and SSIS

I develop some application that has database wery generic so really can't use it for reporting. So I need solution how to create reporting. I'm developer so my knowledge in DBA domain is bounded. For now I have ideo to create another database where I'll pu denormalized data from original db. So I saw that I could use SSIS for that and woul be glad if someone could give me some advice how to attack that problem. Should I sync data once a day and run reports that way. Is there solution to sync data allways so reports would be up to date? Please any advice.. Thanks!
Damir,
What I get from your message is that you are getting close to build a Datawarehouse using a Star Schema pattern.
You could have two databases, One with normalized data and the other one with the Star Schema pattern (Your DW), and then create a script that would use your normalized data and put them in your datawarehouse. For the frequency of your script it is up to you : After each transaction, every hour, once a day, etc...
The advantage of having a datawarehouse is that you will be able to use OLAP cubes and the MDX language for your reports. It's a plus !
Hope it could help,
If you are on sql server 2005 or greater, explore Merge statement.
For smaller tables, just truncate and reload. 'Smaller' could be subjective - but if takes less than 2-3 minutes to load, that could be termed as small. Obviously, during that period any query that uses such tables would fail.

Methods of maintaining sample data in a database

Firstly, let me apologize for the title, as it probably isn't as clear as I think it is.
What I'm looking for is a way to keep sample data in a database (SQL, 2005 2008 and Express) that get modified every so often. At present I have a handful of scripts to populate the database with a specific set of data, but every time the database is changed all the scripts have to be more or less rewritten and I was looking for some alternatives.
I've seen a number of tools and other software for creating sample data in a database, some free and some not. Are there any other methods I haven’t considered?
Thanks in advance for any input.
Edit: Also, if anyone has any advice at all in dealing with keeping data in sync with a changing application or database, that would be of some help as well.
If you are looking for tools for SQL server, go visit Red Gate Software, they have the best tools. They have a data compare tool that you can use to keep lookup type tables up-to-date and a SQL compare tool that you can use to keep the tables synched up between two datbases. So using SQL data compare, create a datbase with all the sample data you want. Then periodically refresh your testing db (or your prod db if these are strictly lookup type tables) using the compare tool.
I also like the alternative of having a script (you can use Red Gate's tool to create scripts) because that means you can store this info in your source control and use it as part of a deployment package to other servers.
You could save them in another database or the same db in different tables distinguished by the name, like employee_test
Joseph,
Do you need to keep just the data in sync, or the schema as well?
One solution to the data question would be SQL Server snapshots. You create a snapshot of your initial configuration, so any changes to the "real" database don't show up in the snapshot. Then, when you need to reset the table, select from the snapshot into a new table. I'm not sure how it will work if the schema changes, but it might be worth a try.
For generation of sample data, the Database project in Visual Studio has functionality that will create fake/random data.
Let me know if this make sense.
Erick

How to restore a database from different computers into one

I have 3 computers having the same sql server 2005 database, I would like to gather the data from the 3 computers to another computer which has the same database. Please help me.
This is called "data conversion" and a lot of your work will be to determine uniqueness on each one of them and coming up with strategies to prevent collisions, mainly primary keys that likely are the same across these databases. No simple answer here, it can be a project in itself.
It might be difficult without any manual data transformation. It depends on your database and type of the data. For example what do you use as a keys? If you have sequential integers as a primary/foreign keys, then you will have to do some manual data transformation. IF you use GUIDS, it will get slightly easier, but you still have to ensure that for example some lookup tables doesn't have different guid keys for same items etc.. But there is no took for doing this automatically.
Maybe if you have some very simple data without any relations to other tables (like table with one column with text messages etc) you can script the data with SQL Server Database Publishing Wizard, and then execute the scripts against your target database.
You need to backup your databases by right clicking in Enterprise Manager and choosing backup before choosing the location etc.
After backing up you can then restore to your local Sql Server by right clicking and choosing restore.
After you have the data locally you will need to write queries to transfer the data to your local database.
Alternatively you can use something like Red Gates Sql Data Compare to compare and transfer data using a visual interface. Although this costs money.
Redgate SQL Toolbelt may be able to help you. You would first copy database to that another computer and then compare it with Sql Data Compare against 3 databases always copying data only one way (to your new database). However I am not 100% sure if it will work like i think it would. You would have to verify it yourself.
Like other people suggested some things like primary keys etc may be problematic.