DB to DB mapping at regular intervals - SQL

I need to create a recurring job that maps two SQL Server databases sitting on two different servers. I need to check for data mismatches between the tables at regular intervals, because new data keeps being added every second.
I am thinking of using one of the ETL tools, like Pentaho Kettle, which would actually do the data mapping. Is there any better option to handle this scenario?

This looks like an ETL job. Since you're using SQL Server, I would recommend SSIS, which is the Microsoft ETL tool. Of course you can use Pentaho, and I think it will work very well too.
Another approach would be to use linked servers and a job, writing the comparison script as a stored procedure, but in my opinion this is not the recommended way to address the problem (SSIS or any other ETL tool is much more versatile).
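For completeness, a minimal sketch of that linked-server comparison, assuming a linked server named RemoteServer already exists and both tables share a key column; all object names here are illustrative:

```sql
-- Minimal mismatch check over a linked server (all names are illustrative).
-- Rows present locally but missing or different on the remote server:
SELECT Id, Col1, Col2
FROM   dbo.MyTable
EXCEPT
SELECT Id, Col1, Col2
FROM   [RemoteServer].[MyDb].[dbo].[MyTable];

-- The reverse direction, to catch rows the local table is missing:
SELECT Id, Col1, Col2
FROM   [RemoteServer].[MyDb].[dbo].[MyTable]
EXCEPT
SELECT Id, Col1, Col2
FROM   dbo.MyTable;
```

Wrapped in a stored procedure and scheduled with SQL Server Agent, this gives you a periodic mismatch report, though it rescans both tables on every run.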

Related

Erasing all data and populating with dummy data

What is the best way to transfer a database copy as a backup file for outside maintenance on the application? That copy should not contain any sensitive data; it should only have dummy data. What is the most efficient, best-practice way to erase all data in the tables and populate them with dummy data? (SQL Server 2019)
This is not a trivial task. A 3rd party solution would probably be easiest.
There are several answers available here that discuss copying objects in SQL Server Management Studio. Example: Backup SQL Schema Only?.
If you have access to SQL Server Integration Services, you can copy selected objects using the Transfer SQL Server Objects Task. I have only tried this once, a long time ago, so I have very little experience to describe how it works.
Another option is to create a job that runs a copy-only backup, restores the database, and then runs a manual series of SQL queries to clear or mask sensitive data.
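As a rough sketch of that last option (the database, file, and column names are all placeholders, and the logical file names in the RESTORE are assumptions you would need to verify with RESTORE FILELISTONLY):

```sql
-- 1. Take a copy-only backup so the regular backup chain is not disturbed.
BACKUP DATABASE ProdDb
TO DISK = N'C:\Backups\ProdDb_copy.bak'
WITH COPY_ONLY, INIT;

-- 2. Restore it under a new name (logical names assumed; check them with
--    RESTORE FILELISTONLY before running this).
RESTORE DATABASE ProdDb_Scrubbed
FROM DISK = N'C:\Backups\ProdDb_copy.bak'
WITH MOVE 'ProdDb'     TO N'C:\Data\ProdDb_Scrubbed.mdf',
     MOVE 'ProdDb_log' TO N'C:\Data\ProdDb_Scrubbed.ldf';

-- 3. Overwrite sensitive columns with dummy values in the restored copy.
USE ProdDb_Scrubbed;
UPDATE dbo.Customers
SET    Email     = CONCAT('user', CustomerId, '@example.com'),
       FirstName = 'Test',
       LastName  = 'User';
```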

How to automatically push data from SQL Server to Oracle?

I have users entering data in SharePoint (running on SQL Server), but my application for viewing that data will be an Oracle APEX app running on Oracle, obviously. How do I have the data pushed into the Oracle DB automatically?
First off, are you sure that you need to replicate the data to Oracle? Oracle Heterogeneous Services allows you to create a database link in Oracle that connects to a non-Oracle database using ODBC (assuming you use the Transparent Gateway for ODBC which is free). Your APEX application could then query and report on data that is in SQL Server by issuing queries that run over the database link. Tim Hall has a good article (though it's a bit dated and some of the components have been renamed, the general approach is still the same) on configuring Heterogeneous Services.
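Once the gateway itself is configured (the listener and init&lt;SID&gt;.ora setup is beyond this sketch), the Oracle side looks roughly like this; the link name, credentials, and TNS alias are made up:

```sql
-- Oracle side: a database link that goes through the ODBC gateway.
-- 'sqlserver_dsn' must match the gateway SID set up in tnsnames.ora/listener.ora.
CREATE DATABASE LINK mssql_link
  CONNECT TO "sql_user" IDENTIFIED BY "sql_password"
  USING 'sqlserver_dsn';

-- Query the SQL Server table from APEX or SQL*Plus. Note the quoted
-- identifiers: Heterogeneous Services exposes SQL Server names case-sensitively.
SELECT "OrderId", "OrderDate"
FROM   "Orders"@mssql_link;
```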
If you do need to replicate the data, you can create materialized views in Oracle that query the objects in SQL Server using the database link you created with Heterogeneous Services and schedule those materialized views to refresh on a regular basis. The materialized views will need to do a complete refresh, though, which means that every row will need to be copied from SQL Server to Oracle every time there is a refresh. That generally limits the frequency with which you can realistically have refreshes happen. If you need the data to be replicated to the Oracle database and you need to send incremental changes so that the Oracle side doesn't lag too far behind, you can use Streams from a non-Oracle database to an Oracle database but that involves a lot more work.
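A hedged sketch of that materialized view option, reusing the mssql_link from the previous sketch (the refresh interval and names are illustrative):

```sql
-- Oracle side: complete refresh every 15 minutes over the HS database link.
CREATE MATERIALIZED VIEW orders_mv
  REFRESH COMPLETE
  START WITH SYSDATE
  NEXT SYSDATE + 15/1440   -- 15 minutes, expressed in days
AS
SELECT "OrderId"   AS order_id,
       "OrderDate" AS order_date
FROM   "Orders"@mssql_link;
```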
In SQL Server you can set up linked servers that allow you to view data from other databases. You might see if Oracle has something similar, if not the same. Alternatively, you could use SQL Server Integration Services to push the data over to an Oracle table. Unfortunately I only know how to set up linked servers in SQL Server, and I don't have enough experience with SSIS to tell you how to do that, but those are the first two options I can think of for you to explore further.
Here's a link I found that might be helpful as well: http://www.dba-oracle.com/t_connecting_sql_server_oracle.htm
There's no way to do it "automatically" that I know of that will work across DBMSes. ETL tools like SQL Server Integration Services might help, but there's going to be a loading delay (as they will have to poll for changes). You could build update triggers on the SharePoint database tables, but that's going to turn into a support nightmare.

What are ways to transfer tables from Oracle to SQL Server

I've been searching the internet for this question:
What are ways to transfer data and tables on a daily basis from Oracle Hyperion to SQL Server 2000?
I am an intern at a company and am trying to figure out possible ways to do this. Any help or a pointer in the right direction is greatly appreciated.
This is going to depend a lot on specifics. Here are just a few possible solutions:
DTS
DTS is packaged with SQL 2000 and is made for this kind of a task. If written correctly, your DTS package can have good error-handling and be rerunnable/reusable.
SSIS
SSIS is actually packaged with SQL 2005 and above, but you can connect it to other databases. It's basically a better version of DTS. (Technically it's radically different from DTS, but it has a lot of the same functionality.)
Linked Servers
From SQL 2000 you should be able to connect directly to your Oracle database as a linked server. In the pros column this kind of direct access can be easy to work with if you don't have any other technical skills such as DTS or SSIS, but it can be complex to get the initial set-up right and there may be security concerns/issues.
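For illustration, a minimal linked-server setup on the SQL Server side, assuming the Oracle OLE DB provider is installed and a TNS alias named ORCL exists; all names and credentials are placeholders:

```sql
-- Register the Oracle database as a linked server.
EXEC sp_addlinkedserver
     @server     = N'ORACLE_LINK',
     @srvproduct = N'Oracle',
     @provider   = N'OraOLEDB.Oracle',
     @datasrc    = N'ORCL';          -- TNS alias from tnsnames.ora

-- Map a local login to an Oracle account.
EXEC sp_addlinkedsrvlogin
     @rmtsrvname  = N'ORACLE_LINK',
     @useself     = 'FALSE',
     @rmtuser     = N'scott',
     @rmtpassword = N'tiger';

-- Query it with a four-part name (or OPENQUERY for pass-through SQL).
SELECT * FROM ORACLE_LINK..SCOTT.EMP;
```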
Build Your Own
Depending on what other technologies you use you can build your own application to do the ETL (Extract/Transform/Load, which is what you're doing). This could be in .NET, Java, etc. In the pros column you can use something with which you're familiar but there's a big downside here in that most of the low level type of work is already out there in tools like DTS/SSIS, so why reinvent the wheel?
BCP
You can simply extract the data from Oracle as .csv files (or some other format) and then import them back in using SQL Server's Bulk Copy Process. This can be fast, but there aren't many bells and whistles to go with this. If this is a one-time thing with just a few tables though then this is probably the easiest and fastest way to do it.
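Here's a minimal sketch of the import half, assuming the Oracle extract has already landed as a CSV file; the path and table names are placeholders:

```sql
-- Load a CSV extract into a staging table with T-SQL's bulk load.
-- (The bcp.exe command-line utility is the equivalent outside T-SQL.)
BULK INSERT dbo.StagingEmployees
FROM 'D:\extracts\employees.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2    -- skip the header row
);
```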
Third Party Applications
There are a slew of ETL applications already written out there (Data Import, Data Slave, etc.). They will usually provide wizards and one-click solutions (maybe a few more than one click), but they are also going to cost a bit of extra money.
EDIT:
Given your latest comment, I would probably go with a DTS package scheduled in SQL Agent to run daily. You can add in error-handling and have the system email/text/call someone if there's ever an issue (or do positive-case reporting, i.e. send a message on success so that someone knows there's a problem if they don't get a message each day).
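As a rough sketch of the scheduling side (the server, package, and job names are made up; the DTSRun switches are from the SQL 2000 command-line tooling):

```sql
-- Create a SQL Agent job that runs a DTS package daily via DTSRun.
USE msdb;
EXEC sp_add_job         @job_name = N'Daily Oracle Import';
EXEC sp_add_jobstep     @job_name  = N'Daily Oracle Import',
                        @step_name = N'Run DTS package',
                        @subsystem = N'CMDEXEC',
                        @command   = N'DTSRun /S MYSERVER /N MyOraclePackage /E';
EXEC sp_add_jobschedule @job_name  = N'Daily Oracle Import',
                        @name      = N'Daily 2am',
                        @freq_type = 4,              -- daily
                        @freq_interval = 1,
                        @active_start_time = 020000; -- hhmmss
EXEC sp_add_jobserver   @job_name = N'Daily Oracle Import';
```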
In our company we use ADO.NET for the same task.
We created a connection to Oracle as the source, pulled all the data, and then recreated it in SQL Server.
You could write DTS packages to copy the data, and schedule them to run within SQL Server Agent.
See DTS Overview for information on DTS packages.
Here's a tutorial on creating a DTS package: Creating DTS Packages With SQL Server 2000
Oracle Hyperion is a suite of products, largely unrelated to Oracle's database product. I expect you are referring to a product such as Hyperion Financial Management or Hyperion Strategic Finance. These products have APIs that can be consumed using COM Interop or web services. The data can be extracted from the internal multidimensional database by analyzing the database metadata, creating dimension trees, and then using that information to create selections that represent subcubes within the database, allowing you to get or set cell data.
I don't know what your level of knowledge of multidimensional databases is, but unless it is substantial you may find the task pretty hard. You also need to get a handle on the particular product API.
My company specializes in these kinds of activities, and we have components for this kind of thing. Drop me a line on my blog if you need further advice.
danielvaughan.org
Cheers,
Daniel
I don't know anything about Hyperion, but SQL Server 2000 is very old and may not have a driver able to pull data from Hyperion if that product's version is newer than the year 2000. You may need to look and see whether there is a way to push the data from Hyperion rather than pull it into SQL Server 2000. One way I have done this in the past is to create a pipe-delimited text file from the database that originally has the data and place it in a processing directory. I do know that DTS will process a pipe-delimited text file. So if you can't find a driver to process this data directly, consider whether you can push it out to a file and then process that. You will have to schedule a time gap between the job on Hyperion that creates the file and the DTS package job. But if you are only doing it once a day, that's probably not a problem.

Warehouse and SSIS

I am developing an application whose database is very generic, so I really can't use it for reporting. I need a solution for reporting. I'm a developer, so my knowledge of the DBA domain is limited. For now my idea is to create another database into which I'll put denormalized data from the original DB. I saw that I could use SSIS for that and would be glad if someone could give me some advice on how to attack the problem. Should I sync data once a day and run reports that way? Is there a solution to sync data continuously so reports would always be up to date? Any advice is appreciated. Thanks!
Damir,
What I get from your message is that you are getting close to building a data warehouse using a star schema pattern.
You could have two databases, one with normalized data and the other with the star schema pattern (your DW), and then create a script that takes your normalized data and puts it into the data warehouse. The frequency of that script is up to you: after each transaction, every hour, once a day, etc.
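For illustration, a minimal version of such a load script under those assumptions; the dimension/fact names and the incremental-load pattern are invented for the example:

```sql
-- Refresh a dimension with any new customers, then append new fact rows.
INSERT INTO dw.DimCustomer (CustomerKey, CustomerName)
SELECT c.CustomerId, c.Name
FROM   src.Customers AS c
WHERE  NOT EXISTS (SELECT 1 FROM dw.DimCustomer AS d
                   WHERE d.CustomerKey = c.CustomerId);

-- Append only sales newer than what the fact table already holds.
INSERT INTO dw.FactSales (CustomerKey, SaleDate, Amount)
SELECT s.CustomerId, s.SaleDate, s.Amount
FROM   src.Sales AS s
WHERE  s.SaleDate > (SELECT COALESCE(MAX(SaleDate), '19000101')
                     FROM dw.FactSales);
```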
The advantage of having a data warehouse is that you will be able to use OLAP cubes and the MDX language for your reports. It's a plus!
Hope it helps,
If you are on SQL Server 2008 or later, explore the MERGE statement; a sketch follows below.
For smaller tables, just truncate and reload. 'Smaller' is subjective, but if a table takes less than 2-3 minutes to load, it could be termed small. Obviously, during that window any query that uses such tables would fail.
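A minimal MERGE sketch for the larger, incremental case; the table and column names are invented:

```sql
-- Upsert the reporting copy from the source table in one statement (SQL 2008+).
MERGE dw.Orders AS target
USING src.Orders AS source
      ON target.OrderId = source.OrderId
WHEN MATCHED AND (target.Status <> source.Status
               OR target.Amount <> source.Amount) THEN
    UPDATE SET target.Status = source.Status,
               target.Amount = source.Amount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OrderId, Status, Amount)
    VALUES (source.OrderId, source.Status, source.Amount)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;   -- drop rows that no longer exist in the source
```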

How do I keep a table synchronized with a query in SQL Server - ETL?

I wasn't sure how to word this question, so I'll try to explain. I have a third-party database on SQL Server 2005. I have another SQL Server 2008, to which I want to "publish" some of the data in the third-party database. I shall then use this second database as the back-end for a portal and Reporting Services; it shall be the data warehouse.
On the destination server I want store the data in different table structures to that in the third-party db. Some tables I want to denormalize and there are lots of columns that aren't necessary. I'll also need to add additional fields to some of the tables which I'll need to update based on data stored in the same rows. For example, there are varchar fields that contain info I'll want to populate other columns with. All of this should cleanse the data and make it easier to report on.
I can write the query (or queries) to get all the info I want into a particular destination table. However, I want to be able to keep it up to date with the source on the other server. It doesn't have to be updated immediately (although that would be good), but I'd like it to be updated perhaps every 10 minutes. There are hundreds of thousands of rows of data, but the changes to the data and the addition of new rows etc. aren't huge.
I've had a look around, but I'm still not sure of the best way to achieve this. As far as I can tell, replication won't do what I need. I could manually write the T-SQL to do the updates, perhaps using the MERGE statement, and then schedule it as a job with SQL Server Agent. I've also been having a look at SSIS, and that looks to be geared at this kind of ETL thing.
I'm just not sure what to use to achieve this, and I was hoping to get some advice on how one should go about doing this kind of thing. Any suggestions would be greatly appreciated.
For those tables whose schemas/relations are not changing, I would still strongly recommend replication.
For the tables whose data and/or relations are changing significantly, I would recommend that you develop a Service Broker implementation to handle it. The high-level approach with Service Broker (SB) is:
Table-->Trigger-->SB.Service >====> SB.Queue-->StoredProc(activated)-->Table(s)
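For illustration, here is a condensed sketch of that pipeline within a single database, assuming the database already has ENABLE_BROKER set; every object name is invented, and the payload is just the changed key as XML:

```sql
-- Condensed Service Broker plumbing within one database (names invented).
CREATE MESSAGE TYPE RowChange VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT RowChangeContract (RowChange SENT BY INITIATOR);
CREATE QUEUE RowChangeQueue;
CREATE SERVICE RowChangeService ON QUEUE RowChangeQueue (RowChangeContract);
GO

-- Trigger on the source table pushes the changed keys onto the queue.
CREATE TRIGGER trSourceTableChanged ON dbo.SourceTable
AFTER INSERT, UPDATE
AS
BEGIN
    DECLARE @h UNIQUEIDENTIFIER, @body XML;
    SET @body = (SELECT Id FROM inserted FOR XML PATH('row'));
    BEGIN DIALOG CONVERSATION @h
        FROM SERVICE RowChangeService
        TO SERVICE 'RowChangeService'
        ON CONTRACT RowChangeContract
        WITH ENCRYPTION = OFF;
    SEND ON CONVERSATION @h MESSAGE TYPE RowChange (@body);
END;
GO

-- Activated procedure drains the queue and applies changes to the target.
CREATE PROCEDURE dbo.ProcessRowChange
AS
BEGIN
    DECLARE @body XML;
    RECEIVE TOP (1) @body = CAST(message_body AS XML) FROM RowChangeQueue;
    -- ...apply @body to the destination table(s) here...
END;
GO

ALTER QUEUE RowChangeQueue
    WITH ACTIVATION (STATUS = ON,
                     PROCEDURE_NAME = dbo.ProcessRowChange,
                     MAX_QUEUE_READERS = 1,
                     EXECUTE AS OWNER);
```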
I would not recommend SSIS for this, unless you want to go with something like daily exports/imports. It's fine for that kind of thing, but IMHO it is far too kludgey and cumbersome for either continuous or short-period incremental data distribution.
Nick, I have gone the SSIS route myself. I have jobs that run every 15 minutes that are based in SSIS and do the exact thing you are trying to do. We have a huge relational database and then we wanted to do complicated reporting on top of it using a product called Tableau. We quickly discovered that our relational model wasn't really so hot for that so I built a cube over it with SSAS and that cube is updated and processed every 15 minutes.
Yes SSIS does give the aura of being mainly for straight ETL jobs but I have found that it can be used for simple quick jobs like this as well.
I think staging and partitioning would be too much for your case. I am implementing the same thing in SSIS now, but with a frequency of one hour, as I need to allow some time for support activities. I am sure that using SSIS is a good way of doing it.
During the design, I also thought of another way to achieve custom replication: customizing the Change Data Capture (CDC) process. This way you can get near real-time replication, but it is a tricky thing.
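For reference, a minimal sketch of enabling CDC and reading the captured changes (CDC requires SQL Server 2008+ Enterprise/Developer edition; all object names here are placeholders):

```sql
-- Enable CDC on the database, then on the table to track.
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Orders',
     @role_name     = NULL;   -- no gating role for this sketch

-- Poll the change table for everything captured in the current LSN range.
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_Orders'),
        @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();
SELECT *
FROM   cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');
```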