I am writing a set of stored procedures that aggregate data from large datasets.
The main stored procedure makes a call to one or more other servers where the data is located. The data is calculated in steps and stored in multiple temp tables (currently global temp tables) and then pulled to the server I'm sitting on (this is done because of the way the linked servers are set up).
Right now I'm trying to write dynamic SQL to create temp tables with a unique identifier, because multiple people may run the stored procedures at the same time. However, because of the number of sub-steps in this process it's getting complex, so I'm wondering if I'm overthinking it.
My question is: if I simplify and just use local temp tables, will I run into problems because the tables will have the same name? NOTE: Users may have the same login user names.
Local temp table names are per-session, not per-login. When you call SqlConnection.Open you get a new session. Normally, applications do not share sessions between HTTP requests; doing so is neither common nor a good idea.
I don't believe you have a problem. If you get name clashes then you should fix the application to not share sessions in the first place.
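The per-session scoping can be illustrated with a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in, not SQL Server: SQLite temp tables are private to each connection, which is analogous to (but not the same mechanism as) local #temp tables being private to each SQL Server session. The table name results and the function name are made up for the illustration.

```python
import os
import sqlite3
import tempfile


def demo_same_temp_name():
    """Open two connections to one database; each creates a temp table
    with the same name and sees only its own rows."""
    path = os.path.join(tempfile.mkdtemp(), "shared.db")
    conn_a = sqlite3.connect(path)
    conn_b = sqlite3.connect(path)
    try:
        for conn, value in ((conn_a, 1), (conn_b, 2)):
            # Same table name on both connections: no clash, because
            # temp tables belong to the connection that created them.
            conn.execute("CREATE TEMP TABLE results (v INTEGER)")
            conn.execute("INSERT INTO results VALUES (?)", (value,))
        rows_a = conn_a.execute("SELECT v FROM results").fetchall()
        rows_b = conn_b.execute("SELECT v FROM results").fetchall()
        return rows_a, rows_b
    finally:
        conn_a.close()
        conn_b.close()
```

Each connection reads back only the row it inserted, even though both used the same table name against the same database file.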
I have an old ERP that handles many organizations in a single database, with a suffix on the tables. Let's say the database name is ORG and the tables look like CLIENTS01, CLIENTS02, CLIENTS03.
The new ERP connects to one database per organization.
The sad thing is I need to use both ERPs, so I need to replicate the SAME DB WITH MULTIPLE NAMES: ORG01, ORG02, ORG03.
I was thinking of mirroring, but this will increase database workload.
I looked for aliases but didn't find a way to make them work on the same instance.
I tried to create a new database and then attach the same MDF file, without success.
What would be the simplest way to just create 3 new databases connecting to the same database (1 database, multiple names)?
Requirements:
Transfer millions of records from source (SQL Server) to destination (SQL Server).
Structure of source tables is different from destination tables.
Refresh data once per week in destination server.
Minimum amount of time for the processing.
I am looking for an optimized approach using SSIS.
I was thinking of these options:
Create a SQL dump from the source server and import that dump into the destination server.
Directly copy the tables from the source server to the destination server.
There are lots of issues to consider here, such as whether the servers are in the same domain, on the same network, etc.
Most of the time you will not want to move the data as a single large chunk of millions of records, but in smaller batches. An SSIS package handles that logic for you; you can always recreate it yourself, but SSIS makes iterating over the changes easier. This is sometimes a reason to push changes more often rather than wait an entire week, as smaller syncs are easier to manage and cause less downtime.
Another consideration is to be sure you understand your deltas and to ensure that you have ALL of the changes. For this reason I would generally suggest using a staging table at the destination server. By moving changes to staging and then loading to the final table, you can more easily ensure that changes are applied correctly. Think of the scenario of an increment being out of order (identity insert), datetimes ordered incorrectly, or one chunk failing. When using a staging table you don't have to rely solely on the id/date and can actually join on primary keys to look for changes.
Linked servers, as proposed by Alex K., can be a great fit, but you will need to pay close attention to a couple of things. Always run the transfer from the destination server so that it is a PULL, not a push: linked servers are fast at querying data but horrible at updating/inserting in bulk. An XML column cannot be in the table at all. You may also need to set some specific properties for distributed transactions.
I have done this task both ways, and I would say that SSIS has a bit of an advantage over linked servers because of its robust error handling, threading logic, and ability to use different adapters (OLE DB, ODBC, etc.; they perform differently, and a search will turn up some comparisons). But the key to your #4 is to do it in smaller chunks and from a staging table; if you can do it more often, it is less likely to have an impact. E.g. daily means each run would already be ~1/7th of the weekly size, assuming an even daily distribution of changes.
Take 10,000,000 records changed per week:
Once weekly = 10 million records
Once daily = ~1.4 million records
Once hourly = ~59K records
Once every 5 minutes = less than 5K records
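The figures above follow from dividing the weekly change volume by the number of sync runs per week. A minimal Python sketch of that arithmetic, assuming (as the answer does) that changes are spread evenly over the week; the names are illustrative only:

```python
WEEKLY_CHANGES = 10_000_000  # example figure from the answer

# Number of sync runs per week for each schedule.
RUNS_PER_WEEK = {
    "weekly": 1,
    "daily": 7,
    "hourly": 7 * 24,
    "every 5 minutes": 7 * 24 * 12,
}


def rows_per_run(weekly_changes, runs_per_week):
    """Average rows per sync run, assuming evenly distributed changes."""
    return weekly_changes / runs_per_week


for name, runs in RUNS_PER_WEEK.items():
    average = rows_per_run(WEEKLY_CHANGES, runs)
    print(f"{name:>16}: {average:>12,.0f} rows per run")
```

Real change volumes are rarely uniform, so peak runs will be larger than these averages.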
And if it has to be once a week, still think about doing it in small chunks so that each insert has a smaller effect on your transaction logs, actual lock time on the production table, etc. Be sure that you never allow loading of partially staged/transferred data; otherwise identifying deltas could get messed up and you could end up missing changes.
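The chunk-then-apply idea can be sketched as follows. This is an illustration under stated assumptions, not an SSIS package: it uses Python's sqlite3 as a stand-in for both servers, and the table names (changes, staging, final), the columns, and the positive integer key are all hypothetical. Rows are pulled into staging in small keyed chunks, and the final table is touched only once the whole set is staged.

```python
import sqlite3


def pull_in_chunks(src, dst, chunk_size=1000):
    """Copy src.changes into dst.staging in keyed chunks.

    Returns the number of rows staged. Assumes ids are positive integers.
    """
    dst.execute("DELETE FROM staging")  # never reuse a partial load
    last_id, staged = 0, 0
    while True:
        rows = src.execute(
            "SELECT id, payload FROM changes "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            break
        dst.executemany(
            "INSERT INTO staging (id, payload) VALUES (?, ?)", rows
        )
        dst.commit()  # one short transaction per chunk
        last_id = rows[-1][0]
        staged += len(rows)
    return staged


def apply_staging(dst):
    """Apply the fully staged set to the final table in one transaction."""
    with dst:  # commits on success, rolls back on any error
        dst.execute(
            "INSERT OR REPLACE INTO final (id, payload) "
            "SELECT id, payload FROM staging"
        )
```

Calling apply_staging only after pull_in_chunks has returned keeps a failed or partial transfer out of the final table. On SQL Server the apply step would typically be a stored procedure doing the equivalent insert/update against the destination tables.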
One other thought, if this is a scenario like a reporting instance and you have enough server resources: you could bring over your entire table from production into a staging table (or update a copy of the table at the destination) and then simply drop the current table and rename the staging table. This is an extreme scenario and not one I generally like, but it is possible, and the actual impact on users would be minimal.
I think SSIS is good at transferring data. My approach:
1. Create a package with one Data Flow Task to transfer the data. If the structures of the two tables differ, that's okay; just map the columns.
2. Create a SQL Server Agent job to run your package every weekend.
Also, the Track Data Changes feature (SQL Server) is worth a look. You can configure when you want to sync data, and it performs well too.
With SQL Server versions >2005, it has been my experience that a dump to a file with an export is equal to or slower than transferring data directly from table to table with SSIS.
That said, and in addition to the excellent points @Matt makes, this is the usual pattern I follow for this sort of transfer.
Create a set of tables in your destination database that have the same table schemas as the tables in your source system.
I typically put these into their own database schema so their purpose is clear.
I also typically use the SSIS OLE DB Destination package's "New" button to create the tables.
Mind the square brackets on [Schema].[TableName] when editing the CREATE TABLE statement it provides.
Use SSIS Data Flow tasks to pull the data from the source to the replica tables in the destination.
This can be one package or many, depending on how many tables you're pulling over.
Create stored procedures in your destination database to transform the data into the shape it needs to be in the final tables.
Using SSIS data transformations is, almost without exception, less efficient than using server side SQL processing.
Use SSIS Execute SQL tasks to call the stored procedures.
Use parallel processing via Sequence Containers where possible to save time.
This can be one package or many, depending on how many tables you're transforming.
(Optional) If the transformations are complex, requiring intermediate data sets, you may want to create a separate Staging database schema for this step.
You will have to decide whether you want to use the stored procedures to land the data in your ultimate destination tables, or if you want to have the procedures write to intermediate tables, and then move the transformed data directly into the final tables. Using intermediate tables minimizes down time on the final tables, but if your transformations are simple or very fast, this may not be an issue for you.
If you use intermediate tables, you will need a package or packages to manage the final data load into the destination tables.
Depending on the number of packages all of this takes, you may want to create a Master SSIS package that will call the extraction package(s), then the transformation package(s), and then, if you use intermediate processing tables, the final load package(s).
I understand the difference between internal tables and external tables in Hive as follows:
1) If we drop an internal table, both the files and the metadata are deleted; if we drop an external table, only the metadata is deleted.
2) If the file data needs to be shared with other tools/applications, we use an external table, so that the data is still available to them after the table is dropped; otherwise we use an internal table.
I have gone through the answers to the question "Difference between Hive internal tables and external tables?"
but I am still not clear about the proper use cases for an internal table.
So my question is: why would I need to make an internal table? Why can't I make everything an external table?
Use EXTERNAL tables when:
The data is also used outside of Hive.
For example, the data files are read and processed by an existing program that doesn't lock the files.
The data is permanent, i.e. it is kept and used whenever needed.
Use INTERNAL tables when:
The data is temporary.
You want Hive to completely manage the lifecycle of the table and data.
Let's understand it with two simple scenarios:
Suppose you have a data set and you have to solve several analytics problems on it. Because of the nature of the problems, some of them can be solved with HiveQL, some need Pig Latin, and some need MapReduce, etc. This is where an external table comes into the picture: the same data set can be used for all of the analytics, instead of keeping a separate copy of the data for each tool. Here Hive does not need authority over the data set, because several tools are going to use it.
There can also be a scenario where all of the analytics problems can be solved with HiveQL alone. This is where an internal table comes into the picture: you can put the entire data set into Hive's warehouse, and Hive has complete authority over the data set.
Server1: Prod, hosting DB1
Server2: Dev hosting DB2
Is there a way to query databases living on two different servers with the same select query? I need to bring all the new rows from Prod to Dev, using a query like the one below. I will be using SQL Server DTS (the import/export data utility) to do this.
Insert into Dev.db1.table1
Select *
from Prod.db1.table1
where table1.PK not in (Select table1.PK from Dev.db1.table1)
Creating a linked server is the only approach that I am aware of for this to occur. If you are simply trying to add all new rows from prod to dev then why not just create a backup of that one particular table and pull it into the dev environment then write the query from the same server and database?
Granted, this only suits one-time use and is a pain for recurring transfers, but if it is a one-time thing then I would recommend doing that. Otherwise, make a linked server between the two.
To back up a single table in SQL Server, use the SQL Server Import and Export Wizard. Select the prod database as your data source, select only the prod table as your source table, and create a new table in the dev environment as your destination table.
This should get you what you are looking for.
You say you're using DTS; the modern equivalent would be SSIS.
Typically you'd use a data flow task in an SSIS package to pull all the information from the live system into a staging table on the target, then load it from there. This is a pretty standard operation when data warehousing.
There are plenty of different approaches to save you copying all the data across (e.g. use a timestamp, use rowversion, use Change Data Capture, make use of the fact your primary key only ever gets bigger, etc. etc.) Or you could just do what you want with a lookup flow directly in SSIS...
The best approach will depend on many things: how much data you've got, what data transfer speed you have between the servers, your key types, etc.
When your servers are all in one Active Directory domain and you use Windows Authentication, the main thing you need is an account which has proper rights on all the databases! (The remote server still has to be registered as a linked server for four-part names to resolve.)
You can then simply reference all tables as server.database.schema.table
For example:
insert into server1.db1.dbo.tblData1 (...)
select ... from server2.db2.dbo.tblData2;
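The "insert only the new rows" query from the question can be tried out with qualified names in a small runnable sketch. It uses Python's sqlite3 with two attached in-memory databases named prod and dev as a stand-in for the two servers; the columns pk and val are hypothetical. It also swaps NOT IN for NOT EXISTS, which behaves more predictably if the key column can ever contain NULLs.

```python
import sqlite3


def make_demo_connection():
    """Build one connection with two attached databases, prod and dev."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS prod")
    conn.execute("ATTACH DATABASE ':memory:' AS dev")
    for db in ("prod", "dev"):
        conn.execute(
            f"CREATE TABLE {db}.table1 (pk INTEGER PRIMARY KEY, val TEXT)"
        )
    conn.executemany(
        "INSERT INTO prod.table1 VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "c")],
    )
    conn.execute("INSERT INTO dev.table1 VALUES (1, 'a')")
    conn.commit()
    return conn


def copy_new_rows(conn):
    """Insert rows present in prod.table1 but missing from dev.table1.

    Returns the number of rows inserted.
    """
    cur = conn.execute(
        "INSERT INTO dev.table1 (pk, val) "
        "SELECT p.pk, p.val FROM prod.table1 AS p "
        "WHERE NOT EXISTS ("
        "  SELECT 1 FROM dev.table1 AS d WHERE d.pk = p.pk"
        ")"
    )
    conn.commit()
    return cur.rowcount
```

Running copy_new_rows a second time inserts nothing, because every prod key now exists in dev. On SQL Server the same statement shape applies, with four-part names in place of the prod. and dev. prefixes.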
We have three databases on the same server (dev, test and uat). I am using a fourth database to perform some operations. I have created views and stored procedures that use the dev database. When I want to promote the code, I need to change the database name in all views and stored procedures. Is there a better way of doing this? We are constrained to a single server for all three environments.
Thanks
shankara Narayanan
Always script everything. Then you have a nice .SQL file that you can manipulate in whatever way is necessary. I prefer to set it all up with DROP/CREATE pairs for every view, SP and function. If any of them need to change, I update the script and rerun the whole thing.
I usually use a separate script file for the tables.
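One way to make such a script reusable across dev, test and uat is to keep the database name as a placeholder and render the script per environment. A minimal Python sketch, assuming a hypothetical view dbo.vw_Orders over a hypothetical dbo.Orders table; DROP VIEW IF EXISTS assumes SQL Server 2016 or later:

```python
from string import Template

# Environments the script may target; the names come from the question.
ALLOWED_DATABASES = {"dev", "test", "uat"}

# DROP/CREATE pair with the source database left as a placeholder.
# The view and table names are hypothetical.
VIEW_SCRIPT = Template("""\
DROP VIEW IF EXISTS dbo.vw_Orders;
GO
CREATE VIEW dbo.vw_Orders AS
SELECT OrderId, CustomerId
FROM [$source_db].dbo.Orders;
GO
""")


def render_for(source_db):
    """Return the script with the placeholder set to one known database."""
    if source_db not in ALLOWED_DATABASES:
        raise ValueError(f"unknown database: {source_db!r}")
    return VIEW_SCRIPT.substitute(source_db=source_db)


if __name__ == "__main__":
    print(render_for("test"))
```

Promotion then means rendering the same template with a different name and rerunning the whole script, rather than editing each view and procedure by hand.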