OpenQuery or Create Duplicate Database on 2 different servers?

Okay, so I googled quite a bit and didn't find an answer, so here I am.
I have two servers, let's say Server1 and Server2.
Server1 has one database, S1.DB1.
Server2 has two databases, S2.DB1 and S2.DB2.
Now, I have to do reporting on Server1 where I need to use data from both S1.DB1 and S2.DB1. Should I be using OpenQuery to get the data from Server2, or should I just create a copy of S2.DB1 on Server1 so I can easily access those tables there?
My basic concerns are the speed of data extraction in the case of OpenQuery, and storage in the case of duplicating DBs. Which one is the right/better way to go about it?

Use OpenQuery if it's fast enough. If you have your DBA's cooperation, he could create a linked server, which is more convenient.
Replicating the data requires more administrative overhead and introduces failure modes. It can be helpful if the data changes slowly and is queried relatively frequently. (Use replication, though; don't just "copy the database".)
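For illustration, here is a hedged sketch of both forms, assuming a linked server named [Server2] and made-up table names; note that OPENQUERY itself addresses a linked server by name:

-- Pass-through query: the quoted text executes entirely on Server2
SELECT t.*
FROM OPENQUERY([Server2], 'SELECT OrderId, Amount FROM DB1.dbo.Orders') AS t

-- With the linked server in place, four-part names are more convenient
SELECT l.OrderId, r.Amount
FROM DB1.dbo.Orders AS l
INNER JOIN [Server2].DB1.dbo.Orders AS r ON r.OrderId = l.OrderId

The practical difference is that the OPENQUERY text is planned and executed entirely by the remote server, while four-part names leave the local optimizer to plan the distributed query.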

Related

Updating multiple tables data from different databases having same column name

I have 11 databases, each containing a table with user details, i.e. all employee details. There I have a column "Status" (which is 1 for Active and 0 for Inactive). I have a regular task of updating the "Status" value to 0 or 1 for given employees, and for that I have to go through every database, find the User table, and run the update. I have to repeat the same task for every database, and it consumes a lot of time.
If there is a short query or procedure that I could run once to do all the updates in one go, it would be a great help.
I see a couple of possible options.
You could build an SSIS package to connect to each database and do the necessary updates provided the criteria of which employees to update and what to update them to could be found within the database or some external source such as a text file.
Alternatively, you could use SQLCMD mode in SQL Server Management Studio (Query > SQLCMD Mode) and then, within your SQL script, use the :CONNECT command to switch to each server and database, something like this...
:CONNECT Server1
USE Database1
-- put your UPDATE script here
:CONNECT Server2
USE Database2
-- put your UPDATE script here
...
These links provide some further information on using SQLCMD mode...
Connecting to multiple servers in a Query Window using SQLCMD
SQL Server SQLCMD Basics
Noel
As you mentioned, you have 11 databases.
Problem: first of all, this is a very bad approach to database design.
What really happens: when you use multiple databases and have to check every one, the server needs to connect to each database again and again, which takes much more time than switching between tables, because of the connection handling.
Solution: in your case, your only real option is to loop over the databases and run the query against each DB in turn (a sketch follows).
Suggestion: keep all the data in the same database, and use an extra column in the tables to track which entity each row belongs to.
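As a rough sketch of that loop, assuming every database has a dbo.UserDetails table with EmployeeID and Status columns (these names are guesses, not the actual schema):

DECLARE @sql NVARCHAR(MAX) = N''

-- Build one UPDATE per user database (database_id > 4 skips the system DBs)
SELECT @sql += N'
UPDATE ' + QUOTENAME(name) + N'.dbo.UserDetails
SET Status = 0
WHERE EmployeeID IN (101, 102, 103);'  -- the employees to change
FROM sys.databases
WHERE database_id > 4

EXEC sys.sp_executesql @sql

If the 11 databases are a fixed list, filtering sys.databases on their names is safer than taking every user database.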

sql temp table join between servers

So I have a summary I need to return to the end user application.
It should accept 3 parameters: DateType, StartDate, EndDate.
DateType determines which date field I use to filter the data.
The way I accomplished this was putting all the IDs of the records for a date type into a TEMP table and then joining my summary to that list of IDs, something like the sketch below.
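(Table and column names in this sketch are made up, not the actual schema.)

CREATE TABLE #FilteredIds (RecordId INT PRIMARY KEY)

-- DateType picks which date column drives the filter
IF @DateType = 'OrderDate'
    INSERT INTO #FilteredIds (RecordId)
    SELECT RecordId FROM dbo.Orders WHERE OrderDate BETWEEN @StartDate AND @EndDate
ELSE IF @DateType = 'ShipDate'
    INSERT INTO #FilteredIds (RecordId)
    SELECT RecordId FROM dbo.Orders WHERE ShipDate BETWEEN @StartDate AND @EndDate

-- The summary only has to join to the pre-filtered IDs
SELECT s.*
FROM dbo.Summary AS s
INNER JOIN #FilteredIds AS f ON f.RecordId = s.RecordId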
This worked fine when running the query on the SQL Server that houses the data.
However, that is a replicated server, so when I compiled it into a stored proc on the server with the rest of the application data, the query slowed down, i.e. 2 seconds vs. 50 seconds.
I think the cross-server join, from the temp table created on one SQL Server to the tables on the replication server, is causing the slowdown.
Are there any methods or techniques that I can use to get around this and build this all in one stored procedure?
If I create 3 stored procedures, each with its own date filter, then they are fast again. However, this means maintaining multiple stored procs for the same thing.
First off, if you are running a version of SQL Server older than 2012 SP1, one problem is that users who aren't allowed to run DBCC SHOW_STATISTICS (which is most users who aren't sysadmins, see the "Permissions" section in the documentation) don't get access to statistics on remote tables. This can severely cripple the optimizer's ability to generate a good execution plan. Upgrading SQL Server or granting more permissions can help there.
If your query involves filtering or joining on a character column, make sure the remote server is flagged in the linked server options as "collation compatible". If this option is off, SQL Server can't assume strings can be compared across the servers and it will start pumping entire tables up and down just to make sure the data ends up where the comparison has to be made.
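That option can be set with sp_serveroption (the linked server name here is a placeholder):

-- Only enable this if the remote server genuinely uses a compatible collation
EXEC sp_serveroption @server = N'RemoteServer',
                     @optname = N'collation compatible',
                     @optvalue = N'true'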
If the execution plan is as good as it gets and it's still not good enough, one general (lame) technique is to transfer all data locally first (SELECT * INTO #localtable FROM remote.db.schema.table), then run the query as a non-distributed query. Obviously, in order for this to work, the remote table cannot be "too big" and in some cases this actually has worse performance, depending on how many rows are involved. But it's always worth considering, because the optimizer does a better job with local tables.
Another approach that avoids pulling tables together across servers is packing up data in parameters to remote stored procedure calls. Entire tables can be passed as XML through an NVARCHAR(MAX), since neither XML columns nor table-valued parameters are supported in distributed queries. The basic idea is the same: avoid the need for the optimizer to figure out an efficient distributed query. The best approach greatly depends on your data and your query, obviously.
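As a rough sketch of the XML-over-NVARCHAR(MAX) idea, with a hypothetical remote procedure usp_ProcessIds (calling a remote proc by four-part name needs the linked server's RPC Out option):

-- Local side: pack the rows up as XML text and ship them across
DECLARE @payload NVARCHAR(MAX)
SET @payload = (SELECT Id FROM #localIds FOR XML PATH('row'), ROOT('rows'))
EXEC [RemoteServer].RemoteDb.dbo.usp_ProcessIds @ids = @payload

-- Remote side, inside usp_ProcessIds: shred the XML back into rows
DECLARE @x XML = @ids
SELECT r.value('(./Id)[1]', 'INT') AS Id
FROM @x.nodes('/rows/row') AS t(r)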

How to query a table to a view and publish to a different database

I have 13 SQL Server databases, some 2005 and others 2008, on a VPN. I'd like to take all of the data from the "Employees" table in each database and make it a view at each location. I would then like to publish these views to one database on another server, all in one table, marking where each row came from within the original databases. For example, the database where all the information goes would look like this:
User    Name      Location
bik     Bob K     1
JS      John S    2
Etc.
Any help is appreciated.
I assume you want the data on the final server to be viewable, but not modifiable, and to reflect changes made to the source databases?
This would probably not perform all that well, but one do-it-yourself way to do it would be the following (disclaimer: I haven't tried this myself):
Set up all the source servers as linked servers on the final server (a sketch of this follows the view definition below).
Create a view in this form:
SELECT *, 1 as Location
FROM [Linked Server 1].Database1.dbo.Table1
UNION ALL
SELECT *, 2 as Location
FROM [Linked Server 2].Database2.dbo.Table2
... etc ....
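For step 1, the linked servers might be created along these lines (all names and credentials are placeholders):

EXEC sp_addlinkedserver
    @server = N'Linked Server 1',    -- local alias used in the view
    @srvproduct = N'',
    @provider = N'SQLNCLI10',
    @datasrc = N'SourceServer1'      -- network name of the remote instance

EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'Linked Server 1',
    @useself = N'FALSE',
    @rmtuser = N'report_reader',
    @rmtpassword = N'...'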
You might want to read this documentation on distributed queries, if you haven't already.
I believe it's also possible to use SSIS as the source of a distributed query, but a quick scan through the documentation didn't find anything about it. I mention that because SSIS would make pulling and transforming data from disparate data sources very easy, and if you could use the final recordset as a data source, you could use an SSIS package as the backend to your view. However, again, performance would probably require considerable tuning.
If the queries don't have to be real time, you could look into using SQL Server Integration Services (SSIS) to pull the data into a local DB. You could schedule the job to run hourly/daily/weekly.

SQL Server 2008, Sybase - large select queries over low bandwidth

I need to pull a large amount of data from various tables across a line that has very low bandwidth. I need to minimize the amount of data that gets sent to and fro.
On that side is a Sybase database, on this side SQL Server 2008.
What I need is to pull all the tables from the Sybase database that have to do with this office. Let's say I have the following tables as an example:
Farm
Tree
Branch
etc.
(one farm has many trees, one tree has many branches etc.)
Lets say the "Farm" table has a field called "CountryID", and I only want the data for where CountryID=12. The actual table structures I am looking at are very complex (and I am also not very familiar with them) so I want to try to keep the queries simple.
So I am thinking of setting up a series of views:
CREATE VIEW vw_Farm AS
SELECT * from Farm where CountryID=12
CREATE VIEW vw_Tree AS
SELECT * from Tree where FarmID in (SELECT FarmID FROM vw_Farm)
CREATE VIEW vw_Branch AS
SELECT * from Branch where TreeID in (SELECT TreeID FROM vw_Tree)
etc.
To pull the actual data across I would then do:
SELECT * INTO localDb.dbo.Farm FROM vw_Farm
SELECT * INTO localDb.dbo.Tree FROM vw_Tree
SELECT * INTO localDb.dbo.Branch FROM vw_Branch
etc.
Simple enough to set up. I am wondering how this will perform though? Will it perform all the SELECT statements on the Sybase side and then just send back the result? Also, since this will be an iterative process, is it possible to index the views for subsequent calls?
Any other optimisation suggestions would also be welcome!
Thanks
Karl
EDIT: Just to clarify, the views will be set up in SQL Server; I am using a linked server (Sybase ASE) to set up those views. What is worrying me in particular is whether the fact that the views are in SQL Server on this side, and not in Sybase on that side, means that for each iteration the data from the preceding view will get pulled across to SQL Server first, before the calculations get executed. I want Sybase to do all the calcs and just pass the results across.
It's difficult to be certain without testing, but my somewhat-relevant experience (using linked servers to platforms other than Sybase, and on SQL Server 2005) has been that using subqueries (such as your code for vw_Tree and vw_Branch) more or less guarantees that SQL Server will pull all the data for the outer table into a local temp table, then match it to the results of the inner query.
The problem is that SQL Server has no access to the linked server's table statistics, so can make no meaningful decisions about how to optimise the query.
If you want to be sure to have the work done on the Sybase server, your best bet will be to write code (could be views or stored procedures) on the Sybase side and reference them from SQL Server.
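For example (assuming a linked server named SYBASE and a Sybase-side procedure usp_GetFarms, both made up):

-- Pass-through: the quoted query is planned and run entirely by Sybase
SELECT *
INTO localDb.dbo.Farm
FROM OPENQUERY(SYBASE, 'SELECT * FROM Farm WHERE CountryID = 12')

-- Or call a procedure defined on the Sybase side
-- (needs the linked server's RPC Out option)
EXEC ('exec usp_GetFarms 12') AT SYBASE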
Linked server connections are, in my experience, not particularly resilient over flaky networks. If it's available, you could consider using Integration Services rather than linked-server queries - but even that may not be much better. You may need to consider falling back on moving text files with robocopy and bcp.

SQL Server: is it possible to get data from another SQL server without setting linked server?

I need to do the following query (for example):
SELECT c1.CustomerName FROM Customer as c1
INNER JOIN [ExternalServer].[Database].[dbo].[Customer] as c2
ON c2.RefId = c1.RefId
For some security reason my client doesn't allow me to create a linked server. The user under whom I execute this query has access to both tables. Is it possible to make it work without using linked server? Thanks.
You could use OPENROWSET, which'll require the connection info, username & password...
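A hedged sketch (the provider, server name, and credentials are placeholders, and the 'Ad Hoc Distributed Queries' server option must be enabled):

SELECT c1.CustomerName
FROM Customer AS c1
INNER JOIN OPENROWSET(
        'SQLNCLI',
        'Server=ExternalServer;UID=some_user;PWD=some_password;',
        'SELECT RefId, CustomerName FROM [Database].dbo.Customer') AS c2
    ON c2.RefId = c1.RefId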
While I understand that the client believes an always-on connection to their data is risky, that's what locking down the account is for. OPENROWSET, by contrast, means including the connection info in plain text.
'Linked Server' is a very specific thing -- basically, a permanent connection between servers. I can think of all sorts of reasons not to want that, while at the same time having no problem with folks writing queries that combine data from the two different data sources.
Anyway, depending on your requirement: if this is just for ad hoc querying, OPENROWSET is good from inside SQL Server; or, if you want to do this in MS Access, just link to the two tables, and your Access query won't care that one table comes from one server and the other from another.
Alternatively, with a web or Windows front-end, you could independently query each table into a data object and then build a separate query on top of that.
HTTP endpoints...
Web services...
There are a million ways. I wouldn't be so quick to assume, as @Lasse suggests, that any form of 'linking' this data together would make you some kind of rogue data linker.