Implement symmetric difference in SQL Server?

Here's a problem I've been trying to solve at work. I'm not a database expert, so perhaps this is a bit sophomoric. All apologies.
I have a given database D, which has been duplicated on another machine (in a perhaps dubious manner), resulting in database D'. It is my task to check that database D and D' are in fact exactly identical.
The problem, of course, is what to actually do if they are not. For this purpose, my thought was to run a symmetric difference on each corresponding table and see the differences.
There is a "large" number of tables, so I do not wish to run each symmetric difference by hand. How do I then implement a symmetric difference "function" (or stored procedure, or whatever you'd like) that can run on arbitrary tables without having to explicitly enumerate the columns?
This is running on Windows, and your hedge fund will explode if you don't follow through. Good luck.

Here is the solution. The example data is from the ReportServer database that comes with SSRS 2008 R2, but you can use it on any dataset:
SELECT s.name, s.type
FROM
(
SELECT s1.name, s1.type
FROM syscolumns s1
WHERE object_name(s1.id) = 'executionlog2'
UNION ALL
SELECT s2.name, s2.type
FROM syscolumns s2
WHERE object_name(s2.id) = 'executionlog3'
) AS s
GROUP BY s.name, s.type
HAVING COUNT(s.name) = 1
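The query above compares column definitions. For the row data itself, the symmetric difference of a single pair of tables can be written with EXCEPT in both directions; a rough sketch, assuming the copy is attached on the same server as a database called D_copy (placeholder names, adjust to yours):

-- rows present in only one of the two copies of SomeTable
(SELECT * FROM D.dbo.SomeTable
 EXCEPT
 SELECT * FROM D_copy.dbo.SomeTable)
UNION ALL
(SELECT * FROM D_copy.dbo.SomeTable
 EXCEPT
 SELECT * FROM D.dbo.SomeTable);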

You can achieve this by doing something like the following.
I have used a function that splits a comma-separated value into a table to demonstrate.
CREATE FUNCTION [dbo].[Split]
(
    @RowData nvarchar(2000),
    @SplitOn nvarchar(5)
)
RETURNS @RtnValue table
(
    Id int identity(1,1),
    Data nvarchar(100)
)
AS
BEGIN
    Declare @Cnt int
    Set @Cnt = 1
    While (Charindex(@SplitOn, @RowData) > 0)
    Begin
        Insert Into @RtnValue (Data)
        Select Data = ltrim(rtrim(Substring(@RowData, 1, Charindex(@SplitOn, @RowData) - 1)))
        Set @RowData = Substring(@RowData, Charindex(@SplitOn, @RowData) + 1, len(@RowData))
        Set @Cnt = @Cnt + 1
    End
    Insert Into @RtnValue (Data)
    Select Data = ltrim(rtrim(@RowData))
    Return
END
GO
DECLARE @WB_LIST varchar(1024) = '123,125,764,256,157';
DECLARE @WB_LIST_IN_DB varchar(1024) = '123,125,795,256,157,789';
DECLARE @TABLE_UPDATE_LIST_IN_DB TABLE ( id varchar(20));
DECLARE @TABLE_UPDATE_LIST TABLE ( id varchar(20));
INSERT INTO @TABLE_UPDATE_LIST
SELECT data FROM dbo.Split(@WB_LIST, ',');
INSERT INTO @TABLE_UPDATE_LIST_IN_DB
SELECT data FROM dbo.Split(@WB_LIST_IN_DB, ',');
(SELECT * FROM @TABLE_UPDATE_LIST
 EXCEPT
 SELECT * FROM @TABLE_UPDATE_LIST_IN_DB)
UNION
(SELECT * FROM @TABLE_UPDATE_LIST_IN_DB
 EXCEPT
 SELECT * FROM @TABLE_UPDATE_LIST);
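Note that EXCEPT and UNION have the same precedence and are evaluated left to right, so the two EXCEPT branches are parenthesized; without the parentheses you would only get the values missing from the first list. For the sample lists above the query returns 764, 795 and 789, the values that appear in exactly one of the two lists.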

My first reaction is to suggest duplicating to the other machine again in a non-dubious manner.
If that is not an option, perhaps some of the tools available from Red Gate could do what you need.
(I am in no way affiliated with Red Gate; I just remember Joel mentioning how good their tools were on the podcast.)

SQL Server 2005 added the "EXCEPT" keyword, which is almost exactly the same as Oracle's "MINUS":
SELECT * FROM TBL_A WHERE ...
EXCEPT
SELECT * FROM TBL_B WHERE ...
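To cover every table without enumerating them by hand, you can generate one such EXCEPT comparison per table from the catalog views. A rough sketch, assuming both copies sit on the same server as databases named D and D_copy (placeholder names), that the schemas already match, and that no table contains types EXCEPT cannot compare (xml, text, image and the like):

DECLARE @sql nvarchar(max) = N'';

-- build one symmetric-difference query per table in D
SELECT @sql = @sql
    + N'SELECT ' + QUOTENAME(t.name, '''') + N' AS table_name, diff.* FROM ('
    + N'(SELECT * FROM D.dbo.' + QUOTENAME(t.name)
    + N' EXCEPT SELECT * FROM D_copy.dbo.' + QUOTENAME(t.name) + N')'
    + N' UNION ALL '
    + N'(SELECT * FROM D_copy.dbo.' + QUOTENAME(t.name)
    + N' EXCEPT SELECT * FROM D.dbo.' + QUOTENAME(t.name) + N')'
    + N') AS diff;' + NCHAR(10)
FROM D.sys.tables AS t;

EXEC sp_executesql @sql;

Each table produces its own result set; an empty result set means the two copies of that table hold the same rows.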

Use the SQL Compare tools by Red Gate. SQL Compare diffs schemas, and the SQL Data Compare tool compares data. I think you can get a free trial for them, but if this is a recurring problem they are worth buying. There may be open source or free tools that do the same job, but these are the ones I know work.

Related

Select all records from all the tables: every derived table must have its own alias

I'm working on an e-learning project with a table named chapter, which has a column named question_table; that column holds the name of the table in which the specific chapter's questions are stored.
Now the problem is that I want to display all the questions from all the chapters. To do this I used the following SQL query:
SELECT * FROM (SELECT `question_table` FROM `chapter`)
but it doesn't work and gives the error:
"Every derived table must have its own alias".
Note: I want to do it using SQL not PHP.
Firstly, I think you would be better off redesigning your database. Multiple tables with the same structure holding the same kind of data are generally not a good idea.
However what you require is possible using a MySQL procedure to build up some dynamic SQL and then execute it, returning the resulting data.
A procedure as follows could be used to do this:-
DROP PROCEDURE IF EXISTS dynamic;
delimiter //
CREATE PROCEDURE dynamic()
BEGIN
DECLARE question_table_value VARCHAR(25);
DECLARE b INT DEFAULT 0;
DECLARE c TEXT DEFAULT '';
DECLARE cur1 CURSOR FOR SELECT `question_table` FROM `chapter`;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET b = 1;
OPEN cur1;
SET b = 0;
WHILE b = 0 DO
FETCH cur1 INTO question_table_value;
IF b = 0 THEN
IF c = '' THEN
SET c = CONCAT('SELECT * FROM `',question_table_value, '`');
ELSE
SET c = CONCAT(c, ' UNION SELECT * FROM `',question_table_value, '`');
END IF;
END IF;
END WHILE;
CLOSE cur1;
SET @stmt1 := c;
PREPARE stmt FROM @stmt1;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END//
delimiter ;
This creates a procedure called dynamic. It takes no parameters. It sets up a cursor to read the question_table column values from the chapter table. It loops over the results from that, building up a string containing the SQL: a SELECT from each table, with the results UNIONed together. This is then PREPAREd and executed. By default the procedure will return the result set from the SQL it executes.
You can call this to return the results using:-
CALL dynamic()
The downside is that this isn't going to give nice results if there are no rows to return, and it is not that easy to maintain or debug with the normal tools developers have. Added to which, very few people have real stored procedure skills to maintain it in future.
In MySQL you must give every subquery ("derived table") an alias:
SELECT * FROM (SELECT question_table FROM chapter) t --notice the alias "t"
The derived table here is the result of the (SELECT ...). You need to give it an alias, like so:
SELECT * FROM (SELECT question_table FROM chapter) X;
Edit, re dynamic tables
If you know all the tables in advance, you can union them, i.e.:
SELECT * FROM
(
SELECT Col1, Col2, ...
FROM Chapter1
UNION
SELECT Col1, Col2, ...
FROM Chapter2
UNION
...
) X;
To do this solution generically, you'll need to use dynamic sql to achieve your goal.
In general however, this is indicative of a smell in your table design - your chapter data should really be in one table, and e.g. classified by the chapter id.
If you do need to shard data for scale or performance reasons, the typical mechanism for doing this is to span multiple databases, not tables in the same database. MySql can handle large numbers of rows per table, and performance won't be an issue if the table is indexed appropriately.
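For illustration, a minimal sketch of the single-table design suggested above (table and column names are made up, and it assumes chapter has an integer primary key id):

CREATE TABLE question (
  id INT AUTO_INCREMENT PRIMARY KEY,
  chapter_id INT NOT NULL,           -- which chapter the question belongs to
  question_text TEXT NOT NULL,
  FOREIGN KEY (chapter_id) REFERENCES chapter(id)
);

-- all questions from every chapter, no dynamic SQL needed
SELECT * FROM question;

-- questions for a single chapter
SELECT * FROM question WHERE chapter_id = 3;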

How do I declare strings of data

I have a long code that is creating financial accounting data.
The code uses multiple unions to breakout data to different company groupings.
There are 5-6 account groupings that are referenced multiple times.
Anytime there is a change to the groupings I have to go through the code and change it in each location.
An example of the string is below:
Where account in ('81000', '82000','87000','83600','67000')
and account like '814%'
Is there any way to put this in a declare, or otherwise reference it from the other where clauses?
There are several ways to do what you describe, which is best will depend on your exact needs.
First and simplest is to use variables.
declare @account1 int; set @account1 = 81000;
declare @account2 int; set @account2 = 82000;
declare @account3 int; set @account3 = 87000; /* and so forth */
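With the variables declared, the repeated filter can reference them instead of literals; a small sketch against the question's account column (columnlist stands in for whatever columns you select):

select columnlist
from accounts
where account in (@account1, @account2, @account3);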
It's not clear from your question whether this is being called from a front-end app; if it is, you can use SQL parameters to set the account values.
string cmd = @"declare @account1 int; set @account1 = @account1In;
select columnslist from accounts where account in (@account1)
union
select columnslist from accounts where account in (@account1)";
Secondly, you could put the values either into a temporary table or table variable.
declare @accountIds table (account int);
insert into @accountIds values (81000);
select columnlist from accounts where account in (select account from @accountIds);
Finally, if this is really the same expression done multiple times, you might consider using a common table expression.
;with cte as (
    select columnlist from accounts where account in (81000, 87000)
)
select columnlist from cte inner join table2 on a = b
union
select columnlist from cte inner join table3 on a = c

Iterate through rows in SQL Server 2008

Consider the table SAMPLE:
id integer
name nvarchar(10)
There is a stored proc called myproc. It takes only one parameter (which is id).
Given a name as a parameter, find all rows with name = @nameparameter and pass all those ids to myproc.
eg:
sample->
1 mark
2 mark
3 stu
41 mark
When mark is passed, 1, 2 and 41 are to be passed to myproc individually.
i.e. the following should happen:
execute myproc 1
execute myproc 2
execute myproc 41
I can't touch myproc nor see its content. I just have to pass the values to it.
If you must iterate(*), use the construct designed to do it - the cursor. Much maligned, but if it most clearly expresses your intentions, I say use it:
DECLARE @ID int
DECLARE IDs CURSOR LOCAL FOR select ID from SAMPLE where Name = @NameParameter
OPEN IDs
FETCH NEXT FROM IDs into @ID
WHILE @@FETCH_STATUS = 0
BEGIN
    exec myproc @ID
    FETCH NEXT FROM IDs into @ID
END
CLOSE IDs
DEALLOCATE IDs
(*) This answer has received a few upvotes recently, but I feel I ought to incorporate my original comment here also, and add some general advice:
In SQL, you should generally seek a set-based solution. The entire language is oriented around set-based solutions, and (in turn) the optimizer is oriented around making set-based solutions work well. In further turn, the tools we have available for tuning the optimizer are also set-oriented - e.g. applying indexes to tables.
There are a few situations where iteration is the best approach. These are few and far between, and may be likened to Jackson's rules on optimization - don't do it - and (for experts only) don't do it yet.
You're far better served to first try to formulate what you want in terms of the set of all rows to be affected - what is the overall change to be achieved? - and then try to formulate a query that encapsulates that goal. Only if the query produced by doing so is not performing adequately (or there's some other component that is unable to do anything other than deal with each row individually) should you consider iteration.
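To make the contrast concrete, a generic illustration (the table and column names are invented, and this is unrelated to the myproc constraint above): a loop that fetches and updates rows one at a time to apply a discount can usually be collapsed into a single set-based statement, which the optimizer can satisfy efficiently with an index on the filter column.

-- set-based: one statement updates every qualifying row at once,
-- instead of looping over them row by row
UPDATE Orders
SET Total = Total * 0.9
WHERE CustomerType = 'VIP';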
I just declare the table variable @sample and insert all the rows that have name = 'rahul', along with a Status column to mark whether a row has been processed. Using a while loop I then iterate through all the rows of @sample, which holds all the ids for name = 'rahul'.
use dumme
Declare @Name nvarchar(50)
set @Name = 'Rahul'
Declare @Id int
DECLARE @sample table (
    ID int,
    Status varchar(500)
)
insert into @sample (ID, Status) select ID, 0 from sample where name = @Name
while ((select count(Id) from @sample where Status = 0) > 0)
begin
    select top 1 @Id = Id from @sample where Status = 0 order by Id
    exec myproc @Id    -- call the proc for the current row
    update @sample set Status = 1 where Id = @Id
end
Declare @retStr varchar(100)
select @retStr = COALESCE(@retStr, '') + cast(sample.ID as varchar(10)) + ', '
from sample
WHERE sample.Name = @nameparameter
select @retStr = ltrim(rtrim(substring(@retStr, 1, len(@retStr) - 1)))
Return ISNULL(@retStr, '')

SQL Search using case or if

Everyone has been a super help so far. My next question is: what is the best way for me to approach this? I have 7 fields that a user can search, and they can pick any combination of those fields, which is far too many combinations to write a separate query for each. So how do I account for the user selecting field 1 and field 3, or fields 1, 2 and 7? Is there an easy way to do this with SQL? I don't know whether I should approach this using an IF statement or a CASE in the select statement, or go in a completely different direction. If anyone has any helpful pointers I would greatly appreciate it.
Thank You
You'll probably want to look into using dynamic SQL for this. See: Dynamic Search Conditions in T-SQL and Catch-all queries for good articles on this topic.
Select f1,f2 from table where f1 like '%val%' or f2 like '%val%'
You could write a stored procedure that accepts each parameter as null and then write your WHERE clause like:
WHERE (field1 = #param1 or #param1 is null)
AND (field2 = #param2 or #param2 is null) etc...
But I wouldn't recommend it. It can definitely affect performance doing it this way depending on the number of parameters you have. I second Joe Stefanelli's answer with looking into dynamic SQL in this case.
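For reference, a rough sketch of the dynamic SQL approach those articles describe: build only the predicates that were actually supplied, but keep the values as parameters so there is no injection risk (table and column names here are placeholders):

CREATE PROCEDURE dbo.SearchDynamic
    @param1 int = NULL,
    @param2 varchar(50) = NULL
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sql nvarchar(max) = N'SELECT * FROM MyTable WHERE 1 = 1';

    -- append a predicate only for the parameters that were supplied
    IF @param1 IS NOT NULL SET @sql = @sql + N' AND field1 = @p1';
    IF @param2 IS NOT NULL SET @sql = @sql + N' AND field2 = @p2';

    -- the values still travel as parameters, never concatenated into the string
    EXEC sp_executesql @sql,
         N'@p1 int, @p2 varchar(50)',
         @p1 = @param1, @p2 = @param2;
END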
Depends on:
what your data looks like,
how big it is,
how exact the result needs to be (all matching records, or is the top 100 enough),
how many resources your database has.
you can try something like:
CREATE PROC dbo.Search(
    @param1 INT = NULL,
    @param2 VARCHAR(3) = NULL
)
AS
BEGIN
    SET NOCOUNT ON
    -- create temporary table to keep the (primary) keys of matching records from the searched table
    CREATE TABLE #results (k INT)
    INSERT INTO #results (k)
    SELECT [key]   -- you can use TOP here to narrow the results
    FROM [table]
    -- you can add a WHERE clause if there are some default conditions
    PRINT @@ROWCOUNT
    -- if @param1 is set, filter #results
    IF @param1 IS NOT NULL BEGIN
        PRINT '@param1'
        ;WITH d AS (
            SELECT [key] FROM [table] WHERE param1 <> @param1
        )
        DELETE FROM #results WHERE k IN (SELECT [key] FROM d)
        PRINT @@ROWCOUNT
    END
    -- if @param2 is set, filter #results
    IF @param2 IS NOT NULL BEGIN
        PRINT '@param2'
        ;WITH d AS (
            SELECT [key] FROM [table] WHERE param2 <> @param2
        )
        DELETE FROM #results WHERE k IN (SELECT [key] FROM d)
        PRINT @@ROWCOUNT
    END
    -- return what is left in #results
    SELECT t.*   -- or better, only the columns you need
    FROM #results r
    JOIN [table] t ON t.[key] = r.k
END
I use this technique on a large database (millions of records, but running on a large server) to filter data against some predefined criteria, and it works pretty well.
However, I don't need all matching records; depending on the query, 10-3000 matching records is enough.
If you are using a stored procedure you can use this method:
CREATE PROCEDURE dbo.foo
    @param1 VARCHAR(32) = NULL,
    @param2 INT = NULL
AS
BEGIN
    SET NOCOUNT ON
    SELECT * FROM MyTable as t
    WHERE (@param1 IS NULL OR t.Column1 = @param1)
      AND (@param2 IS NULL OR t.Column2 = @param2)
END
GO
These are usually called optional parameters. The idea is that if you don't pass one in it gets the default value (null) and that section of your where clause always returns true.
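One caveat with this pattern, covered in the articles linked earlier: a single cached plan has to serve every combination of parameters, so performance can suffer. A common mitigation is a recompile hint, so the optimizer builds a plan for the values actually supplied on each call; a sketch against the procedure above:

SELECT * FROM MyTable as t
WHERE (@param1 IS NULL OR t.Column1 = @param1)
  AND (@param2 IS NULL OR t.Column2 = @param2)
OPTION (RECOMPILE)  -- plan is rebuilt per call using the supplied values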

Handling the data in an IN clause, with SQL parameters?

We all know that prepared statements are one of the best ways of fending off SQL injection attacks. What is the best way of creating a prepared statement with an "IN" clause? Is there an easy way to do this with an unspecified number of values? Take the following query for example.
SELECT ID,Column1,Column2 FROM MyTable WHERE ID IN (1,2,3)
Currently I'm using a loop over my possible values to build up a string such as:
SELECT ID,Column1,Column2 FROM MyTable WHERE ID IN (#IDVAL_1,#IDVAL_2,#IDVAL_3)
Is it possible to just pass an array as the value of the query parameter and use a query as follows?
SELECT ID,Column1,Column2 FROM MyTable WHERE ID IN (#IDArray)
In case it's important I'm working with SQL Server 2000, in VB.Net
Here you go - first create the following function...
Create Function [dbo].[SeparateValues]
(
    @data VARCHAR(MAX),
    @delimiter VARCHAR(10)
)
RETURNS @tbldata TABLE(col VARCHAR(10))
As
Begin
    DECLARE @pos INT
    DECLARE @prevpos INT
    SET @pos = 1
    SET @prevpos = 0
    WHILE @pos > 0
    BEGIN
        SET @pos = CHARINDEX(@delimiter, @data, @prevpos + 1)
        if @pos > 0
            INSERT INTO @tbldata(col) VALUES(LTRIM(RTRIM(SUBSTRING(@data, @prevpos + 1, @pos - @prevpos - 1))))
        else
            INSERT INTO @tbldata(col) VALUES(LTRIM(RTRIM(SUBSTRING(@data, @prevpos + 1, len(@data) - @prevpos))))
        SET @prevpos = @pos
    End
    RETURN
END
then use the following...
Declare @CommaSeparated varchar(50)
Set @CommaSeparated = '112,112,122'
SELECT ID, Column1, Column2 FROM MyTable WHERE ID IN (select col FROM [SeparateValues](@CommaSeparated, ','))
I think SQL Server 2008 will allow table-valued parameters.
UPDATE
You'll squeeze some extra performance using the following syntax...
SELECT ID,Column1,Column2 FROM MyTable
Cross Apply [SeparateValues](#CommaSeparated, ',') s
Where MyTable.id = s.col
Because the previous syntax causes SQL Server to run an extra "Sort" operation for the "IN" clause. Plus, in my opinion, it looks nicer :D!
If you would like to pass an array, you will need a function in sql that can turn that array into a sub-select.
These functions are very common, and most home grown systems take advantage of them.
Most commercial, or rather professional, ORMs handle IN clauses by generating a bunch of numbered parameters, so if you have that working, I think that is the standard method.
You could create a temporary table TempTable with a single column VALUE and insert all IDs. Then you could do it with a subselect:
SELECT ID,Column1,Column2 FROM MyTable WHERE ID IN (SELECT VALUE FROM TempTable)
Go with the solution posted by digiguru. It's a great reusable solution and we use the same technique as well. New team members love it, as it saves time and keeps our stored procedures consistent. The solution also works well with SQL Reports, as the parameters passed to stored procedures to create the recordsets pass in varchar(8000). You just hook it up and go.
In SQL Server 2008, they finally got around to addressing this classic problem by adding a new "table" datatype. Apparently, that lets you pass in an array of values, which can be used in a sub-select to accomplish the same as an IN statement.
If you're using SQL Server 2008, then you might look into that.
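For reference, a minimal sketch of the table-valued parameter approach on SQL Server 2008 (the type and procedure names here are invented for the example):

-- a table type to carry the list of IDs
CREATE TYPE dbo.IdList AS TABLE (ID int NOT NULL PRIMARY KEY);
GO

CREATE PROCEDURE dbo.GetByIds
    @Ids dbo.IdList READONLY   -- table-valued parameters must be READONLY
AS
BEGIN
    SET NOCOUNT ON;
    SELECT m.ID, m.Column1, m.Column2
    FROM MyTable AS m
    WHERE m.ID IN (SELECT ID FROM @Ids);
END
GO

From a client such as ADO.NET the parameter is passed as a DataTable (or a DbDataReader) with SqlDbType.Structured.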
Here's one technique I use
ALTER Procedure GetProductsBySearchString
    @SearchString varchar(1000)
as
set nocount on
declare @sqlstring varchar(6000)
select @sqlstring = 'set nocount on
select a.productid, count(a.productid) as SumOf, sum(a.relevence) as CountOf
from productkeywords a
where rtrim(ltrim(a.term)) in (''' + Replace(@SearchString,' ', ''',''') + ''')
group by a.productid order by SumOf desc, CountOf desc'
exec(@sqlstring)