Count the number of rows across multiple tables in one query - SQL

I have a SQL Server 2005 database that stores data for multiple users. Each table that contains user-owned data has a column called OwnerID that identifies the owner; most but not all tables have this column.
I want to be able to count the number of rows 'owned' by a user in each table. In other words, I want a query that returns the name of each table that contains an OwnerID column, along with the number of rows in that table that match a given OwnerID value.
I can return just the names of the matching tables using this query:
SELECT OBJECT_NAME(object_id) [Table] FROM sys.columns
WHERE name = 'OwnerID' ORDER BY OBJECT_NAME(object_id);
That query returns a list of table names like this:
+---------+
| Table   |
+---------+
| Alpha   |
| Beta    |
| Gamma   |
| ...     |
+---------+
But is it possible to write a query that can also count the number of rows in each table that match a given OwnerID? i.e.:
+---------+------------+
| Table   | RowCount   |
+---------+------------+
| Alpha   | 2042       |
| Beta    | 49         |
| Gamma   | 740        |
| ...     | ...        |
+---------+------------+
Note: The list of table names needs to be returned dynamically, it is not suitable to hard-code table names into this query.
Edit: the answer...
(I can't edit your answers yet but I can edit my own question so I'm putting it here...)
Damien_The_Unbeliever had essentially the correct answer, but SQL Server doesn't allow string concatenation in an EXEC statement, so I had to build the query prior to the EXEC statement. The final query is as follows:
DECLARE @OwnerID int;
SET @OwnerID = 1;
DECLARE @ForEachSQL varchar(100);
SET @ForEachSQL = 'INSERT INTO #t(TableName,RowsOwned) SELECT ''?'', COUNT(*) FROM ? WHERE OwnerID = ' + CONVERT(varchar(11), @OwnerID);
CREATE TABLE #t(TableName sysname, RowsOwned int);
EXEC sp_MSforeachtable @ForEachSQL,
@whereAnd = 'AND o.id IN (SELECT id FROM syscolumns where name=''OwnerID'')';
SELECT * FROM #t ORDER BY TableName;
DROP TABLE #t;

You can use sp_MSForeachtable, and the @whereand parameter, to specify a filter so you're only working against tables with an OwnerID column. Create a temp table, and populate that for each matching table. Something like:
create table #t(tablename sysname,Cnt int)
exec sp_MSforeachtable 'insert into #t(tablename,Cnt) select ''?'',COUNT(*) from ?',@whereAnd='and o.id in (select id from syscolumns where name=''OwnerID'')'
select * from #t
Two major caveats to mention - the first is that sp_MSforeachtable is "undocumented", so you use it at your own risk - it could be suddenly removed from SQL Server by any kind of servicing, or in the next release.
The second is that having a dynamic schema is usually a sign that something else has gone wrong in the modelling - possibly attribute splitting (where, say, sales for January and February are given different tables, even though they're logically the same thing and should appear in the same table, possibly with an additional column to distinguish them).
And, of course, you wanted to filter based on a particular OwnerID, so the query would be more like:
'insert into #t(tablename,Cnt) select ''?'',COUNT(*) from ? where OwnerID=' + @OwnerID
(Assuming @OwnerID is the owner sought, and is an int.)
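If relying on an undocumented procedure is a concern, the same loop can be built from the documented catalog views instead. A minimal sketch (reusing the #t temp table idea from above, and assuming table names contain no quote characters):
DECLARE @OwnerID int; SET @OwnerID = 1;
DECLARE @sql nvarchar(max); SET @sql = N'';
-- build one INSERT ... SELECT COUNT(*) per table that has an OwnerID column
SELECT @sql = @sql + N'INSERT INTO #t(TableName, RowsOwned) SELECT ''' + t.name + N''', COUNT(*) FROM ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) + N' WHERE OwnerID = @OwnerID;' + CHAR(10)
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE EXISTS (SELECT 1 FROM sys.columns c WHERE c.object_id = t.object_id AND c.name = 'OwnerID');
CREATE TABLE #t(TableName sysname, RowsOwned int);
EXEC sp_executesql @sql, N'@OwnerID int', @OwnerID = @OwnerID;
SELECT * FROM #t ORDER BY TableName;
DROP TABLE #t;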

This gets the info from sysindexes. It can be slightly out of date, but it will give you a rough count:
SELECT
[TableName] = so.name,
[RowCount] = MAX(si.rows)
FROM
sysobjects so,
sysindexes si
WHERE
so.xtype = 'U'
AND
si.id = OBJECT_ID(so.name)
GROUP BY
so.name
ORDER BY
2 DESC
If you needed it to be 100% right, then you could use the undocumented sp_MSForEachTable:
DECLARE @SQL VARCHAR(255)
SET @SQL = 'DBCC UPDATEUSAGE (' + DB_NAME() + ')'
EXEC(@SQL)
CREATE TABLE #foo
(
tablename VARCHAR(255),
rc INT
)
INSERT #foo
EXEC sp_msForEachTable
'SELECT PARSENAME(''?'', 1),
COUNT(*) FROM ?'
SELECT tablename, rc
FROM #foo
ORDER BY rc DESC
DROP TABLE #foo
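On SQL Server 2005 and later, the same rough counts can also be read from sys.partitions; a hedged sketch, restricted here to tables that have an OwnerID column (note this still counts all rows, not just one owner's):
SELECT OBJECT_NAME(p.object_id) AS TableName, SUM(p.rows) AS ApproxRows
FROM sys.partitions p
WHERE p.index_id IN (0, 1) -- heap or clustered index only, to avoid double-counting
AND EXISTS (SELECT 1 FROM sys.columns c
            WHERE c.object_id = p.object_id AND c.name = 'OwnerID')
GROUP BY p.object_id
ORDER BY ApproxRows DESC;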

You can use this:
DECLARE @nSQL NVARCHAR(MAX)
SELECT @nSQL = COALESCE(@nSQL + 'UNION ALL ' + CHAR(10), '')
+ 'SELECT ''' + TABLE_NAME + ''' AS TableName, COUNT(*) FROM ' + QUOTENAME(TABLE_NAME) + CHAR(10)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME = 'OwnerID'
-- This will PRINT out the dynamically generated SQL statement. Just replace this with EXECUTE(@nSQL) when you are happy to run it.
PRINT @nSQL
Update: To search for a specific OwnerId:
DECLARE @nSQL NVARCHAR(MAX)
DECLARE @OwnerId INTEGER
SET @OwnerId = 1
SELECT @nSQL = COALESCE(@nSQL + 'UNION ALL ' + CHAR(10), '')
+ 'SELECT ''' + TABLE_NAME + ''' AS TableName, COUNT(*) FROM ' + QUOTENAME(TABLE_NAME) + ' WHERE OwnerId = @OwnerId' + CHAR(10)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME = 'OwnerID'
EXECUTE sp_executesql @nSQL, N'@OwnerId INTEGER', @OwnerId
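For reference, with the question's example tables Alpha and Beta this builds (and sp_executesql then runs) a statement of the form:
SELECT 'Alpha' AS TableName, COUNT(*) FROM [Alpha] WHERE OwnerId = @OwnerId
UNION ALL 
SELECT 'Beta' AS TableName, COUNT(*) FROM [Beta] WHERE OwnerId = @OwnerId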

SELECT
O.ID,
O.NAME,
I.ROWCNT
FROM SYSOBJECTS O
INNER JOIN SYSINDEXES I
ON O.ID = I.ID
WHERE O.UID = 5
AND O.XTYPE = 'U'
AND I.STATUS = 0
Try using this query; it will give you the id of the table, the table name, and the number of rows for that table.
UID = 5 means I want to check a particular schema which has id = 5. You can check a schema's id using SELECT SCHEMA_ID('<schema name>');
XTYPE = 'U' means user-defined tables only.
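For comparison, on SQL Server 2005 and later the same report can be written against the documented catalog views; a hedged sketch (the 'dbo' schema name is illustrative, substitute your own):
SELECT t.object_id, t.name, SUM(p.rows) AS [RowCount]
FROM sys.tables AS t
JOIN sys.partitions AS p ON p.object_id = t.object_id AND p.index_id IN (0, 1)
WHERE t.schema_id = SCHEMA_ID('dbo')
GROUP BY t.object_id, t.name;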

Related

ms sql server: how to check whether a table has an "id" column and count rows if "id" exists

There are too many tables in my SQL Server db. Most of them have an 'id' column, but some do not. I want to know which table(s) don't have the 'id' column, and to count the rows where id is null if an 'id' column exists. The query results may look like this:
TABLE_NAME | HAS_ID | ID_NULL_COUNT | ID_NOT_NULL_COUNT
table1 | false | 0 | 0
table2 | true | 10 | 100
How do I write this query?
Building query:
WITH cte AS (
SELECT t.*, has_id = CASE WHEN COLUMN_NAME = 'ID' THEN 'true' ELSE 'false' END
FROM INFORMATION_SCHEMA.TABLES t
OUTER APPLY (SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS c
WHERE t.TABLE_NAME = c.TABLE_NAME
AND t.[TABLE_SCHEMA] = c.[TABLE_SCHEMA]
AND c.COLUMN_NAME = 'id') s
WHERE t.TABLE_SCHEMA IN (...)
)
SELECT
query_to_run = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
'SELECT tab_name = ''<tab_name>'',
has_id = ''<has_id>'',
id_null_count = <id_null_count>,
id_not_null_count = <id_not_null_count>
FROM <schema_name>.<tab_name>'
,'<tab_name>', TABLE_NAME)
,'<schema_name>', TABLE_SCHEMA)
,'<has_id>', has_id)
,'<id_null_count>', CASE WHEN has_id = 'false' THEN '0' ELSE 'SUM(CASE WHEN id IS NULL THEN 1 END)' END)
,'<id_not_null_count>', CASE WHEN has_id = 'false' THEN '0' ELSE 'COUNT(id)' END)
FROM cte;
Copy the output and execute it in a separate window. UNION ALL could be added to get a single resultset.
db<>fiddle demo
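If you do want the single resultset directly, the COALESCE-concatenation trick used elsewhere in this thread can stitch per-table queries together with UNION ALL; a minimal sketch counting all rows per base table (adapt the inner SELECT to the id logic above):
DECLARE @sql nvarchar(max);
SELECT @sql = COALESCE(@sql + CHAR(10) + 'UNION ALL' + CHAR(10), '')
    + 'SELECT tab_name = ''' + TABLE_NAME + ''', cnt = COUNT(*) FROM '
    + QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME)
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';
EXEC sp_executesql @sql;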
This might be useful for you... lists out the row count for all tables that have an "id" column. It filters out tables that start with "sys" because those are mostly internal tables. If you have a table that starts with "sys", you'll probably want to delete that part of the WHERE clause.
SELECT DISTINCT OBJECT_NAME(r.[object_id]) AS [TableName], [row_count] AS [RowCount]
FROM sys.dm_db_partition_stats r
WHERE index_id = 1
AND EXISTS (SELECT 1 FROM sys.columns c WHERE c.[object_id] = r.[object_id] AND c.[name] = N'id')
AND OBJECT_NAME(r.[object_id]) NOT LIKE 'sys%'
ORDER BY [TableName]
Note you can change the "c.[name] = N'id'" to be any column name, or change EXISTS to NOT EXISTS to find only tables without an id column.
pmbAustin's answer shows how to list which tables do and do not have an "ID" column.
To know how many rows in each table, SQL Server has a built-in report for you.
Right click the database in SSMS, click "Reports", "Standard Reports" then "Disk Usage by Table"
You now know how many rows are in each table, and from pmbAustin's answer you know which tables do and do not have "ID" columns. With a simple VLOOKUP in Excel you can combine these two datasets to arrive at any answer you wish.
This will give you info about which tables do or do not have a column named "ID":
SELECT Table_Name
, case when column_name not like '%ID%' then 'false'
else 'true'
end as HAS_ID
FROM INFORMATION_SCHEMA.COLUMNS;
Here is a small demo
And here is one way you can select all the tables that have a column named ID, together with counts of how many of those ID values are null or not null:
CREATE TABLE #AllIDSNullable (TABLE_NAME NVARCHAR(256) NOT NULL
, HAS_ID VARCHAR(10)
, ID_NULL_COUNT INT DEFAULT 0
, ID_NOT_NULL_COUNT INT DEFAULT 0);
DECLARE CT CURSOR FOR
SELECT Table_Name
FROM INFORMATION_SCHEMA.COLUMNS
WHERE column_name = 'ID';
DECLARE @name NVARCHAR(MAX), @SQL NVARCHAR(MAX);
OPEN CT; FETCH NEXT FROM CT INTO @name;
WHILE @@FETCH_STATUS=0 BEGIN
-- insert one row for the current table only (the cursor already guarantees it has an ID column)
SET @SQL = 'INSERT #AllIDSNullable (TABLE_NAME, HAS_ID) VALUES ('''+@name+''', ''true'');';
EXEC (@SQL);
SET @SQL = 'UPDATE #AllIDSNullable SET ID_NULL_COUNT = (SELECT COUNT(*) FROM ['+@name+'] WHERE ID IS NULL), ID_NOT_NULL_COUNT = (SELECT COUNT(*) FROM ['+@name+'] WHERE ID IS NOT NULL) WHERE TABLE_NAME='''+@name+''';';
EXEC (@SQL);
FETCH NEXT FROM CT INTO @name;
END;
CLOSE CT;
DEALLOCATE CT;
SELECT *
FROM #AllIDSNullable;
Here is a demo

Using Subqueries to Define Column Alias

I have two tables which I have simplified below for clarity. One stores data values while the other defines the units and type of data. Some tests have one result, others may have more (My actual table has results 1-10):
Table 'Tests':
ID      Result1    Result2    TestType (FK to TestTypes.Type)
------  ---------  ---------  --------
1001    50         29         1
1002    90.9       NULL       2
1003    12.4       NULL       2
1004    20.2       30         1
Table 'TestTypes':
Type   TestName      Result1Name   Result1Unit   Result2Name   Result2Unit  ..........
-----  ------------  ------------  ------------  ------------  -----------
1      Temp Calib.   Temperature   F             Variance      %
2      Clarity       Turbidity     CU            NULL          NULL
I would like to use the ResultXName as the column alias when I join the two tables. In other words, if a user wants to see all Type 1 'Temp Calib' tests, the data would be formatted as follows:
Temperature   Variance
------------  -----------
50 F          10.1%
20.2 F        4.4%
Or if they look at Type 2, which only uses 1 result and should ignore the NULL:
Turbidity
----------
90.9 CU
12.4 CU
I have had some success in combining the two columns of the tables:
SELECT CONCAT(Result1, ' ', ISNULL(Result1Unit, ''))
FROM Tests
INNER JOIN TestTypes ON Tests.TestType = TestTypes.Type
But I cannot figure out how to use the TestName as the new column alias. This is what I've been trying using a subquery, but it seems subqueries are not allowed in the AS clause:
SELECT CONCAT(Result1, ' ', ISNULL(Result1Unit, '')) AS (SELECT TOP(1) Result1Name FROM TestTypes WHERE Type = 1)
FROM Tests
INNER JOIN TestTypes ON Tests.TestType = TestTypes.Type
Is there a different method I can use? Or do I need to restructure my data to achieve this? I am using MSSQL.
Yes, this can be fully automated by constructing a dynamic SQL string carefully. The key points of this solution, with references, are listed as follows.
Count the Result variables (section 1.)
Get the new column name of ResultXName by using sp_executesql with the output definition (section 2-1)
Append the clause for the new column (section 2-2)
N.B.1. Although a dynamic table schema is usually considered a bad design, sometimes people are simply ordered to do that. Therefore I do not question the adequacy of this requirement.
N.B.2. Mind the security problem of arbitrary string execution. Additional string filters may be required depending on your use case.
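For instance (an illustrative guard, not part of the solution below), identifiers spliced into the string can be wrapped with QUOTENAME before concatenation:
-- Hypothetical hardening sketch:
declare @colname sysname = N'Temperature';           -- a name fetched at runtime
declare @safe nvarchar(258) = QUOTENAME(@colname);   -- yields [Temperature]; defuses ] and quotes
select @safe;  -- splice @safe, rather than the raw name, into the dynamic string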
Test Dataset
use [testdb];
GO
if OBJECT_ID('testdb..Tests') is not null
drop table testdb..Tests;
create table [Tests] (
[ID] int,
Result1 float,
Result2 float,
TestType int
)
insert into [Tests]([ID], Result1, Result2, TestType)
values (1001,50,29,1),
(1002,90.9,NULL,2),
(1003,12.4,NULL,2),
(1004,20.2,30,1);
if OBJECT_ID('testdb..TestTypes') is not null
drop table testdb..TestTypes;
create table [TestTypes] (
[Type] int,
TestName varchar(50),
Result1Name varchar(50),
Result1Unit varchar(50),
Result2Name varchar(50),
Result2Unit varchar(50)
)
insert into [TestTypes]([Type], TestName, Result1Name, Result1Unit, Result2Name, Result2Unit)
values (1,'Temp Calib.','Temperature','F','Variance','%'),
(2,'Clarity','Turbidity','CU',NULL,NULL);
--select * from [Tests];
--select * from [TestTypes];
Solution
/* Input Parameter */
declare @type_no int = 1;
/* 1. determine the number of Results */
declare @n int;
-- If there are hundreds of results please use the method as of (2-1)
select @n = LEN(COALESCE(LEFT(Result1Name,1),''))
+ LEN(COALESCE(LEFT(Result2Name,1),''))
FROM [TestTypes]
where [Type] = @type_no;
/* 2. build dynamic query string */
-- cast type number as string
declare @s_type varchar(10) = cast(@type_no as varchar(10));
-- sql query string
declare @sql nvarchar(max) = '';
declare @sql_colname nvarchar(max) = '';
-- loop variables
declare @i int = 1; -- loop index
declare @s varchar(10); -- stringified @i
declare @colname varchar(max); -- new column name
set @sql += '
select
L.[ID]';
-- add columns one by one
while @i <= @n begin
set @s = cast(@i as varchar(10));
-- (2-1) find the new column name
SET @sql_colname = N'select @colname = Result' + @s + 'Name
from [TestTypes]
where [Type] = ' + @s_type;
EXEC SP_EXECUTESQL
@stmt = @sql_colname,
@params = N'@colname varchar(max) OUTPUT',
@colname = @colname OUTPUT;
-- (2-2) sql clause of the new column
set @sql += ',
cast(L.Result' + @s + ' as varchar(10)) + '' '' + R.Result' + @s + 'Unit as [' + @colname + ']'
-- next Result
set @i += 1
end
set @sql += '
into [ans]
from [Tests] as L
inner join [TestTypes] as R
on L.TestType = R.Type
where R.[Type] = ' + @s_type;
/* execute */
print @sql; -- check the query string
if OBJECT_ID('testdb..ans') is not null
drop table testdb..ans;
exec sp_sqlexec @sql;
/* show */
select * from [ans];
Result (type = 1)
| ID | Temperature | Variance |
|------|-------------|----------|
| 1001 | 50 F | 29 % |
| 1004 | 20.2 F | 30 % |
/* the query string */
select
L.[ID],
cast(L.Result1 as varchar(10)) + ' ' + R.Result1Unit as [Temperature],
cast(L.Result2 as varchar(10)) + ' ' + R.Result2Unit as [Variance]
into [ans]
from [Tests] as L
inner join [TestTypes] as R
on L.TestType = R.Type
where R.[Type] = 1
Tested on SQL Server 2017 (Linux docker image, latest version) on Debian 10.

Single column from multiple tables - SQL Server 2014 Express

I have a DB with 50 tables having the same structure (same column names, types), with a clustered index on the Created Date column. Each of these tables has around ~100,000 rows, and I need to pull some columns from all of them.
select * from customerNY
created date | Name  | Age | Gender
___________________________________
25-Jan-2016  | Chris | 25  | M
27-Jan-2016  | John  | 24  | M
30-Jan-2016  | June  | 34  | F
select * from customerFL
created date | Name  | Age | Gender
___________________________________
25-Jan-2016  | Matt  | 44  | M
27-Jan-2016  | Rose  | 24  | F
30-Jan-2016  | Bane  | 34  | M
The above is an example of the tables in the DB. I need SQL that runs quickly, pulling all the data. Currently I am using UNION ALL for this, but it takes a long time to complete the report. Is there another way to pull in the data without using UNION ALL, such as:
select Name, Age, Gender from [:customerNY:customerFL:]
Out of context: Can I pull in the table name in the result?
Thanks for any help. I've been putting my mind to this but I can't find a way to do it quicker.
This dynamic SQL approach should meet your criteria: it selects table names from the schema and creates a SELECT statement at runtime for it to execute. To stitch the SELECT statements together, each one is prefixed with UNION ALL, and STUFF then removes the first one.
DECLARE @SQL AS VarChar(MAX)
SET @SQL = ''
SELECT @SQL = @SQL + 'UNION ALL SELECT Name, Age, Gender FROM ' + TABLE_SCHEMA + '.[' + TABLE_NAME + ']' + CHAR(13)
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME LIKE 'Customer%'
SELECT @SQL = STUFF(@SQL,1,10,'')
EXEC (@SQL)
However, I do not recommend using this; you should do what people have suggested in the comments and restructure your data.
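One common halfway house (a sketch only, using the table and column names from the question) is a view that unions the tables once, which also answers the aside about pulling the table name into the result:
CREATE VIEW dbo.AllCustomers AS
SELECT 'customerNY' AS SourceTable, [created date], Name, Age, Gender FROM dbo.customerNY
UNION ALL
SELECT 'customerFL', [created date], Name, Age, Gender FROM dbo.customerFL;
-- ...repeat for the remaining tables; then query the view as one table:
-- SELECT Name, Age, Gender FROM dbo.AllCustomers WHERE SourceTable = 'customerFL';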
Memory-optimising the test tables below gave a 7x speed increase compared to the same data in regular tables. Samples are 50 tables of 100,000 rows. Please only run this on a test server, as it creates filegroups/tables etc.:
USE [master]
GO
ALTER DATABASE [myDB] ADD FILEGROUP [MemOptData] CONTAINS MEMORY_OPTIMIZED_DATA
GO
ALTER DATABASE [myDB] ADD FILE ( NAME = N'Mem', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA' ) TO FILEGROUP [MemOptData] --Change Path for your version
Go
use [myDB]
go
set nocount on
declare @loop1 int = 1
declare @loop2 int = 1
declare @NoTables int = 50
declare @noRows int = 100000
declare @sql nvarchar(max)
while @loop1 <= @NoTables
begin
set @sql = 'create table [MemCustomer' + cast(@loop1 as nvarchar(6)) + '] ([ID] [int] IDENTITY(1,1) NOT NULL,[Created Date] date, [Name] varchar(20), [Age] int, Gender char(1), CONSTRAINT [PK_Customer' + cast(@loop1 as nvarchar(6)) + '] PRIMARY KEY NONCLUSTERED
(
[ID] ASC
)) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)'
exec(@sql)
while @loop2 <= @noRows
begin
set @sql = 'insert into [MemCustomer' + cast(@loop1 as nvarchar(6)) + '] ([Created Date], [Name], [Age], [Gender]) values (DATEADD(DAY, ROUND(((20) * RAND()), 0), DATEADD(day, 10, ''2018-06-01'')), (select top 1 [name] from (values(''bill''),(''steve''),(''jack''),(''roger''),(''paul''),(''ozzy''),(''tom''),(''brian''),(''norm'')) n([name]) order by newid()), FLOOR(RAND()*(85-18+1))+18, iif(FLOOR(RAND()*(2))+1 = 1, ''M'', ''F''))'
--print @sql
exec(@sql)
set @loop2 = @loop2 + 1
end
set @loop2 = 1
set @loop1 = @loop1 + 1
end
;with cte as (
Select * from MemCustomer1
UNION
Select * from MemCustomer2
UNION
...
UNION
Select * from MemCustomer50
)
select * from cte where [name] = 'tom' and age = 27 and gender = 'F'

How to UPDATE all columns of a record without having to list every column

I'm trying to figure out a way to update a record without having to list every column name that needs to be updated.
For instance, it would be nice if I could use something similar to the following:
// the parts inside braces are what I am trying to figure out
UPDATE Employee
SET {all columns, without listing each of them}
WITH {this record with id of '111' from other table}
WHERE employee_id = '100'
If this can be done, what would be the most straightforward/efficient way of writing such a query?
It's not possible.
What you're trying to do is not part of the SQL specification and is not supported by any database vendor. See the specifications of the SQL UPDATE statement for MySQL, PostgreSQL, MSSQL, Oracle, Firebird, and Teradata. Every one of those supports only the syntax below:
UPDATE table_reference
SET column1 = {expression} [, column2 = {expression}] ...
[WHERE ...]
This is not possible, but you can do it:
begin tran
delete from table where CONDITION
insert into table select * from EqualDesingTabletoTable where CONDITION
commit tran
Be careful with identity fields.
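If the table has an identity key whose values must survive the round trip, a hedged sketch (Employee and the id values come from the question; name, dept, and OtherTable are hypothetical):
BEGIN TRAN
DELETE FROM Employee WHERE employee_id = '100';
SET IDENTITY_INSERT Employee ON; -- only needed if employee_id is an identity column
INSERT INTO Employee (employee_id, name, dept)
SELECT '100', name, dept FROM OtherTable WHERE id = '111';
SET IDENTITY_INSERT Employee OFF;
COMMIT TRAN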
Here's a hardcore way to do it with SQL SERVER. Carefully consider security and integrity before you try it, though.
This uses schema to get the names of all the columns and then puts together a big update statement to update all columns except ID column, which it uses to join the tables.
This only works for a single column key, not composites.
usage: EXEC UPDATE_ALL 'source_table','destination_table','id_column'
CREATE PROCEDURE UPDATE_ALL
@SOURCE VARCHAR(100),
@DEST VARCHAR(100),
@ID VARCHAR(100)
AS
DECLARE @SQL VARCHAR(MAX) =
'UPDATE D SET ' +
-- Google 'for xml path stuff'. This gets the rows from the query results and
-- turns them into a comma separated list.
STUFF((SELECT ', D.'+ COLUMN_NAME + ' = S.' + COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @DEST
AND COLUMN_NAME <> @ID
FOR XML PATH('')),1,1,'')
+ ' FROM ' + @SOURCE + ' S JOIN ' + @DEST + ' D ON S.' + @ID + ' = D.' + @ID
--SELECT @SQL
EXEC (@SQL)
In Oracle PL/SQL, you can use the following syntax:
DECLARE
r my_table%ROWTYPE;
BEGIN
r.a := 1;
r.b := 2;
...
UPDATE my_table
SET ROW = r
WHERE id = r.id;
END;
Of course that just moves the burden from the UPDATE statement to the record construction, but you might already have fetched the record from somewhere.
How about using Merge?
https://technet.microsoft.com/en-us/library/bb522522(v=sql.105).aspx
It gives you the ability to run Insert, Update, and Delete. One other piece of advice: if you're going to be updating a large data set with indexes, and the source subset is smaller than your target but both tables are very large, move the changes to a temporary table first. I tried to merge two tables that were nearly two million rows each, and 20 records took 22 minutes. Once I moved the deltas over to a temp table, it took seconds.
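A minimal MERGE sketch for the question's scenario (Employee and the id values come from the question; name, dept, and OtherTable are hypothetical). You still list the columns once, but the whole overwrite is a single statement:
MERGE Employee AS T
USING (SELECT name, dept FROM OtherTable WHERE id = '111') AS S
ON (T.employee_id = '100')
WHEN MATCHED THEN
    UPDATE SET T.name = S.name, T.dept = S.dept;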
If you are using Oracle, you can use rowtype
declare
var_x TABLE_A%ROWTYPE;
Begin
select * into var_x
from TABLE_B where rownum = 1;
update TABLE_A set row = var_x
where ID = var_x.ID;
end;
/
given that TABLE_A and TABLE_B have the same structure
It is possible. Like npe said, it's not standard practice, but if you really have to:
1. First a scalar function
CREATE FUNCTION [dte].[getCleanUpdateQuery] (@pTableName varchar(40), @pQueryFirstPart VARCHAR(200) = '', @pQueryLastPart VARCHAR(200) = '', @pIncludeCurVal BIT = 1)
RETURNS VARCHAR(8000) AS
BEGIN
DECLARE @pQuery VARCHAR(8000);
WITH cte_Temp
AS
(
SELECT
C.name
FROM SYS.COLUMNS AS C
INNER JOIN SYS.TABLES AS T ON T.object_id = C.object_id
WHERE T.name = @pTableName
)
SELECT @pQuery = (
CASE @pIncludeCurVal
WHEN 0 THEN
(
STUFF(
(SELECT ', ' + name + ' = ' + @pQueryFirstPart + @pQueryLastPart FROM cte_Temp FOR XML PATH('')), 1, 2, ''
)
)
ELSE
(
STUFF(
(SELECT ', ' + name + ' = ' + @pQueryFirstPart + name + @pQueryLastPart FROM cte_Temp FOR XML PATH('')), 1, 2, ''
)
) END)
RETURN 'UPDATE ' + @pTableName + ' SET ' + @pQuery
END
2. Use it like this
DECLARE @pQuery VARCHAR(8000) = dte.getCleanUpdateQuery(<your table name>, <query part before current value>, <query part after current value>, <1 if current value is used. 0 if updating everything to a static value>);
EXEC (@pQuery)
Example 1: make all employee columns 'Unknown' (you need to make sure the column type matches the intended value):
DECLARE @pQuery VARCHAR(8000) = dte.getCleanUpdateQuery('employee', '', 'Unknown', 0);
EXEC (@pQuery)
Example 2: Remove an undesired text qualifier (e.g. #)
DECLARE @pQuery VARCHAR(8000) = dte.getCleanUpdateQuery('employee', 'REPLACE(', ', ''#'', '''')', 1);
EXEC (@pQuery)
This query can be improved. This is just the one I saved and sometimes use. You get the idea.
Similar to an upsert, you could check whether the item exists in the table; if so, delete it and insert it with the new values (technically updating it). But you would lose your rowid, if that's something sensitive to keep in your case.
Behold, the updelsert
IF NOT EXISTS (SELECT * FROM Employee WHERE ID = @SomeID)
    INSERT INTO Employee VALUES(@SomeID, @Your, @Vals, @Here)
ELSE
BEGIN
    -- without BEGIN/END the second INSERT would run unconditionally
    DELETE FROM Employee WHERE ID = @SomeID
    INSERT INTO Employee VALUES(@SomeID, @Your, @Vals, @Here)
END
You could do it by deleting the column in the table and adding the column back in with a default value of whatever you need it to be. Saving this will then require rebuilding the table.

SELECT INTO behavior and the IDENTITY property

I've been working on a project and came across some interesting behavior when using SELECT INTO. If I have a table with a column defined as int identity(1,1) not null and use SELECT INTO to copy it, the new table will retain the IDENTITY property unless there is a join involved. If there is a join, then the same column on the new table is defined simply as int not null.
Here is a script that you can run to reproduce the behavior:
CREATE TABLE People (Id INT IDENTITY(1,1) not null, Name VARCHAR(10))
CREATE TABLE ReverseNames (Name varchar(10), ReverseName varchar(10))
INSERT INTO People (Name)
VALUES ('John'), ('Jamie'), ('Joe'), ('Jenna')
INSERT INTO ReverseNames (Name, ReverseName)
VALUES ('John','nhoJ'), ('Jamie','eimaJ'), ('Joe','eoJ'), ('Jenna','anneJ')
--------
SELECT Id, Name
INTO People_ExactCopy
FROM People
SELECT Id, ReverseName as Name
INTO People_WithJoin
FROM People
JOIN ReverseNames
ON People.Name = ReverseNames.Name
SELECT Id, (SELECT ReverseName FROM ReverseNames WHERE Name = People.Name) as Name
INTO People_WithSubSelect
FROM People
--------
SELECT OBJECT_NAME(c.object_id) as [Table],
c.is_identity as [Id Column Retained Identity]
FROM sys.columns c
where
OBJECT_NAME(c.object_id) IN ('People_ExactCopy','People_WithJoin','People_WithSubSelect')
AND c.name = 'Id'
--------
DROP TABLE People
DROP TABLE People_ExactCopy
DROP TABLE People_WithJoin
DROP TABLE People_WithSubSelect
DROP TABLE ReverseNames
I noticed that the execution plans for both the WithJoin and WithSubSelect queries contained one join operator. I'm not sure whether one would perform significantly better if we were dealing with a larger set of rows.
Can anyone shed any light on this and tell me if there is a way to utilize SELECT INTO with joins and still preserve the IDENTITY property?
From Microsoft:
When an existing identity column is selected into a new table, the new column inherits the IDENTITY property, unless one of the following conditions is true:
The SELECT statement contains a join, GROUP BY clause, or aggregate function.
Multiple SELECT statements are joined by using UNION.
The identity column is listed more than one time in the select list.
The identity column is part of an expression.
The identity column is from a remote data source.
If any one of these conditions is true, the column is created NOT NULL instead of inheriting the IDENTITY property. If an identity column is required in the new table but such a column is not available, or you want a seed or increment value that is different than the source identity column, define the column in the select list using the IDENTITY function.
You could use the IDENTITY function as they suggest and omit the IDENTITY column, but then you would lose the values, as the IDENTITY function would generate new values, and I don't think those are easily determinable, even with ORDER BY.
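For illustration, the IDENTITY-function variant looks like this (People_WithJoin2 is a hypothetical name; the Id values come out renumbered, which is exactly the loss described above):
SELECT IDENTITY(int, 1, 1) AS Id, ReverseName AS Name
INTO People_WithJoin2
FROM People
JOIN ReverseNames ON People.Name = ReverseNames.Name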
I don't believe there is much you can do, except build your CREATE TABLE statements manually, SET IDENTITY_INSERT ON, insert the existing values, then SET IDENTITY_INSERT OFF. Yes you lose the benefits of SELECT INTO, but unless your tables are huge and you are doing this a lot, [shrug]. This is not fun of course, and it's not as pretty or simple as SELECT INTO, but you can do it somewhat programmatically, assuming two tables, one having a simple identity (1,1), and a simple INNER JOIN:
SET NOCOUNT ON;
DECLARE
@NewTable SYSNAME = N'dbo.People_ExactCopy',
@JoinCondition NVARCHAR(255) = N' ON p.Name = r.Name';
DECLARE
@cols TABLE(t SYSNAME, c SYSNAME, p CHAR(1));
INSERT @cols SELECT N'dbo.People', N'Id', 'p'
UNION ALL SELECT N'dbo.ReverseNames', N'Name', 'r';
DECLARE @sql NVARCHAR(MAX) = N'CREATE TABLE ' + @NewTable + '
(
';
SELECT @sql += c.name + ' ' + t.name
+ CASE WHEN t.name LIKE '%char' THEN
'(' + CASE WHEN c.max_length = -1
THEN 'MAX' ELSE RTRIM(c.max_length/
(CASE WHEN t.name LIKE 'n%' THEN 2 ELSE 1 END)) END
+ ')' ELSE '' END
+ CASE c.is_identity
WHEN 1 THEN ' IDENTITY(1,1)'
ELSE ' ' END + ',
'
FROM sys.columns AS c
INNER JOIN @cols AS cols
ON c.object_id = OBJECT_ID(cols.t)
INNER JOIN sys.types AS t
ON c.system_type_id = t.system_type_id
AND c.name = cols.c;
SET @sql = LEFT(@sql, LEN(@sql)-1) + '
);
SET IDENTITY_INSERT ' + @NewTable + ' ON;
INSERT ' + @NewTable + '(';
SELECT @sql += c + ',' FROM @cols;
SET @sql = LEFT(@sql, LEN(@sql)-1) + ')
SELECT ';
SELECT @sql += p + '.' + c + ',' FROM @cols;
SET @sql = LEFT(@sql, LEN(@sql)-1) + '
FROM ';
SELECT @sql += t + ' AS ' + p + '
INNER JOIN ' FROM (SELECT DISTINCT
t,p FROM @cols) AS x;
SET @sql = LEFT(@sql, LEN(@sql)-10)
+ @JoinCondition + ';
SET IDENTITY_INSERT ' + @NewTable + ' OFF;';
PRINT @sql;
With the tables given above, this produces the following, which you could pass to EXEC sp_executeSQL instead of PRINT:
CREATE TABLE dbo.People_ExactCopy
(
Id int IDENTITY(1,1),
Name varchar(10)
);
SET IDENTITY_INSERT dbo.People_ExactCopy ON;
INSERT dbo.People_ExactCopy(Id,Name)
SELECT p.Id,r.Name
FROM dbo.People AS p
INNER JOIN dbo.ReverseNames AS r
ON p.Name = r.Name;
SET IDENTITY_INSERT dbo.People_ExactCopy OFF;
I did not deal with other complexities such as DECIMAL columns or other columns that have parameters such as max_length, nor did I deal with nullability, but these things wouldn't be hard to add if you need greater flexibility.
In the next version of SQL Server (code-named "Denali") you should be able to construct a CREATE TABLE statement much easier using the new metadata discovery functions - which do much of the grunt work for you in terms of specifying precision/scale/length, dealing with MAX, etc. You still have to manually create indexes and constraints; but you don't get those with SELECT INTO either.
What we really need is DDL that allows you to say something like "CREATE TABLE a IDENTICAL TO b;" or "CREATE TABLE a BASED ON b;"... it's been asked for here, but has been rejected (this is about copying a table to another schema, but the same concept could apply to a new table in the same schema with a different table name). http://connect.microsoft.com/SQLServer/feedback/details/632689
I realize this is a really late response, but for whoever is still looking for this solution, like I was:
You can't use the JOIN operator if the IDENTITY column property is to be inherited.
What you can do is use a WHERE clause like this:
SELECT a.*
INTO NewTable
FROM
MyTable a
WHERE
EXISTS (SELECT 1 FROM SecondTable b WHERE b.ID = a.ID)
This works.