Can We Create Index in Table Valued Function - sql

Can we create an index on a column of the table returned by a table-valued function in SQL Server 2008?
My function returns results slowly. The execution plan shows a table scan, so I want to index a column of the function's table and filter on that column in a WHERE clause.
Any help would be highly appreciated.
Thanks in advance

If the table valued function is of the inline variety you would create the index on the underlying table columns.
If it is a multi statement TVF in SQL Server 2008 (as tagged) you can only create the indexes associated with primary key or unique constraints.
In SQL Server 2014+ it is possible to declare inline indexes not associated with any constraint.
Example
CREATE FUNCTION F()
RETURNS @X TABLE
(
A INT PRIMARY KEY /*<-- Implicit clustered index*/
)
AS
BEGIN
INSERT INTO @X
VALUES(1),(2)
RETURN;
END
GO
SELECT *
FROM F()
WHERE A = 12
The above materializes the entire result set into a table variable up front and creates an implicit index on it.
Generally inline TVFs are preferred to multi statement ones.
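As a rough sketch of the inline case: the index lives on the base table, and because the function body is expanded into the calling query, the optimizer can use that index. The table dbo.Orders, the index IX_Orders_CustomerId and the function dbo.OrdersForCustomer are hypothetical names chosen for illustration, not objects from the question.

```sql
-- Hypothetical base table; every name here is an illustrative assumption.
CREATE TABLE dbo.Orders
(
    OrderId    INT NOT NULL PRIMARY KEY,
    CustomerId INT NOT NULL
);
-- The index is created on the underlying table, not on the function.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId);
GO
-- Inline TVF: a single SELECT with no BEGIN...END body.
CREATE FUNCTION dbo.OrdersForCustomer (@CustomerId INT)
RETURNS TABLE
AS
RETURN
(
    SELECT OrderId, CustomerId
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId
);
GO
-- The function is expanded into this query, so the index on dbo.Orders is available to it.
SELECT OrderId
FROM dbo.OrdersForCustomer(42);
```

Comparing the execution plan of this query with that of a multi-statement version should show whether the table scan disappears in your case.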

CREATE OR ALTER FUNCTION dbo.tvfExample()
RETURNS @Example TABLE
(
Field1_ID INT NOT NULL,
Field2_ID INT NOT NULL,
Field3_ID INT NOT NULL,
PRIMARY KEY CLUSTERED (Field1_ID ASC, Field2_ID ASC, Field3_ID ASC)
)
AS
BEGIN
...
RETURN
END
GO

Related

Can't use UDF as constraint in MariaDB

I'm running MariaDB 10.3.17 and I'm trying to add a constraint to an existing table. The constraint uses a UDF - which should be allowed.
Here's my table and UDF.
CREATE OR REPLACE TABLE real_estate.sample_two_expected_output (
u_id int (9) NOT NULL,
first_date date NOT NULL,
last_date date NOT NULL,
days int AS (DATEDIFF(last_date,first_date)+1),
address varchar(50),
price varchar(50),
-- Constraints
CONSTRAINT dates CHECK (last_date >= first_date),
PRIMARY KEY (u_id,first_date));
DELIMITER //
USE real_estate;
CREATE OR REPLACE FUNCTION overlap(
u_id INT,
first_date DATE,
last_date DATE
) RETURNS INT DETERMINISTIC
BEGIN
DECLARE valid INT;
SET valid = 1;
IF EXISTS(SELECT * FROM real_estate.sample_two_expected_output t WHERE t.u_id = u_id AND first_date <= t.last_date AND t.first_date <= last_date) THEN SET valid = 0;
ELSE SET valid = 1;
END IF;
RETURN valid;
END //
DELIMITER ;
I try to add this function as a constraint in the table.
ALTER TABLE real_estate.sample_two_expected_output ADD CONSTRAINT overlap CHECK(overlap(u_id,first_date,last_date)=1);
However I get the below error message and I don't know why.
EXECUTE FAIL:
ALTER TABLE real_estate.sample_two_expected_output ADD CONSTRAINT overlap CHECK(overlap(u_id,first_date,last_date)=1);
Message :
Function or expression '`overlap`()' cannot be used in the CHECK clause of `overlap`
In general you can use any deterministic user defined function (UDF) but not a stored function (SF) in constraints like DEFAULT, CHECK, etc.
A big difference between UDFs and SFs is the fact that a UDF is usually written in C/C++ while a SF is written in SQL. That means it is not possible to execute SQL code in a UDF within the same connection, which would lead to significant problems, as your SF shows:
Depending on the storage engine ALTER TABLE locks the entire table, parts of it or creates a temporary copy. I cannot imagine a way to execute the SQL statement SELECT * FROM real_estate.sample_two_expected_output t WHERE t.u_id = u_id .. in your SF while the table is locked or reorganized.

create index clause on table variable

I need to create an Index on two columns (within a table variable) which do not form unique key.
Table structure is shown below -
DECLARE @Sample TABLE (
[AssetSk] [int] NOT NULL,
[DateSk] [int] NOT NULL,
[Count] [numeric](38, 2) NULL
)
I am trying to add Index as shown below -
INDEX AD1 CLUSTERED([AssetSk],[DateSk])
However, it gives me the following error when I run it on SQL Server 2012:
"Incorrect syntax near 'INDEX'. If this is intended as a part of a table hint, A WITH keyword and parenthesis are now required. See SQL Server Books Online for proper syntax."
However, this runs perfectly on SQL Server 2014. Is there any way to run it on SQL Server 2012?
Before SQL Server 2014 you cannot create an index on a table variable other than the ones backing a primary key or unique constraint.
However, there is a trick: add one more column with an auto-incremented value and create a unique index on the columns you need plus this new one.
DECLARE @Sample TABLE (
[ID] bigint identity(1, 1),
[AssetSk] [int] NOT NULL,
[DateSk] [int] NOT NULL,
[Count] [numeric](38, 2) NULL,
UNIQUE NONCLUSTERED ([AssetSk],[DateSk], ID)
)
Update: In fact, creating such an index on a table variable may be useless. SQL Server normally estimates that a table variable holds a single row, so it is fairly likely not to use the index.
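For comparison, on SQL Server 2014 and later the INDEX clause from the question can go directly inside the declaration. A minimal sketch, reusing the question's column names; the inserted values are made up for illustration:

```sql
-- Requires SQL Server 2014 or later; on 2012 this fails with the syntax error quoted above.
DECLARE @Sample TABLE (
    [AssetSk] int NOT NULL,
    [DateSk]  int NOT NULL,
    [Count]   numeric(38, 2) NULL,
    INDEX AD1 CLUSTERED ([AssetSk], [DateSk])  -- non-unique inline index
);

-- Illustrative rows: the index is not unique, so repeated key values are accepted.
INSERT INTO @Sample ([AssetSk], [DateSk], [Count])
VALUES (1, 20140101, 10.00),
       (1, 20140101, 12.50);

SELECT [AssetSk], [DateSk], [Count]
FROM @Sample
WHERE [AssetSk] = 1 AND [DateSk] = 20140101;
```

Unlike the identity-column workaround, this needs no extra column, but it is not an option if the code must also run on 2012.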
As far as I know, in SQL Server 2012 and earlier you cannot add indexes to table variables. To add an index you must declare a temporary table instead, like this:
CREATE TABLE #Sample (
[AssetSk] [int] NOT NULL,
[DateSk] [int] NOT NULL,
[Count] [numeric](38, 2) NULL
)
Afterwards you can create the index you need like this:
CREATE CLUSTERED INDEX IX_MyIndex
ON #Sample ([AssetSk],[DateSk])
Of course, after you're done with the table in your function you can call
DROP TABLE #Sample

Optimize a stored procedure that accepts table parameters as filters against a view

I'm looking for an efficient way to filter a view with optional table parameters.
Examples are best so here is a sample situation:
-- database would contain a view that I want to be able to filter
CREATE VIEW [dbo].[MyView]
AS
-- maybe 20-40 columns
SELECT Column1, Column2, Column3, ...
I have user-defined table types like so:
-- single id table for joining purposes (passed from code)
CREATE TYPE [dbo].[SingleIdTable] AS TABLE (
[Id] INT NOT NULL,
PRIMARY KEY CLUSTERED ([Id] ASC) WITH (IGNORE_DUP_KEY = OFF));
-- double id table for joining purposes (passed from code)
CREATE TYPE [dbo].[DoubleIdTable] AS TABLE (
[Id1] INT NOT NULL,
[Id2] INT NOT NULL,
PRIMARY KEY CLUSTERED ([Id1] ASC, [Id2] ASC) WITH (IGNORE_DUP_KEY = OFF));
And I want to create a stored procedure that basically looks like this:
CREATE PROCEDURE [dbo].[FilterMyView]
@Parameter1 dbo.SingleIdTable READONLY,
@Parameter2 dbo.DoubleIdTable READONLY,
@Parameter3 dbo.SingleIdTable READONLY
AS
BEGIN
SELECT *
FROM MyView
INNER-JOIN-IF-NOT-EMPTY @Parameter1 p1 ON p1.Id = MyView.Column1 AND
INNER-JOIN-IF-NOT-EMPTY @Parameter2 p2 ON p2.Id1 = MyView.Column5 AND
p2.Id2 = MyView.Column6 AND
INNER-JOIN-IF-NOT-EMPTY @Parameter3 p3 ON p3.Id = MyView.Column8
END
Now I believe I can do this with WHERE EXISTS but I want to make sure that I am doing this in the most efficient way for the SQL engine. I've always personally felt that the INNER JOIN semantic creates the most optimized execution plans, but I don't actually know.
I also know that I can do this using dynamic SQL, but I always leave this as a last option.
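The WHERE EXISTS variant mentioned above could look roughly like the sketch below, where an empty parameter table means "no filter". To keep it self-contained it uses hypothetical stand-ins (dbo.Items_Sketch, dbo.IdList_Sketch, dbo.FilterItems_Sketch) rather than the real view and types, and only two of the three parameters. Whether it beats dynamic SQL is not settled here and would need testing against real data.

```sql
-- Hypothetical stand-ins for the question's view and table type; names are illustrative.
CREATE TABLE dbo.Items_Sketch (Column1 INT NOT NULL, Column8 INT NOT NULL);
GO
CREATE TYPE dbo.IdList_Sketch AS TABLE (Id INT NOT NULL PRIMARY KEY CLUSTERED);
GO
CREATE PROCEDURE dbo.FilterItems_Sketch
    @Parameter1 dbo.IdList_Sketch READONLY,
    @Parameter3 dbo.IdList_Sketch READONLY
AS
BEGIN
    SET NOCOUNT ON;

    SELECT i.Column1, i.Column8
    FROM dbo.Items_Sketch AS i
    WHERE (NOT EXISTS (SELECT 1 FROM @Parameter1)   -- empty list: skip this filter
           OR EXISTS (SELECT 1 FROM @Parameter1 AS p1 WHERE p1.Id = i.Column1))
      AND (NOT EXISTS (SELECT 1 FROM @Parameter3)   -- empty list: skip this filter
           OR EXISTS (SELECT 1 FROM @Parameter3 AS p3 WHERE p3.Id = i.Column8))
    OPTION (RECOMPILE);  -- compile with the row counts of this particular call
END
GO
-- Usage: filter on Column1 only; the second list is left empty.
DECLARE @ids dbo.IdList_Sketch, @none dbo.IdList_Sketch;
INSERT INTO @ids (Id) VALUES (1), (2);
EXEC dbo.FilterItems_Sketch @Parameter1 = @ids, @Parameter3 = @none;
```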

Derived table with an index

Please see the TSQL below:
DECLARE @TestTable table (reference int identity,
TestField varchar(10),
primary key (reference))
INSERT INTO @TestTable VALUES ('Ian')
select * from @TestTable as TestTable
INNER JOIN LiveTable on LiveTable.Reference=TestTable.Reference
Is it possible to create an index on @TestTable.TestField? The following webpage suggests it is not. However, I read on another webpage that it is possible.
I know I could create a physical table instead (for @TestTable). However, I want to see if I can do this with a derived table first.
You can create an index on a table variable as described in the top voted answer on this question:
SQL Server : Creating an index on a table variable
Sample syntax from that post:
DECLARE @TEMPTABLE TABLE (
[ID] [INT] NOT NULL PRIMARY KEY,
[Name] [NVARCHAR] (255) COLLATE DATABASE_DEFAULT NULL,
UNIQUE NONCLUSTERED ([Name], [ID])
)
Alternatively, consider using a temp table, which persists for the scope of the current operation (for example, the execution of a stored procedure), just like a table variable. Temp tables are structured and optimized like regular tables but are stored in tempdb, so they can be indexed in the same way as a regular table.
Temp tables will generally offer better performance than table variables, but it's worth testing with your dataset.
More in depth details can be found here:
When should I use a table variable vs temporary table in sql server?
You can see a sample of creating a temp table with an index from:
SQL Server Planet - Create Index on Temp Table
One of the most valuable assets of a temp table (#temp) is the ability
to add either a clustered or non clustered index. Additionally, #temp
tables allow for the auto-generated statistics to be created against
them. This can help the optimizer when determining cardinality. Below
is an example of creating both a clustered and non-clustered index on
a temp table.
Sample code from site:
CREATE TABLE #Users
(
ID INT IDENTITY(1,1),
UserID INT,
UserName VARCHAR(50)
)
INSERT INTO #Users
(
UserID,
UserName
)
SELECT
UserID = u.UserID
,UserName = u.UserName
FROM dbo.Users u
CREATE CLUSTERED INDEX IDX_C_Users_UserID ON #Users(UserID)
CREATE INDEX IDX_Users_UserName ON #Users(UserName)

Very slow DELETE query

I have problems with SQL performance. For some reason the following queries are very slow.
I have two lists that contain Ids from a certain table. I need to delete every record from the first list whose Id already exists in the second list:
DECLARE @IdList1 TABLE(Id INT)
DECLARE @IdList2 TABLE(Id INT)
-- Approach 1
DELETE list1
FROM @IdList1 list1
INNER JOIN @IdList2 list2 ON list1.Id = list2.Id
-- Approach 2
DELETE FROM @IdList1
WHERE Id IN (SELECT Id FROM @IdList2)
The two lists may each contain more than 10.000 records. In that case each query takes more than 20 seconds to execute.
The execution plan also showed something I don't understand. Maybe that explains why it is so slow:
I filled both lists with 10.000 sequential integers, so both lists contained the values 1-10.000 as a starting point.
The plans for both queries show an Actual Number of Rows of 50.005.000 for @IdList2! For @IdList1 the value is correct (Actual Number of Rows is 10.000).
I know there are other ways to solve this, like filling a third list instead of removing from the first list. But my question is:
Why are these delete queries so slow and why do I see these strange query plans?
Add a primary key to your table variables and watch them scream:
DECLARE @IdList1 TABLE(Id INT primary Key not null)
DECLARE @IdList2 TABLE(Id INT primary Key not null)
Because there is no index on these table variables, any join or subquery must examine on the order of 10,000 times 10,000 = 100,000,000 pairs of values.
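The 50,005,000 figure from the question (written there as 50.005.000) fits this picture. It equals 1 + 2 + ... + 10,000, which is what a nested loops scan would read if, for the row with Id k, it stopped scanning the second list at the k-th row, where the match is found. That early-exit reading is an assumption, not something the plan in the question confirms; the arithmetic itself can be checked with:

```sql
DECLARE @n BIGINT = 10000;  -- rows per list, as in the question

SELECT @n * @n           AS AllPairs,            -- 100000000: every row paired with every row
       @n * (@n + 1) / 2 AS PairsWithEarlyExit;  -- 50005000: 1 + 2 + ... + 10000
```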
SQL Server compiles the plan when the table variable is empty and does not recompile it when rows are added. Try
DELETE FROM @IdList1
WHERE Id IN (SELECT Id FROM @IdList2)
OPTION (RECOMPILE)
This takes the actual number of rows in the table variable into account and gets rid of the nested loops plan.
Of course, creating an index on Id via a constraint may well benefit other queries that use the table variable too.
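A small script along these lines can be used to try both suggestions together (primary keys plus a statement-level recompile) on the question's setup of 10,000 sequential integers. Using sys.all_objects as a row source is just one convenient way to generate the numbers; it is a choice made for this sketch, not something from the question.

```sql
DECLARE @IdList1 TABLE (Id INT NOT NULL PRIMARY KEY);
DECLARE @IdList2 TABLE (Id INT NOT NULL PRIMARY KEY);

-- Fill the first list with 1..10000; the cross join only serves as a row source.
INSERT INTO @IdList1 (Id)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- Same values in the second list, as in the question.
INSERT INTO @IdList2 (Id)
SELECT Id FROM @IdList1;

DELETE FROM @IdList1
WHERE Id IN (SELECT Id FROM @IdList2)
OPTION (RECOMPILE);  -- compile the DELETE with the real row counts

SELECT COUNT(*) AS RemainingRows FROM @IdList1;  -- 0, since every Id is in both lists
```

Removing the PRIMARY KEY constraints or the OPTION (RECOMPILE) hint one at a time shows how much each change contributes on your server.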
Table variables can have primary keys, so if the Ids in your data are unique, you may be able to improve performance with
DECLARE @IdList1 TABLE(Id INT PRIMARY KEY)
DECLARE @IdList2 TABLE(Id INT PRIMARY KEY)
Possible solutions:
1) Try to create indexes:
1.1) If the List{1|2}.Id column has unique values, you could define a unique clustered index through a PK constraint like this:
DECLARE @IdList1 TABLE(Id INT PRIMARY KEY);
DECLARE @IdList2 TABLE(Id INT PRIMARY KEY);
1.2) If the List{1|2}.Id column may have duplicate values, you could still define a unique clustered index through a PK constraint by adding a dummy IDENTITY column like this:
DECLARE @IdList1 TABLE(Id INT, DummyID INT IDENTITY, PRIMARY KEY (ID, DummyID) );
DECLARE @IdList2 TABLE(Id INT, DummyID INT IDENTITY, PRIMARY KEY (ID, DummyID) );
2) Try to add HASH JOIN query hint like this:
DELETE list1
FROM @IdList1 list1
INNER JOIN @IdList2 list2 ON list1.Id = list2.Id
OPTION (HASH JOIN);
You are using table variables. Either add a primary key to them or change them to temporary tables and add an index; both should perform much better. As a rule of thumb, use table variables if the table stays small; if it grows and holds a lot of data, use a temp table.
I'd be tempted to try
DECLARE @IdList3 TABLE(Id INT);
INSERT @IdList3
SELECT Id FROM @IdList1
EXCEPT
SELECT Id FROM @IdList2
No deleting required.
Try this alternate syntax:
DELETE deleteAlias
FROM @IdList1 deleteAlias
WHERE EXISTS (
SELECT NULL
FROM @IdList2 innerList2Alias
WHERE innerList2Alias.id=deleteAlias.id
)
EDIT:
Try using #temp tables with indexes instead.
Here is a generic example where "DepartmentKey" is the PK and the FK.
IF OBJECT_ID('tempdb..#Department') IS NOT NULL
begin
drop table #Department
end
CREATE TABLE #Department
(
DepartmentKey int ,
DepartmentName varchar(12)
)
CREATE INDEX IX_TEMPTABLE_Department_DepartmentKey ON #Department (DepartmentKey)
IF OBJECT_ID('tempdb..#Employee') IS NOT NULL
begin
drop table #Employee
end
CREATE TABLE #Employee
(
EmployeeKey int ,
DepartmentKey int ,
SSN varchar(11)
)
CREATE INDEX IX_TEMPTABLE_Employee_DepartmentKey ON #Employee (DepartmentKey)
Delete deleteAlias
from #Department deleteAlias
where exists ( select null from #Employee innerE where innerE.DepartmentKey = deleteAlias.DepartmentKey )
IF OBJECT_ID('tempdb..#Employee') IS NOT NULL
begin
drop table #Employee
end
IF OBJECT_ID('tempdb..#Department') IS NOT NULL
begin
drop table #Department
end