Derived table with an index - sql

Please see the T-SQL below:
DECLARE @TestTable table (reference int identity,
    TestField varchar(10),
    primary key (reference))
INSERT INTO @TestTable VALUES ('Ian')
select * from @TestTable as TestTable
INNER JOIN LiveTable on LiveTable.Reference = TestTable.Reference
Is it possible to create an index on @TestTable.TestField? The following webpage suggests it is not. However, I read on another webpage that it is possible.
I know I could create a physical table instead (for @TestTable). However, I want to see if I can do this with a derived table first.

You can create an index on a table variable, as described in the top-voted answer to this question:
SQL Server : Creating an index on a table variable
Sample syntax from that post:
DECLARE @TEMPTABLE TABLE (
    [ID] [INT] NOT NULL PRIMARY KEY,
    [Name] [NVARCHAR](255) COLLATE DATABASE_DEFAULT NULL,
    UNIQUE NONCLUSTERED ([Name], [ID])
)
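Applied to the table variable from the question, the same pattern might look like this (a sketch only; appending the unique Reference column to the UNIQUE constraint is what makes non-unique TestField values indexable, and SQL Server 2014+ also allows the inline INDEX syntax noted in the comment):
DECLARE @TestTable TABLE (
    Reference int IDENTITY PRIMARY KEY,
    TestField varchar(10),
    UNIQUE (TestField, Reference) -- indexes TestField; on SQL Server 2014+ you could instead write: INDEX IX_TestField (TestField)
)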
Alternatively, you may want to consider using a temp table, which persists for the scope of the current operation, i.e. for the duration of a stored procedure, exactly like a table variable. Temp tables are structured and optimized just like regular tables, but they are stored in tempdb, so they can be indexed in the same way as a regular table.
Temp tables will generally offer better performance than table variables, but it's worth testing with your dataset.
More in depth details can be found here:
When should I use a table variable vs temporary table in sql server?
You can see a sample of creating a temp table with an index from:
SQL Server Planet - Create Index on Temp Table
One of the most valuable assets of a temp table (#temp) is the ability
to add either a clustered or non clustered index. Additionally, #temp
tables allow for the auto-generated statistics to be created against
them. This can help the optimizer when determining cardinality. Below
is an example of creating both a clustered and non-clustered index on
a temp table.
Sample code from site:
CREATE TABLE #Users
(
ID INT IDENTITY(1,1),
UserID INT,
UserName VARCHAR(50)
)
INSERT INTO #Users
(
UserID,
UserName
)
SELECT
UserID = u.UserID
,UserName = u.UserName
FROM dbo.Users u
CREATE CLUSTERED INDEX IDX_C_Users_UserID ON #Users(UserID)
CREATE INDEX IDX_Users_UserName ON #Users(UserName)

SQL Full Text Index on multiple tables and columns

We have electronic forms that filers fill out online, and we store the data in SQL Server. We want to provide a search feature that allows us to search inside each electronic filing for matching keywords. We don’t need to know which word matched or where in the form it matched; we just need a ranked list of forms that match our keywords. We think SQL Full-Text Search is our best option because we are already using SQL Server 2016. We have just started implementing a solution but would like some guidance, since this is new territory for us.
Here is an example of how our tables are structured.
Filing is our top-level table for all electronic forms. We have sub tables that are all related through the FilingId. The Form Six Published Filings table has child tables to store information like Assets. The Form One Published Filings table has child tables to store information like Liabilities.
CREATE SCHEMA [Forms]
GO
CREATE SCHEMA [Form6]
GO
CREATE SCHEMA [Form1]
GO
CREATE TABLE [Forms].[Filing](
[FilingId] INT NOT NULL IDENTITY(1,1)
CONSTRAINT [PK_Forms_Filing_FilingId] PRIMARY KEY CLUSTERED,
[FilerUserId] [int] NOT NULL,
[FormYear] [int] NOT NULL,
[FormTypeId] [int] NOT NULL,
[FilingStatusId] [int] NOT NULL,
[FilerSignatureId] INT NULL,
[SubmissionDate] DATETIME2(0) NULL,
[IsScannedForm] BIT NOT NULL
CONSTRAINT [DF_Forms_Filing_IsScannedForm] DEFAULT(0)
)
GO
CREATE TABLE [Form6].[FormSixPublishedFilings](
[FormSixPublishedFilingId] INT NOT NULL IDENTITY(1,1)
CONSTRAINT [PK_Form6_FormSixPublishedFilings_FormSixPublishedFilingId] PRIMARY KEY CLUSTERED,
[FilingId] INT NOT NULL
CONSTRAINT [FK_Form6_FormSixPublishedFilings_Filings] FOREIGN KEY ([FilingId]) REFERENCES [Forms].[Filing] ([FilingId]),
[LastDateOfEmployment] DATE NULL,
[NetWorthDate] DATE NULL,
[NetWorth] MONEY NULL
)
GO
CREATE TABLE [Form6].[FormSixPublishedAssets](
[FormSixPublishedAssetId] INT NOT NULL IDENTITY(1,1)
CONSTRAINT [PK_Form6_FormSixPublishedAssets_FormSixPublishedAssetId] PRIMARY KEY CLUSTERED,
[FormSixPublishedFilingId] INT NOT NULL
CONSTRAINT [FK_Form6_FormSixPublishedAssets_FormSixPublishedFilings] FOREIGN KEY ([FormSixPublishedFilingId]) REFERENCES [Form6].[FormSixPublishedFilings] ([FormSixPublishedFilingId]),
[Name] VARCHAR(8000) NOT NULL,
[Amount] MONEY NOT NULL
)
GO
CREATE TABLE [Form1].[FormOnePublishedFilings]
(
[FormOnePublishedFilingId] INT NOT NULL IDENTITY(1,1)
CONSTRAINT [PK_Form1_FormOnePublishedFilings_FormOnePublishedFilingId] PRIMARY KEY CLUSTERED,
[FilingId] INT NOT NULL,
CONSTRAINT [FK_Form1_FormOnePublishedFilings_Filing] FOREIGN KEY ([FilingId]) REFERENCES [Forms].[Filing] ([FilingId]),
[HasServedAsAgent] BIT NULL,
[LastDateOfEmployment] DATE NULL,
[AmendmentReason] VARCHAR(1024) NULL
)
GO
CREATE TABLE [Form1].[FormOnePublishedLiabilities]
(
[FormOnePublishedLiabilityId] INT NOT NULL IDENTITY(1,1)
CONSTRAINT [PK_Form1_FormOnePublishedLiabilities_FormOnePublishedLiabilityId] PRIMARY KEY CLUSTERED,
[FormOnePublishedFilingId] INT NOT NULL,
CONSTRAINT [FK_Form1_FormOnePublishedLiabilities_FormOnePublishedFilings] FOREIGN KEY ([FormOnePublishedFilingId]) REFERENCES [Form1].[FormOnePublishedFilings] ([FormOnePublishedFilingId]),
[NameOfCreditor] VARCHAR(8000) NOT NULL,
[AddressOfCreditor] VARCHAR(8000) NOT NULL
)
GO
In order to be able to search through all the forms, I think we need to create a view that has just two columns: one for the FilingId, and an XML column holding an XML representation of all the data in each electronic filing. This XML column is what we will use to set up our full-text index. I think we will use a FREETEXTTABLE search, because we would like ranked results and the search terms will be entered by end users.
create view Forms.ViewForFullTextSearching with schemabinding as
select f.FilingId,
(select
filing.FilingId
,filing.FormYear
,filing.FormTypeId
,filing.FilingStatusId
,filing.FilerSignatureId
,filing.SubmissionDate
,filing.IsScannedForm
,form6Filing.LastDateOfEmployment 'Form6LastDateOfEmployment'
,form6Filing.NetWorthDate
,form6Filing.NetWorth
,form6Asset.Name
,form6Asset.Amount
,form1Filing.HasServedAsAgent
,form1Filing.LastDateOfEmployment 'Form1LastDateOfEmployment'
,form1Filing.AmendmentReason
,form1Liability.NameOfCreditor
,form1Liability.AddressOfCreditor
from Forms.Filing filing
left join Form6.FormSixPublishedFilings form6Filing on filing.FilingId = form6Filing.FilingId
left join Form6.FormSixPublishedAssets form6Asset on form6Filing.FormSixPublishedFilingId = form6Asset.FormSixPublishedFilingId
left join Form1.FormOnePublishedFilings form1Filing on filing.FilingId = form1Filing.FilingId
left join Form1.FormOnePublishedLiabilities form1Liability on form1Liability.FormOnePublishedFilingId = form1Filing.FormOnePublishedFilingId
where filing.FilingId = f.FilingId
for xml auto, type
) as 'Filing'
from Forms.Filing f
GO
create unique clustered index [IX_ViewForFullTextSearching_FilingId] ON [Forms].[ViewForFullTextSearching] ([FilingId])
GO
The above SQL does not actually work because I get this error.
Cannot create an index on view "EthicsFdms.Forms.ViewForFullTextSearching" because it contains one or more subqueries. Consider changing the view to use only joins instead of subqueries. Alternatively, consider not indexing this view.
So, I’m a bit lost on how to create a view with XML to search over if I’m not allowed to create a materialized view that has subqueries.
The view pairs each FilingId with the filing's data as a single XML document in the Filing column.
Next we set up our Full Text Catalog and Index on this view:
CREATE FULLTEXT CATALOG [FtcFilings];
GO
CREATE FULLTEXT INDEX ON [Forms].[ViewForFullTextSearching] ([Filing] language 1033) key index [IX_ViewForFullTextSearching_FilingId] on [FtcFilings];
GO
Then I was hoping we could search the filings like so:
select ftt.*
from [Forms].[Filing] filing
inner join FREETEXTTABLE(Forms.ViewForFullTextSearching, Filing, 'APPLE') as ftt on filing.FilingId = ftt.[KEY]
order by ftt.[RANK] desc
Right now my challenge is: is it even possible to create a materialized view like this? It seems not, since materialized views can't have subqueries, and I'm not sure how to build the XML column without subqueries.
If I'm not able to create a materialized view, how else can I create a full-text index that can search the electronic forms?
You can create an indexed view (which is SQL Server's synchronous materialized view) only if the view defines a mathematical surjection and all scalar computations are deterministic and precise. Among other restrictions, OUTER JOINs, subqueries and set operators (UNION, EXCEPT, INTERSECT) cannot be used...
The best way to design your system is to do it the other way around:
Create a persisted computed column using the CONCAT function over all the columns you want to full-text index.
Create full-text indexes on the computed columns.
Create a UDF that searches the full-text index on each table, concatenates the results with UNION, and then aggregates them to compute the rank (see the sketch after these steps).
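A rough sketch of step 3 against the question's schema (this assumes full-text indexes already exist on Form6.FormSixPublishedAssets and Form1.FormOnePublishedLiabilities, whether on the original columns as shown below or on computed columns per steps 1-2; the function name and shape are illustrative):
CREATE FUNCTION Forms.SearchFilings (@terms nvarchar(4000))
RETURNS TABLE
AS RETURN
    -- one row per filing, ranked by the sum of its full-text hits
    SELECT hits.FilingId, SUM(hits.[RANK]) AS TotalRank
    FROM (
        -- hits on Form 6 assets, mapped back to the filing
        SELECT pf6.FilingId, ftt.[RANK]
        FROM FREETEXTTABLE(Form6.FormSixPublishedAssets, [Name], @terms) ftt
        INNER JOIN Form6.FormSixPublishedAssets a
            ON a.FormSixPublishedAssetId = ftt.[KEY]
        INNER JOIN Form6.FormSixPublishedFilings pf6
            ON pf6.FormSixPublishedFilingId = a.FormSixPublishedFilingId
        UNION ALL
        -- hits on Form 1 liabilities, mapped back to the filing
        SELECT pf1.FilingId, ftt.[RANK]
        FROM FREETEXTTABLE(Form1.FormOnePublishedLiabilities, NameOfCreditor, @terms) ftt
        INNER JOIN Form1.FormOnePublishedLiabilities l
            ON l.FormOnePublishedLiabilityId = ftt.[KEY]
        INNER JOIN Form1.FormOnePublishedFilings pf1
            ON pf1.FormOnePublishedFilingId = l.FormOnePublishedFilingId
    ) hits
    GROUP BY hits.FilingId;
GO
Callers can then rank results themselves, e.g. SELECT * FROM Forms.SearchFilings('APPLE') ORDER BY TotalRank DESC.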
Let me know if you want more assistance to do so...
If these form filing data seldom change once created, and it makes business sense to store the form1 and form6 data together with the filing, you may consider going with a document-oriented design.
SQL Server has good JSON support now. You can save all the filing and form info as JSON, run your full-text search against it, and create views to simulate your current design if needed.
Here is an example -
create table tst.form (
form_id int not null identity primary key
,content_json nvarchar(max)
)
-- inside content_json, the json may look like -
{
"filler_user_id": 111,
"filler_type_id": 1,
"is_scanned_form": 1,
"form1": [
{
"form1_filling_id": 101,
"has_served_as_agent":0,
"liabilities": [{"name_of_creditor": "abc"}]
}
]
}
I have only modelled the form1-related info; you can add the form6-related info as needed.
You can then do a full-text search against this content_json column. For example:
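Here is a minimal sketch (it assumes the primary key on tst.form was declared with an explicit name, e.g. pk_form, so it can be referenced as the full-text KEY INDEX; the catalog name is illustrative):
-- one-time setup
CREATE FULLTEXT CATALOG ftc_forms;
CREATE FULLTEXT INDEX ON tst.form (content_json)
    KEY INDEX pk_form ON ftc_forms;
-- ranked search across everything stored in the json documents
SELECT f.form_id, ftt.[RANK]
FROM FREETEXTTABLE(tst.form, content_json, 'abc') ftt
INNER JOIN tst.form f ON f.form_id = ftt.[KEY]
ORDER BY ftt.[RANK] DESC;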
Then create views to simulate your current design if needed -
create or alter view tst.form_base WITH SCHEMABINDING as
select form_id
,convert(int, JSON_VALUE(content_json, '$.filler_user_id')) filler_user_id
,convert(int, JSON_VALUE(content_json, '$.filler_type_id')) filler_type_id
,convert(bit, JSON_VALUE(content_json, '$.is_scanned_form')) is_scanned_form
,JSON_QUERY(content_json, '$.form1') form1_json
from tst.form
create unique clustered index idx_form_base_form_id on tst.form_base(form_id);
-- you can create index as needed
create index idx_form_base_filler_user_id on tst.form_base(filler_user_id);
create or alter view tst.form1 as
select form_id
,a.form1_filling_id
,a.has_served_as_agent
,a.liabilities liabilities_json
from tst.form_base cross apply OPENJSON(form1_json) WITH (
form1_filling_id int '$.form1_filling_id',
has_served_as_agent int '$.has_served_as_agent',
liabilities nvarchar(max) '$.liabilities' as json) a
create or alter view tst.form1_liabilities as
select form_id
,form1_filling_id
,a.name_of_creditor
from tst.form1 cross apply OPENJSON(liabilities_json) WITH (
name_of_creditor nvarchar(max) '$.name_of_creditor') a
Then create some test data -
insert into tst.form (content_json) values ('{
"filler_user_id": 111,
"filler_type_id": 1,
"is_scanned_form": 1,
"form1": [
{
"form1_filling_id": 101,
"has_served_as_agent":0,
"liabilities": [{"name_of_creditor": "abc"}]
}
]
}');
insert into tst.form (content_json) values ('{
"filler_user_id": 222,
"filler_type_id": 1,
"is_scanned_form": 0,
"form1": [
{
"form1_filling_id": 102,
"has_served_as_agent":1,
"liabilities": [{"name_of_creditor": "def"}]
}
]
}');
Try it -
select *
from tst.form1_liabilities

Adding Columns to Multiple Tables in SQL

I just created a database and then added a couple of hundred tables with a script like this:
CREATE TABLE CapBond
(
[timestamp] varchar(50),
[Reward] varchar(50),
[Award] varchar(50),
[Fact] varchar(50)
)
CREATE TABLE [Values] -- Values is a reserved keyword, so it must be bracketed
(
[timestamp] varchar(50),
[Name] varchar(50),
[Test] varchar(50),
[Read] varchar(50),
[Parameters] varchar(50)
)
I realize I forgot to add two columns to each table. One for the PK and one for an FK that points back to a 'master' table.
Is there an easy way to insert columns without having to drop the DB and recreate it? Preferably with the columns inserted as the first two columns in the table?
Yes. You can use the ALTER TABLE ... ADD command for this purpose. Check out this page for a more detailed explanation:
https://www.sqlservertutorial.net/sql-server-basics/sql-server-alter-table-add-column/ .
And here is a solution for the ordering of the columns (note that its AFTER syntax is MySQL-only; SQL Server always appends new columns at the end):
https://www.mysqltutorial.org/mysql-add-column/
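For the tables in the question, the fix might look like the sketch below (the master table and constraint names are hypothetical). Since SQL Server treats column order as purely cosmetic and always appends new columns at the end, putting them first would require rebuilding the table.
ALTER TABLE CapBond ADD
    CapBondId int IDENTITY(1,1) CONSTRAINT PK_CapBond PRIMARY KEY, -- new surrogate PK
    MasterId int CONSTRAINT FK_CapBond_Master
        REFERENCES dbo.Master (MasterId); -- FK back to the 'master' table (hypothetical name)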

Sql Server : index already existing on global temp table

In SQL Server 2012 SP3 v.11.0.6020.0 (X64), I have a stored procedure which tests for the existence of a global temporary table (e.g., ##MyTable) and creates it if not found.
IF OBJECT_ID ( 'tempdb..##MyTable' ) IS NULL
CREATE TABLE ##MyTable
(
Key1 smallint
, Key2 nvarchar(16)
, Value1 char(3)
);
Later in the procedure, it tests if the table has rows and - if necessary - populates it.
IF NOT EXISTS ( SELECT * FROM ##MyTable )
BEGIN
INSERT INTO ##MyTable
SELECT Key1, Key2, Value1
FROM SourceTable
WHERE ...
CREATE NONCLUSTERED INDEX IX_MyTable ON ##MyTable ( [Key1], [Key2] );
END
I am sure that Key1 and Key2 are unique, since together they form the primary key on the source table.
Then, in either case (whether the table already existed or not), the stored procedure queries the table. Needless to say, the procedure's logic is much more complex than this.
The table is populated with customers' data coming from 7 different sources; usually, it takes a couple of seconds to insert almost 1 million rows. In theory, there is no chance that the INSERT INTO ##MyTable inserts 0 (zero) rows.
The stored procedure is called by an application: this application is usually launched in the morning and closed at night.
Theoretically, there could be conflicts: one user tries to insert data and create the index while another one is already doing the same. But it is very unlikely that this always happens to the same user, and it should be impossible if that user tries again a few minutes later (the table and index already exist).
This works fine for all of the users (nearly 100) except one specific user, who keeps getting this error: The operation failed because an index or statistics with name 'IX_MyTable' already exists on table ##MyTable.
Aside from the fact that I am already thinking about turning the global temporary table into a regular one, could anyone please explain this behaviour to me?
Thanks in advance to anyone who will help!
You would experience this behavior under these circumstances:
You create the table. It is empty.
The insert query is run, but inserts no rows.
The index is created.
On the next run, the table already exists but is still empty, so the insert and the CREATE INDEX are attempted once again, and the index creation fails because IX_MyTable already exists.
This is easy enough to work around. Just use a try/catch block or test to see if the index exists before creating it. Or, better yet, create the index when you create the table. Unless you are inserting a lot of data, the overhead shouldn't be too bad.
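For instance, the "test first" approach might look like this (note that metadata for ## tables lives in tempdb's catalog views, which is the part that's easy to miss):
IF NOT EXISTS (SELECT 1
               FROM tempdb.sys.indexes
               WHERE object_id = OBJECT_ID('tempdb..##MyTable')
                 AND name = 'IX_MyTable')
    CREATE NONCLUSTERED INDEX IX_MyTable ON ##MyTable ( [Key1], [Key2] );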
Better to move your CREATE INDEX statement into the table-creation block itself:
IF OBJECT_ID ( 'tempdb..##MyTable' ) IS NULL
BEGIN
CREATE TABLE ##MyTable
(
Key1 smallint
, Key2 nvarchar(16)
, Value1 char(3)
);
CREATE NONCLUSTERED INDEX IX_MyTable ON ##MyTable ( [Key1], [Key2] );
END
The error explains itself: you can't create an index that already exists. Create the index at the same time you create the table:
IF OBJECT_ID ( 'tempdb..##MyTable' ) IS NULL
Begin
CREATE TABLE ##MyTable
(
Key1 smallint
, Key2 nvarchar(16)
, Value1 char(3)
);
CREATE NONCLUSTERED INDEX IX_MyTable ON ##MyTable ( [Key1], [Key2] );
END
In your stored procedure you create the index inside the condition that checks whether the table is empty, so every time all data has been deleted from the table, it tries to create the index again.

Can We Create Index in Table Valued Function

Can we create an index on a column of a table-valued function's return table in SQL Server 2008?
My function is returning results slowly. When I look at the execution plan I see a table scan, so I want to create an index on the column of the function's table that my WHERE clause filters on.
Any help would be highly appreciated.
Thanks in advance
If the table-valued function is of the inline variety, you create the index on the underlying tables' columns.
If it is a multi-statement TVF in SQL Server 2008 (as tagged), you can only create the indexes associated with PRIMARY KEY or UNIQUE constraints.
In SQL Server 2014+ it is possible to declare inline indexes not associated with any constraint.
Example
CREATE FUNCTION F()
RETURNS @X TABLE
(
A INT PRIMARY KEY /*<-- Implicit clustered index*/
)
AS
BEGIN
INSERT INTO @X
VALUES(1),(2)
RETURN;
END
GO
SELECT *
FROM F()
WHERE A = 12
The above materializes the entire result set up front into a table variable and creates an implicit index on it.
Generally, inline TVFs are preferred to multi-statement ones.
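For reference, a sketch of the SQL Server 2014+ inline-index syntax mentioned above, where the index is not tied to any constraint (names are illustrative):
CREATE FUNCTION F2()
RETURNS @X TABLE
(
    A INT,
    B INT,
    INDEX IX_B NONCLUSTERED (B) -- inline index, no PRIMARY KEY or UNIQUE constraint required (2014+)
)
AS
BEGIN
    INSERT INTO @X
    VALUES (1, 10), (2, 20)
    RETURN;
END
GO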
CREATE OR ALTER FUNCTION dbo.tvfExample()
RETURNS @Example TABLE
(
Field1_ID INT NOT NULL,
Field2_ID INT NOT NULL,
Field3_ID INT NOT NULL,
PRIMARY KEY CLUSTERED (Field1_ID ASC, Field2_ID ASC, Field3_ID ASC)
)
AS
BEGIN
...
RETURN
END
GO

Very slow DELETE query

I have problems with SQL performance. For some reason the following queries are very slow:
I have two lists which contain Ids of a certain table. I need to delete all records from the first list whose Ids already exist in the second list:
DECLARE @IdList1 TABLE(Id INT)
DECLARE @IdList2 TABLE(Id INT)
-- Approach 1
DELETE list1
FROM @IdList1 list1
INNER JOIN @IdList2 list2 ON list1.Id = list2.Id
-- Approach 2
DELETE FROM @IdList1
WHERE Id IN (SELECT Id FROM @IdList2)
It is possible that the two lists contain more than 10,000 records each. In that case both queries take more than 20 seconds to execute.
The execution plan also showed something I don't understand; maybe that explains why it is so slow.
I filled both lists with 10,000 sequential integers, so both lists contained the values 1-10,000 as a starting point.
In the plans, both queries show an Actual Number of Rows of 50,005,000 for @IdList2! @IdList1 is correct (Actual Number of Rows is 10,000).
I know there are other ways to solve this, like filling a third list instead of removing from the first list. But my question is:
Why are these delete queries so slow and why do I see these strange query plans?
Add a Primary key to your table variables and watch them scream
DECLARE @IdList1 TABLE(Id INT PRIMARY KEY NOT NULL)
DECLARE @IdList2 TABLE(Id INT PRIMARY KEY NOT NULL)
Because there's no index on these table variables, any join or subquery must examine on the order of 10,000 times 10,000 = 100,000,000 pairs of values.
SQL Server compiles the plan when the table variable is empty and does not recompile it when rows are added. Try
DELETE FROM @IdList1
WHERE Id IN (SELECT Id FROM @IdList2)
OPTION (RECOMPILE)
This will take into account the actual number of rows in the table variable and get rid of the nested-loops plan.
Of course creating an index on Id via a constraint may well be beneficial for other queries using the table variable too.
The tables in table variables can have primary keys, so if your data supports uniqueness for these Ids, you may be able to improve performance by going for
DECLARE @IdList1 TABLE(Id INT PRIMARY KEY)
DECLARE @IdList2 TABLE(Id INT PRIMARY KEY)
Possible solutions:
1) Try to create indexes, thus:
1.1) If the List{1|2}.Id column has unique values, you can define a unique clustered index via a PK constraint like this:
DECLARE @IdList1 TABLE(Id INT PRIMARY KEY);
DECLARE @IdList2 TABLE(Id INT PRIMARY KEY);
1.2) If the List{1|2}.Id column may have duplicate values, you can still get a unique clustered index by adding a dummy IDENTITY column to the PK, like this:
DECLARE @IdList1 TABLE(Id INT, DummyID INT IDENTITY, PRIMARY KEY (Id, DummyID) );
DECLARE @IdList2 TABLE(Id INT, DummyID INT IDENTITY, PRIMARY KEY (Id, DummyID) );
2) Try adding a HASH JOIN query hint, like this:
DELETE list1
FROM @IdList1 list1
INNER JOIN @IdList2 list2 ON list1.Id = list2.Id
OPTION (HASH JOIN);
You are using table variables; either add a primary key to the table or change them to temporary tables and add an INDEX. This will result in much better performance. As a rule of thumb: if the table is only small, use table variables; however, if the table grows and contains a lot of data, use a temp table (see the sketch below).
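For example, a minimal sketch of the temp-table variant of the original delete (population steps omitted):
CREATE TABLE #IdList1 (Id INT PRIMARY KEY);
CREATE TABLE #IdList2 (Id INT PRIMARY KEY);
-- ... populate both lists here ...
DELETE list1
FROM #IdList1 list1
INNER JOIN #IdList2 list2 ON list1.Id = list2.Id;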
I'd be tempted to try
DECLARE @IdList3 TABLE(Id INT);
INSERT INTO @IdList3
SELECT Id FROM @IdList1
EXCEPT
SELECT Id FROM @IdList2
No deleting required.
Try this alternate syntax:
DELETE deleteAlias
FROM @IdList1 deleteAlias
WHERE EXISTS (
    SELECT NULL
    FROM @IdList2 innerList2Alias
    WHERE innerList2Alias.id = deleteAlias.id
)
)
EDIT.....................
Try using #temp tables with indexes instead.
Here is a generic example where "DepartmentKey" is the PK and the FK.
IF OBJECT_ID('tempdb..#Department') IS NOT NULL
begin
drop table #Department
end
CREATE TABLE #Department
(
DepartmentKey int ,
DepartmentName varchar(12)
)
CREATE INDEX IX_TEMPTABLE_Department_DepartmentKey ON #Department (DepartmentKey)
IF OBJECT_ID('tempdb..#Employee') IS NOT NULL
begin
drop table #Employee
end
CREATE TABLE #Employee
(
EmployeeKey int ,
DepartmentKey int ,
SSN varchar(11)
)
CREATE INDEX IX_TEMPTABLE_Employee_DepartmentKey ON #Employee (DepartmentKey)
Delete deleteAlias
from #Department deleteAlias
where exists ( select null from #Employee innerE where innerE.DepartmentKey = deleteAlias.DepartmentKey )
IF OBJECT_ID('tempdb..#Employee') IS NOT NULL
begin
drop table #Employee
end
IF OBJECT_ID('tempdb..#Department') IS NOT NULL
begin
drop table #Department
end