I wonder why the following SELECT statement uses an Index Scan rather than an Index Seek. Is it just because the number of rows is too small, or am I missing something?
Test data:
-- Init Tables
IF OBJECT_ID ( 'tempdb..#wat' ) IS NOT NULL
DROP TABLE #wat;
IF OBJECT_ID ( 'tempdb..#jam' ) IS NOT NULL
DROP TABLE #jam;
CREATE TABLE #wat (
ID INT IDENTITY(1,1) NOT NULL,
Name VARCHAR(15) NOT NULL,
Den DATETIME NOT NULL
)
CREATE TABLE #jam (
ID INT IDENTITY(1,1) NOT NULL,
Name VARCHAR(15) NOT NULL
)
-- Populate Temp Tables with Random Data
DECLARE @Length INT
,@CharPool VARCHAR(255)
,@PoolLength INT
,@RandomString VARCHAR(255)
,@LoopCount INT
SET @Length = RAND() * 5 + 8
SET @CharPool = 'abcdefghijkmnopqrstuvwxyzABCDEFGHIJKLMNPQRSTUVWXYZ23456789'
SET @PoolLength = LEN(@CharPool)
SET @LoopCount = 0
SET @RandomString = ''
WHILE (@LoopCount < 500)
BEGIN
INSERT INTO #jam (Name)
SELECT SUBSTRING(@CharPool, CONVERT(int, RAND() * @PoolLength), 5)
SET @LoopCount = @LoopCount + 1
END
-- Insert Rows into Second Temp Table
INSERT INTO #wat( Name, Den )
SELECT TOP 50 Name, GETDATE()
FROM #jam
-- Create Indexes
--DROP INDEX IX_jedna ON #jam
--DROP INDEX IX_dva ON #wat
CREATE INDEX IX_jedna ON #jam (Name) INCLUDE (ID);
CREATE INDEX IX_dva ON #wat (Name) INCLUDE (ID, Den);
-- Select
SELECT *
FROM #jam j
JOIN #wat w
ON w.Name = j.Name
Execution Plan:
There are several ways for the optimizer to perform joins: nested loops, hash match, or merge join (your case), and possibly others.
Depending on your data (row counts, existing indexes, and statistics) it decides which one is better.
In your example the optimizer assumes a many-to-many relation, and both tables are sorted (indexed) on this field.
Why a merge join? It is logical: move through both tables in parallel, and the server only has to do that once.
To get a seek as you want, the server would have to move through the first table once and perform a seek against the second table many times, since every record has a match in the other table. The server ends up reading all the records anyway when it seeks, so there is no profit in seeking (1,000 seeks are even more expensive than one simple pass through 1,000 records).
If you want a seek, add some records with no matches and a WHERE clause to your query.
UPD:
Even adding a simple
where j.ID = 1
gives you a seek.
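For reference, the full statement would then be the sketch below (built from the test setup above); with the extra predicate the optimizer will typically pick a nested loops plan with an Index Seek on IX_dva:
SELECT *
FROM #jam j
JOIN #wat w
ON w.Name = j.Name
WHERE j.ID = 1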
Related
There is a SQL Server temporary table, let's call it TableA. Its structure is the following:
CREATE TABLE #TableA
(
ID BIGINT IDENTITY (1, 1) PRIMARY KEY,
MapVal1 BIGINT NOT NULL,
MapVal2 BIGINT NOT NULL,
IsActual BIT NULL
)
The table is already filled with some mappings of MapVal1 to MapVal2. The issue is that not all of the mappings should be flagged as actual; that is what the IsActual column is for. Currently IsActual is set to NULL for every row. The task is to create a query that updates the IsActual column value. The UPDATE query should follow these conditions:
If MapVal1 is unique and MapVal2 is unique (one-to-one mapping) - then this mapping should be flagged as Actual, so IsActual = 1;
If MapVal1 is not unique - then the actual mapping should be the one of the current MapVal1 to the smallest MapVal2, and this MapVal2 must not be mapped to any other MapVal1 that is smaller than the current MapVal1;
If MapVal2 is not unique - then the actual mapping should be the one of the current MapVal2 to the smallest MapVal1, and this MapVal1 must not be mapped to any other MapVal2 that is smaller than the current MapVal2;
All rows that do not fulfill condition 1), 2), or 3) should be flagged as non-actual, so IsActual = 0.
I believe there is a relation between conditions 2) and 3): for every row, either both are fulfilled or neither is.
To make it clear, here is an example of result I want to obtain:
The result should be that every MapVal1 is mapped to just one MapVal2 and, vice versa, every MapVal2 is mapped to just one MapVal1.
I have created a SQL query to solve my task:
IF OBJECT_ID('tempdb..#TableA') IS NOT NULL
BEGIN
DROP TABLE #TableA
END
CREATE TABLE #TableA
(
ID BIGINT IDENTITY (1, 1) PRIMARY KEY,
MapVal1 BIGINT NOT NULL,
MapVal2 BIGINT NOT NULL,
IsActual BIT NULL
)
-- insert input data
INSERT INTO #TableA (MapVal1, MapVal2)
SELECT 1, 1
UNION ALL SELECT 1, 3
UNION ALL SELECT 1, 4
UNION ALL SELECT 2, 1
UNION ALL SELECT 2, 3
UNION ALL SELECT 2, 4
UNION ALL SELECT 3, 3
UNION ALL SELECT 3, 4
UNION ALL SELECT 4, 3
UNION ALL SELECT 4, 4
UNION ALL SELECT 6, 7
UNION ALL SELECT 7, 8
UNION ALL SELECT 7, 9
UNION ALL SELECT 8, 8
UNION ALL SELECT 8, 9
UNION ALL SELECT 9, 8
UNION ALL SELECT 9, 9
CREATE NONCLUSTERED INDEX IX_Mapping_MapVal1 ON #TableA (MapVal1);
CREATE NONCLUSTERED INDEX IX_Mapping_MapVal2 ON #TableA (MapVal2);
-- UPDATE of #TableA is starting here
-- every one-to-one mapping should be actual
UPDATE m1 SET
m1.IsActual = 1
FROM #TableA m1
LEFT JOIN #TableA m2
ON m1.MapVal1 = m2.MapVal1 AND m1.ID <> m2.ID
LEFT JOIN #TableA m3
ON m1.MapVal2 = m3.MapVal2 AND m1.ID <> m3.ID
WHERE m2.ID IS NULL AND m3.ID IS NULL
-- update for every one-to-many or many-to-many mapping is more complicated
-- would be great to change this part of the query to make it without any LOOP
DECLARE @MapVal1 BIGINT
DECLARE @MapVal2 BIGINT
DECLARE @i BIGINT
DECLARE @iMax BIGINT
DECLARE @LoopCount INT = 0
SELECT
@iMax = MAX (m.ID)
FROM #TableA m
SELECT
@i = MIN (m.ID)
FROM #TableA m
WHERE m.IsActual IS NULL
WHILE @i <= @iMax
BEGIN
SELECT @LoopCount = @LoopCount + 1
SELECT
@MapVal1 = m.MapVal1,
@MapVal2 = m.MapVal2
FROM #TableA m
WHERE m.ID = @i
IF EXISTS
(
SELECT *
FROM #TableA m
WHERE
m.ID < @i
AND
(m.MapVal1 = @MapVal1
OR m.MapVal2 = @MapVal2)
AND m.IsActual IS NULL
)
BEGIN
UPDATE m SET
m.IsActual = 0
FROM #TableA m
WHERE m.ID = @i
END
SELECT @i = MIN (m.ID)
FROM #TableA m
WHERE
m.ID > @i
AND m.IsActual IS NULL
END
UPDATE m SET
m.IsActual = 1
FROM #TableA m
WHERE m.IsActual IS NULL
SELECT * FROM #TableA
But, as expected, the performance of the query with the LOOP is very bad, especially when the input table holds millions of rows. I spent a lot of time trying to produce a query without a LOOP to reduce the execution time, but without success.
Could anybody advise me how to improve the performance of my query? It would be great to get a query without a LOOP.
Using a loop does not imply you need to update the table one record at a time.
It may help if each individual UPDATE statement updates multiple records.
Consider all possible combinations of MapVal1 and MapVal2 as a matrix.
Every time you flag a cell as 'actual', you can flag an entire row and an entire column as 'not actual'.
The simplest way to do this is by following these steps.
Of all mappings with IsActual = NULL, take the first one (smallest MapVal1, together with the smallest MapVal2 it is mapped to).
Flag this mapping as actual (IsActual = 1).
Flag all other mappings with the same MapVal1 as non-actual (IsActual = 0).
Flag all other mappings with the same MapVal2 as non-actual (IsActual = 0).
Repeat from step 1 until no more records with IsActual = NULL exist.
Here's an implementation:
SELECT 0 -- force @@ROWCOUNT initially 1
WHILE @@ROWCOUNT > 0
WITH MakeActual AS (
SELECT TOP 1 MapVal1, MapVal2
FROM #TableA
WHERE IsActual IS NULL
ORDER BY MapVal1, MapVal2
)
UPDATE a
SET IsActual = CASE WHEN a.MapVal1 = m.MapVal1 AND a.MapVal2 = m.MapVal2 THEN 1 ELSE 0 END
FROM #TableA a
INNER JOIN MakeActual m ON a.MapVal1 = m.MapVal1 OR a.MapVal2 = m.MapVal2
The number of loop iterations equals the number of 'actual' mappings.
The actual performance gain depends a lot on the data.
If the majority of mappings is one-to-one (i.e. hardly any non-actual mappings), then my algorithm will make little difference.
Therefore, it may be wise to keep the initial UPDATE statement from your own code sample (the one with the comment "every one-to-one mapping should be actual").
It may also help to play around with the indexes.
This one should help to further optimize the clause ORDER BY MapVal1, MapVal2:
CREATE NONCLUSTERED INDEX IX_MapVals ON #TableA (MapVal1, MapVal2)
I have a very long line of random numbers, e.g. 234,364...,632. I want to insert this line into a SQL temp table so that I can use it in an IN (SELECT * FROM #MYTABLE) clause of various queries. How can I do that? Apparently there is no easy way to insert this list into a column of a table. I was thinking of inserting it as a row and then pivoting the table. Any solution please?
If you're using a sufficiently recent version of SQL Server (2016 or later), STRING_SPLIT could be useful:
CREATE TABLE #MyTempTable (RandomNumber int)
INSERT INTO #MyTempTable
SELECT
ss.Value
FROM
STRING_SPLIT('1234,5678',',') ss
SELECT
tt.RandomNumber
FROM
#MyTempTable tt
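Then, as asked, the populated temp table can drive an IN predicate; dbo.SomeTable and its SomeNumber column below are just placeholders for whatever query you are filtering:
SELECT st.*
FROM dbo.SomeTable st
WHERE st.SomeNumber IN (SELECT tt.RandomNumber FROM #MyTempTable tt)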
One way to handle this is to perform what I call an "Indexed Split".
For that you would need a tally (AKA numbers) table.
SET NOCOUNT ON;
USE tempdb;
GO
--==== 1. Numbers table setup
-- Create the table
IF OBJECT_ID('dbo.tally') IS NOT NULL DROP TABLE dbo.tally;
CREATE TABLE dbo.tally (N INT NOT NULL);
-- Add Primary Key (I do it here so that I can name it)
ALTER TABLE dbo.tally
ADD CONSTRAINT pk_cl_tally PRIMARY KEY CLUSTERED(N)
WITH FILLFACTOR=100;
-- Add a Unique Index (the optimizer will pick this one)
ALTER TABLE dbo.tally
ADD CONSTRAINT uq_tally UNIQUE NONCLUSTERED(N);
-- Add rows (100K should do)
INSERT dbo.tally
SELECT TOP(100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns, sys.all_columns a
Next, some sample data to be used for the split.
--==== 2. Sample data
CREATE TABLE dbo.Test (ID INT, Column1 VARCHAR(20));
INSERT INTO dbo.Test (ID,Column1)
VALUES (1,'ORM;PS;SUP'),(2,'ABC;XYZ;999;123');
Now that we have our data, let's use our tally table to "pre-split" the string:
--==== 3. Indexed view to perform the "split"
-- The view
CREATE OR ALTER VIEW dbo.TestSplit WITH SCHEMABINDING AS
SELECT
Id = t.ID,
item =
SUBSTRING
(
t.Column1,
tt.N+SIGN(tt.N-1),
ISNULL(NULLIF((CHARINDEX(';',t.Column1,tt.N+1)),0),LEN(t.Column1)+1)-(tt.N)-SIGN(tt.N-1)
),
ItemIndex = tt.N+1
FROM dbo.Test AS t
CROSS JOIN dbo.tally AS tt
WHERE tt.N <= LEN(t.Column1)
AND (tt.N = 1 OR SUBSTRING(t.column1,tt.N,1) = ';');
GO
-- The index
CREATE UNIQUE CLUSTERED INDEX uq_cl__testSplit ON dbo.TestSplit(Id,Item);
GO
Now you have the strings in the table, unchanged, while the indexed view gives you a properly normalized view with your values "pre-split" and ready for retrieval.
--==== 4. Test and review execution plan
SELECT sv.*
FROM dbo.TestSplit AS sv;
Results:
Id item ItemIndex
----------- -------------------- -----------
1 ORM 2
1 PS 5
1 SUP 8
2 123 13
2 999 9
2 ABC 2
2 XYZ 5
Note the execution plan:
Lean and clean.
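As a usage sketch (assuming an edition where indexed views are not matched automatically), the NOEXPAND hint makes a query read the view's index directly, for example when listing the rows that contain a given item:
-- Query the materialized "pre-split" rows directly
SELECT sv.Id, sv.item
FROM dbo.TestSplit AS sv WITH (NOEXPAND)
WHERE sv.item = 'XYZ';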
Suppose there is a two-column table MyTable with enough records that query optimization is relevant:
CorporationID int (unindexed)
BatchID int (indexed)
And let's assume there is always a one-to-many relationship between CorporationID and BatchID. In other words, for each BatchID there will be only one CorporationID, but for each CorporationID there will be many BatchID values.
We need to get all BatchID values where corporationID = 1.
I know the simplest solution may be to just add an index to CorporationID, but assuming that is not allowed, is there some other way to inform SQL that each BatchID corresponds to only 1 CorporationID, through a query or otherwise?
select distinct batchid from MyTable where corporationID = 1
It seems this is not efficient.
select batchid from (select min(corporationid) corporationid, batchid
from MyTable group by batchid) subselect where corporationid = 1
This is also not efficient, I assume because SQL needs to iterate needlessly through all values of corporationid? (Does an aggregate function exist to select any() value, which would not have the overhead of min(), max(), sum(), or avg()?)
select batchid
from (
select corporationid, batchid
from (
select *, ROW_NUMBER() OVER (PARTITION BY batchid ORDER BY(SELECT NULL)) AS RowNumber
from mytable
) subselect
where RowNumber = 1
) subselect2
where corporationid = 1
Would this work, by arbitrarily selecting the corporationid related to row number 1 after partitioning by batchid with no order?
"assuming it is not allowed to create an index" - this is a highly unlikely assumption. Of course, you should create the index.
The most direct answer to the alternate questions that lie within your question is "no". There is no function, subquery, view, or other "read" action you can use to get a list of the batches for a given CorpID. You NEED to access the CorpID data to do that... none of your sample queries help because, at some point, they NEED to access the CorpIDs to know which rows to gather BatchIDs from. Any summary or "rollup" function that might exist would still NEED to access all the pages of data to "see" them. The reading of the pages cannot be avoided.
Without changes to your architecture, it's not physically possible to optimize your query further.
However, with some changes, you could have some options (but I'd guess they are much uglier than just adding the index). For instance, you could modify the structure of your BatchID to include data for both the BatchID and the CorpID. Something like "8888899999999"... the 9's are the BatchID and the 8's are the CorpID. This doesn't win you much though; you're not saving any index space, but at least you don't have to index the CorpID field :) Things like this could be done, but I won't share any others. I don't want the really experienced people here to see this stuff and get ill. :)
You need an index on CorpID if you want to improve performance.
If you don't have a lot of data, I suggest putting an index on the CorporationID column. But if you have a lot of data, you can define a filtered index for each CorporationID value.
Part 01=>
/*01Create DB*/
IF DB_ID('Test01')>0
BEGIN
ALTER DATABASE Test01 SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DROP DATABASE Test01
END
GO
CREATE DATABASE Test01
GO
USE Test01
Go
Part 02=>
/*02Create table*/
CREATE TABLE Table01(
ID INT PRIMARY KEY IDENTITY,
Title NVARCHAR(100),
CreationDate DATETIME,
CorporationID INT ,
MyID INT ,
[GuidId1] [uniqueidentifier] NOT NULL,
[GuidId2] [uniqueidentifier] NOT NULL,
[Code] [nvarchar](50) NULL
)
ALTER TABLE [dbo].[Table01] ADD DEFAULT (GETDATE()) FOR [CreationDate]
GO
ALTER TABLE [dbo].[Table01] ADD DEFAULT (NEWSEQUENTIALID()) FOR [GuidId1]
GO
ALTER TABLE [dbo].[Table01] ADD DEFAULT (NEWID()) FOR [GuidId2]
GO
CREATE TABLE Table02(
ID INT PRIMARY KEY IDENTITY,
Title NVARCHAR(100),
CreationDate DATETIME,
CorporationID INT ,
MyID INT ,
[GuidId1] [uniqueidentifier] NOT NULL,
[GuidId2] [uniqueidentifier] NOT NULL,
[Code] [nvarchar](50) NULL
)
ALTER TABLE [dbo].[Table02] ADD DEFAULT (GETDATE()) FOR [CreationDate]
GO
ALTER TABLE [dbo].[Table02] ADD DEFAULT (NEWSEQUENTIALID()) FOR [GuidId1]
GO
ALTER TABLE [dbo].[Table02] ADD DEFAULT (NEWID()) FOR [GuidId2]
GO
Part 03=>
/*03Add Data*/
DECLARE @I INT = 1
WHILE @I < 1000000
BEGIN
DECLARE @Title NVARCHAR(100) = 'TITLE '+ CAST(@I AS NVARCHAR(10)),
@CorporationID INT = CAST((RAND()*20) + 1 AS INT),
@Code NVARCHAR(50) = 'CODE '+ CAST(@I AS NVARCHAR(10)) ,
@MyID INT = CAST((RAND()*50) + 1 AS INT)
INSERT INTO Table01 (Title , CorporationID , Code , MyID )
VALUES ( @Title , @CorporationID , @Code , @MyID)
SET @I += 1
END
INSERT INTO Table02 ([Title], [CreationDate], [CorporationID], [MyID], [GuidId1], [GuidId2], [Code])
SELECT [Title], [CreationDate], [CorporationID], [MyID], [GuidId1], [GuidId2], [Code] FROM Table01
Part 04=>
/*04 CREATE INDEX*/
CREATE NONCLUSTERED INDEX IX_Table01_ALL
ON Table01 (CorporationID) INCLUDE (MyID) ;
DECLARE @QUERY NVARCHAR(MAX) = ''
DECLARE @J INT = 1
WHILE @J < 21
BEGIN
SET @QUERY += '
CREATE NONCLUSTERED INDEX IX_Table02_'+CAST(@J AS NVARCHAR(5))+'
ON Table02 (CorporationID) INCLUDE (MyID) WHERE CorporationID = '+CAST(@J AS NVARCHAR(5))+';'
SET @J+= 1
END
EXEC (@QUERY)
Part 05=>
/*05 READ DATA => press CTRL + M (Include Actual Execution Plan) */
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT * FROM [dbo].[Table01] WHERE CorporationID = 10 AND MyID = 25
SELECT * FROM [dbo].[Table01] WITH(INDEX(IX_Table01_ALL)) WHERE CorporationID = 10 AND MyID = 25
SELECT * FROM [dbo].[Table02] WITH(INDEX(IX_Table02_10)) WHERE CorporationID = 10 AND MyID = 25
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
Notice the IO, TIME, and EXECUTION PLAN.
Good luck
-- create table
CREATE TABLE dbo.Tests
(
Id BIGINT NOT NULL IDENTITY,
String NVARCHAR(100),
StringReversed AS REVERSE(String),
CONSTRAINT PK_Tests PRIMARY KEY (Id),
)
CREATE NONCLUSTERED INDEX IX1 ON dbo.Tests(String)
CREATE NONCLUSTERED INDEX IX2 ON dbo.Tests(StringReversed)
-- populate table with 100k random strings (for testing)
DECLARE @I INT = 100000
WHILE @I > 0
BEGIN
INSERT INTO Tests(String)
SELECT CONVERT(varchar(36), NEWID())
SET @I = @I - 1
END
-- how do i do a LIKE '%STRING%' search which uses the index?
SELECT String, StringReversed FROM Tests WHERE String LIKE '%0A7EB%'
SELECT String, StringReversed FROM Tests
WHERE String LIKE '0A7EB%' OR StringReversed LIKE 'BE7A0%'
Can you help me with this? I am trying to implement a full-text-search alternative to be able to do a LIKE '%STRING%' WHERE clause.
I'm stuck and not quite sure if this is even possible to implement. Let's just assume that FULL TEXT SEARCH is not an option and I need to use an index.
This is a prod issue and we need to do a LIKE '%search%' on the string column. I just read here: SQL Server: Index columns used in like? that we can do a reverse?
Hope you can help me, thanks a lot.
Update: I tried the string fragments approach.
-- create table
CREATE TABLE dbo.Tests
(
Id BIGINT NOT NULL IDENTITY,
String NVARCHAR(100),
CONSTRAINT PK_Tests PRIMARY KEY (Id),
)
GO
-- create table for Test String Fragments
CREATE TABLE dbo.TestStringFragments(
Id BIGINT NOT NULL IDENTITY,
TestId BIGINT NOT NULL,
Fragment NVARCHAR(100),
CONSTRAINT PK_TestStringFragments PRIMARY KEY (Id)
)
CREATE NONCLUSTERED INDEX IX_TestStringFragments_Fragment ON dbo.TestStringFragments(Fragment)
CREATE NONCLUSTERED INDEX IX_TestStringFragments_TestId ON dbo.TestStringFragments(TestId)
GO
-- create UDF to generate string fragments
CREATE FUNCTION dbo.CreateStringFragments(@input nvarchar(100))
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
(
WITH x(x) AS
(SELECT 1 UNION ALL SELECT x+1 FROM x WHERE x < (LEN(@input)))
SELECT Fragment = SUBSTRING(@input, x, LEN(@input)) FROM x
)
GO
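-- Side note, for illustration only: calling the function above on N'ABCD'
-- returns the suffix fragments 'ABCD', 'BCD', 'CD' and 'D'.
SELECT * FROM dbo.CreateStringFragments(N'ABCD');
GO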
-- create trigger for the Tests table
CREATE TRIGGER dbo.Tests_MaintainStringFragments
ON dbo.Tests
FOR INSERT, UPDATE, DELETE
AS
BEGIN
SET NOCOUNT ON
DELETE TSF FROM dbo.TestStringFragments AS TSF
INNER JOIN deleted ON TSF.TestId = deleted.Id
INSERT dbo.TestStringFragments(TestId, Fragment)
SELECT inserted.Id, fragments.Fragment
FROM inserted
CROSS APPLY dbo.CreateStringFragments(inserted.String) AS fragments
END
GO
-- populate table with 100k random strings (for testing)
DECLARE @I INT = 100000
WHILE @I > 0
BEGIN
INSERT INTO Tests(String)
SELECT CONVERT(varchar(36), NEWID())
SET @I = @I - 1
END
I was able to replicate the LIKE '%string%' query via the fragments table.
SELECT T.* FROM Tests T WITH(NOLOCK)
WHERE T.String LIKE '%CBB2%'
SELECT T.* FROM Tests T WITH(NOLOCK)
INNER JOIN TestStringFragments TSF WITH(NOLOCK) ON T.Id = TSF.TestId
where TSF.Fragment LIKE 'CBB2%'
My new execution plan is 61% for the first query, 39% for the second one. I'll check the trigrams approach.
Long story short, I would like to create a column that repeats the pattern 1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,...etc. for (12 * 460343 =) 5524116 rows. Any wisdom on how I could complete this? Thank you!
Insert, say, 48 rows, then select into the table from itself several times. You will get there really fast. It is surprisingly faster than one would think.
If you create a table with an int auto-increment column, then at the end:
delete from table where id>5524116
Edit: here you go
create table idFix
( id bigint auto_increment primary key,
num int not null
)engine=myisam;
-- prime it
insert into idFix(num) values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12);
-- this is pretty fast, don't laugh
-- run the following line 19 times
insert into idFix(num) select num from idFix;
-- you now have 6.2m rows (6,291,456)
select count(*) from idFix
delete from idFix where id>5524116;
select count(*) from idFix;
select min(num),max(num) from idFix;
It takes 3 minutes max.
Use your helper table, then for the love of Pete, drop it!
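The snippet above is written in MySQL syntax; a rough SQL Server sketch of the same doubling idea, using a temp table, could look like this (the ORDER BY on the insert keeps the identity values aligned with the repeating 1-12 pattern):
CREATE TABLE #idFix (
id bigint IDENTITY(1,1) PRIMARY KEY,
num int NOT NULL
);
-- prime it with one cycle
INSERT INTO #idFix (num)
VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12);
-- double the row count; run this 19 times (12 * 2^19 = 6,291,456 rows)
INSERT INTO #idFix (num)
SELECT num FROM #idFix ORDER BY id;
-- trim back to exactly 5,524,116 rows
DELETE FROM #idFix WHERE id > 5524116;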
Use a loop and do some mod division with your counter.
DECLARE @LoopCounter bigint
SET @LoopCounter = 0
CREATE TABLE #YourValues
(
YourValue_Key int NOT NULL identity (1,1) PRIMARY KEY,
YourValue_OneThrough12Repeating int
)
WHILE @LoopCounter < 5524116
BEGIN
INSERT INTO #YourValues (YourValue_OneThrough12Repeating) VALUES ((@LoopCounter % 12) + 1)
SET @LoopCounter = @LoopCounter + 1
END
SELECT * FROM #YourValues
DROP TABLE #YourValues
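If the loop turns out to be too slow at 5.5 million rows, a set-based sketch (assuming SQL Server) can build the whole column in one statement; sys.all_columns cross-joined with itself is just a convenient large row source, the same trick used to build the tally table elsewhere on this page, and #YourValuesSetBased is a hypothetical name:
;WITH N AS
(
SELECT TOP (5524116) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
FROM sys.all_columns a
CROSS JOIN sys.all_columns b
)
SELECT n AS YourValue_Key,
((n - 1) % 12) + 1 AS YourValue_OneThrough12Repeating
INTO #YourValuesSetBased
FROM N;
-- spot check the first two cycles of the pattern
SELECT TOP (24) * FROM #YourValuesSetBased ORDER BY YourValue_Key;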