Select LIKE '%string%' without full-text search in SQL Server

-- create table
CREATE TABLE dbo.Tests
(
Id BIGINT NOT NULL IDENTITY,
String NVARCHAR(100),
StringReversed AS REVERSE(String),
CONSTRAINT PK_Tests PRIMARY KEY (Id)
)
CREATE NONCLUSTERED INDEX IX1 ON dbo.Tests(String)
CREATE NONCLUSTERED INDEX IX2 ON dbo.Tests(StringReversed)
-- populate table with 100k random strings (for testing)
DECLARE @I INT = 100000
WHILE @I > 0
BEGIN
INSERT INTO Tests(String)
SELECT CONVERT(varchar(36), NEWID())
SET @I = @I - 1
END
-- How do I do a LIKE '%STRING%' search that uses the index?
SELECT String, StringReversed FROM Tests WHERE String LIKE '%0A7EB%'
SELECT String, StringReversed FROM Tests
WHERE String LIKE '0A7EB%' OR StringReversed LIKE 'BE7A0%'
Can you help me with this? I'm trying to implement an alternative to full-text search so I can run a LIKE '%STRING%' WHERE clause efficiently.
I'm stuck and not sure whether this is even possible. Let's assume that full-text search is not an option and I need to rely on an index.
This is a production issue, and we need to do a LIKE '%search%' search on the string column. I just read here: SQL Server: Index columns used in like? that we can use a reversed column?
Hope you can help me, thanks a lot.

Update: I tried the string fragments approach.
-- create table
CREATE TABLE dbo.Tests
(
Id BIGINT NOT NULL IDENTITY,
String NVARCHAR(100),
CONSTRAINT PK_Tests PRIMARY KEY (Id)
)
GO
-- create table for Test String Fragments
CREATE TABLE dbo.TestStringFragments(
Id BIGINT NOT NULL IDENTITY,
TestId BIGINT NOT NULL,
Fragment NVARCHAR(100),
CONSTRAINT PK_TestStringFragments PRIMARY KEY (Id)
)
CREATE NONCLUSTERED INDEX IX_TestStringFragments_Fragment ON dbo.TestStringFragments(Fragment)
CREATE NONCLUSTERED INDEX IX_TestStringFragments_TestId ON dbo.TestStringFragments(TestId)
GO
-- create UDF to generate string fragments
CREATE FUNCTION dbo.CreateStringFragments(@input nvarchar(100))
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
(
WITH x(x) AS
(SELECT 1 UNION ALL SELECT x+1 FROM x WHERE x < (LEN(@input)))
SELECT Fragment = SUBSTRING(@input, x, LEN(@input)) FROM x
)
GO
-- create trigger for the Tests table
CREATE TRIGGER dbo.Tests_MaintainStringFragments
ON dbo.Tests
FOR INSERT, UPDATE, DELETE
AS
BEGIN
SET NOCOUNT ON
DELETE TSF FROM dbo.TestStringFragments AS TSF
INNER JOIN deleted ON TSF.TestId = deleted.Id
INSERT dbo.TestStringFragments(TestId, Fragment)
SELECT inserted.Id, fragments.Fragment
FROM inserted
CROSS APPLY dbo.CreateStringFragments(inserted.String) AS fragments
END
GO
-- populate table with 100k random strings (for testing)
DECLARE @I INT = 100000
WHILE @I > 0
BEGIN
INSERT INTO Tests(String)
SELECT CONVERT(varchar(36), NEWID())
SET @I = @I - 1
END
I was able to replicate the LIKE '%string%' query via the fragments table.
SELECT T.* FROM Tests T WITH(NOLOCK)
WHERE T.String LIKE '%CBB2%'
SELECT T.* FROM Tests T WITH(NOLOCK)
INNER JOIN TestStringFragments TSF WITH(NOLOCK) ON T.Id = TSF.TestId
where TSF.Fragment LIKE 'CBB2%'
In the new execution plan the first query accounts for 61% of the batch cost and the second one for 39%. I'll check the trigrams approach next.
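For reference, here is a minimal sketch of the trigram idea, assuming the same dbo.Tests table; the TestTrigrams table and dbo.CreateTrigrams function are made-up names for illustration, and the table would be maintained by a trigger like the fragments one (inserting DISTINCT trigrams per row). A search term of three or more characters is then narrowed down via an equality seek on one trigram before the residual LIKE is applied.
-- Hypothetical trigram table: one row per distinct 3-character substring per Tests row
CREATE TABLE dbo.TestTrigrams
(
TestId BIGINT NOT NULL,
Trigram NCHAR(3) NOT NULL,
CONSTRAINT PK_TestTrigrams PRIMARY KEY (Trigram, TestId)
)
GO
-- Inline TVF that emits all full 3-character substrings of the input
CREATE FUNCTION dbo.CreateTrigrams(@input nvarchar(100))
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
(
WITH x(x) AS
(SELECT 1 UNION ALL SELECT x+1 FROM x WHERE x < LEN(@input))
SELECT Trigram = SUBSTRING(@input, x, 3) FROM x WHERE x <= LEN(@input) - 2
)
GO
-- Search: seek on one trigram taken from the term, then re-check the full pattern
SELECT T.Id, T.String
FROM dbo.TestTrigrams TG
INNER JOIN dbo.Tests T ON T.Id = TG.TestId
WHERE TG.Trigram = N'CBB'
AND T.String LIKE '%CBB2%'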

Related

If a table has an unindexed column with a one-to-many relationship to an indexed column, how can a query on the unindexed column be optimized?

Suppose there is a two-column table MyTable with enough records that query optimization is relevant:
CorporationID int (unindexed)
BatchID int (indexed)
And let's assume there is always a one-to-many relationship between CorporationID and BatchID. In other words, for each BatchID there will be only one CorporationID, but for each CorporationID there will be many BatchID values.
We need to get all BatchID values where corporationID = 1.
I know the simplest solution may be to just add an index to CorporationID, but assuming that is not allowed, is there some other way to inform SQL that each BatchID corresponds to only 1 CorporationID, through a query or otherwise?
select distinct batchid from MyTable where corporationID = 1
This does not seem to be efficient.
select batchid from (select min(corporationid) corporationid, batchid
from MyTable group by batchid) subselect where corporationid = 1
This is also not efficient, I assume because SQL Server needs to iterate needlessly through all values of corporationid. (Is there an aggregate function that selects any() value, without the overhead of min(), max(), sum(), or avg()?)
select batchid
from (
select corporationid, batchid
from (
select *, ROW_NUMBER() OVER (PARTITION BY batchid ORDER BY(SELECT NULL)) AS RowNumber
from mytable
) subselect
where RowNumber = 1
) subselect2
where corporationid = 1
Would this work? By arbitrarily selecting the corporationid related to row number 1 after partitioning by batchid with no order?
"assuming it is not allowed to create an index" - this is a highly unlikely assumption. Of course, you should create the index.
The most direct answer to the alternative questions within your question is "no". There is no function, subquery, view, or other "read" operation you can use to get a list of the batches for a given CorporationID without touching the CorporationID data. All of your sample queries fail for the same reason: at some point they need to read the CorporationID values to know which rows to gather BatchIDs from. Any summary or "rollup" function that might exist would still need to access all of the data pages to "see" them; reading those pages cannot be avoided.
Without changes to your architecture, it's not physically possible to optimize your query further.
However, with some changes, you could have some options (but I'd guess they are much uglier than just adding the index). For instance, you could modify the structure of your BatchID to include data for both the BatchID and the CorpID. Something like "8888899999999"... the 9's are the BatchID and the 8's are the CorpID. This doesn't win you much, though; you're not saving any index space, but at least you don't have to index the CorpID field :) Things like this could be done, but I won't share any others. I don't want the really experienced people here to see this stuff and get ill. :)
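To make that idea concrete, here is a minimal sketch of such a composite-key scheme, purely for illustration; the table and column names are assumptions, and as noted above, a plain index on CorporationID is almost always the better choice.
-- Hypothetical layout: CompositeBatchID stores CorporationID * 100000000 + BatchID,
-- so an index on CompositeBatchID can be range-seeked per corporation.
CREATE TABLE dbo.MyTableComposite
(
CompositeBatchID BIGINT NOT NULL, -- e.g. 8888899999999
CorporationID AS CAST(CompositeBatchID / 100000000 AS INT), -- derived, not stored
BatchID AS CAST(CompositeBatchID % 100000000 AS INT)
)
CREATE INDEX IX_MyTableComposite ON dbo.MyTableComposite (CompositeBatchID)
-- All batches for CorporationID = 1 become a range seek on the composite column
SELECT CompositeBatchID % 100000000 AS BatchID
FROM dbo.MyTableComposite
WHERE CompositeBatchID >= 1 * 100000000
AND CompositeBatchID < 2 * 100000000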
You need an index on CorpID if you want to improve performance.
If you don't have a lot of data, I suggest putting an index on the CorporationID column. But if you have too much data, you can define a filtered index for each CorporationID value:
Part 01=>
/*01Create DB*/
IF DB_ID('Test01')>0
BEGIN
ALTER DATABASE Test01 SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DROP DATABASE Test01
END
GO
CREATE DATABASE Test01
GO
USE Test01
Go
Part 02=>
/*02Create table*/
CREATE TABLE Table01(
ID INT PRIMARY KEY IDENTITY,
Title NVARCHAR(100),
CreationDate DATETIME,
CorporationID INT ,
MyID INT ,
[GuidId1] [uniqueidentifier] NOT NULL,
[GuidId2] [uniqueidentifier] NOT NULL,
[Code] [nvarchar](50) NULL
)
ALTER TABLE [dbo].[Table01] ADD DEFAULT (GETDATE()) FOR [CreationDate]
GO
ALTER TABLE [dbo].[Table01] ADD DEFAULT (NEWSEQUENTIALID()) FOR [GuidId1]
GO
ALTER TABLE [dbo].[Table01] ADD DEFAULT (NEWID()) FOR [GuidId2]
GO
CREATE TABLE Table02(
ID INT PRIMARY KEY IDENTITY,
Title NVARCHAR(100),
CreationDate DATETIME,
CorporationID INT ,
MyID INT ,
[GuidId1] [uniqueidentifier] NOT NULL,
[GuidId2] [uniqueidentifier] NOT NULL,
[Code] [nvarchar](50) NULL
)
ALTER TABLE [dbo].[Table02] ADD DEFAULT (GETDATE()) FOR [CreationDate]
GO
ALTER TABLE [dbo].[Table02] ADD DEFAULT (NEWSEQUENTIALID()) FOR [GuidId1]
GO
ALTER TABLE [dbo].[Table02] ADD DEFAULT (NEWID()) FOR [GuidId2]
GO
Part 03=>
/*03Add Data*/
DECLARE @I INT = 1
WHILE @I < 1000000
BEGIN
DECLARE @Title NVARCHAR(100) = 'TITLE '+ CAST(@I AS NVARCHAR(10)),
@CorporationID INT = CAST((RAND()*20) + 1 AS INT),
@Code NVARCHAR(50) = 'CODE '+ CAST(@I AS NVARCHAR(10)) ,
@MyID INT = CAST((RAND()*50) + 1 AS INT)
INSERT INTO Table01 (Title , CorporationID , Code , MyID )
VALUES ( @Title , @CorporationID , @Code , @MyID)
SET @I += 1
END
INSERT INTO Table02 ([Title], [CreationDate], [CorporationID], [MyID], [GuidId1], [GuidId2], [Code])
SELECT [Title], [CreationDate], [CorporationID], [MyID], [GuidId1], [GuidId2], [Code] FROM Table01
Part 04=>
/*04 CREATE INDEX*/
CREATE NONCLUSTERED INDEX IX_Table01_ALL
ON Table01 (CorporationID) INCLUDE (MyID) ;
DECLARE @QUERY NVARCHAR(MAX) = ''
DECLARE @J INT = 1
WHILE @J < 21
BEGIN
SET @QUERY += '
CREATE NONCLUSTERED INDEX IX_Table02_'+CAST(@J AS NVARCHAR(5))+'
ON Table02 (CorporationID) INCLUDE (MyID) WHERE CorporationID = '+CAST(@J AS NVARCHAR(5))+';'
SET @J += 1
END
EXEC (@QUERY)
Part 05=>
/*05 READ DATA => press Ctrl+M to include the actual execution plan */
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT * FROM [dbo].[Table01] WHERE CorporationID = 10 AND MyID = 25
SELECT * FROM [dbo].[Table01] WITH(INDEX(IX_Table01_ALL)) WHERE CorporationID = 10 AND MyID = 25
SELECT * FROM [dbo].[Table02] WITH(INDEX(IX_Table02_10)) WHERE CorporationID = 10 AND MyID = 25
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
Notice the IO, the time, and the execution plan.
Good luck

Generate ID for duplicate values in SQL Server

I found the following link about assigning an identical ID to duplicates in SQL Server.
My understanding is that there is no built-in SQL Server function to generate this automatically, other than running the insert and update queries from the linked answer. Is that correct? If so, what would a trigger look like that, whenever someone inserts data into MyTable, runs the insert and update queries from the link?
Assign identical ID to duplicates in SQL server
INSERT INTO secondTable (word) SELECT distinct word FROM MyTable;
UPDATE MyTable SET ID = (SELECT id from secondTable where MyTable.word = secondTable.word)
thanks,
S
Is this what you want? I can't think of an "automatic" solution that would just increase the Id for new words.
CREATE TABLE MyTable (
Id INT NOT NULL,
Word NVARCHAR(255) NOT NULL,
PRIMARY KEY (Id, Word)); -- the primary key makes it impossible to have more than one combination of word and id
DECLARE @word NVARCHAR(255) = 'Hello!';
-- Get the existing id or calculate a new id
DECLARE @Id INT = (SELECT Id FROM MyTable WHERE Word = @word);
IF (@Id IS NULL) SET @Id = (SELECT ISNULL(MAX(Id), 0) + 1 FROM MyTable);
INSERT INTO MyTable (Id, Word)
VALUES (@Id, @word)
SELECT * FROM MyTable
If you can't for some reason have Id and Word as a combined primary key, you may use a unique index to make sure that there is only one combination.
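Two rough sketches follow: the unique-index alternative mentioned above (for the table defined in this answer), and the trigger the original question asks about (for the MyTable/secondTable schema of the linked answer, where MyTable.ID is assumed nullable until assigned and secondTable.id is an IDENTITY). Both are illustrative only and untested against your actual tables.
-- Unique index instead of the combined primary key
CREATE UNIQUE INDEX UX_MyTable_Id_Word ON dbo.MyTable (Id, Word)
GO
-- Hypothetical trigger: after rows are inserted into MyTable,
-- add any new words to secondTable and copy their ids back.
CREATE TRIGGER dbo.MyTable_AssignDuplicateIds
ON dbo.MyTable
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON
INSERT INTO secondTable (word)
SELECT DISTINCT i.Word
FROM inserted AS i
WHERE NOT EXISTS (SELECT 1 FROM secondTable AS s WHERE s.word = i.Word)
UPDATE m
SET m.ID = s.id
FROM MyTable AS m
INNER JOIN secondTable AS s ON s.word = m.Word
WHERE m.ID IS NULL
END
GO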

Why Optimizer Does Not Use Index Seek on Join

I wonder why the following SELECT statement does not use an Index Seek but an Index Scan instead. Is it just because the number of rows is too small, or am I missing something?
Test data:
-- Init Tables
IF OBJECT_ID ( 'tempdb..#wat' ) IS NOT NULL
DROP TABLE #wat;
IF OBJECT_ID ( 'tempdb..#jam' ) IS NOT NULL
DROP TABLE #jam;
CREATE TABLE #wat (
ID INT IDENTITY(1,1) NOT NULL,
Name VARCHAR(15) NOT NULL,
Den DATETIME NOT NULL
)
CREATE TABLE #jam (
ID INT IDENTITY(1,1) NOT NULL,
Name VARCHAR(15) NOT NULL
)
-- Populate Temp Tables with Random Data
DECLARE @length INT
,@charpool VARCHAR(255)
,@poolLength INT
,@RandomString VARCHAR(255)
,@LoopCount INT
SET @Length = RAND() * 5 + 8
SET @CharPool = 'abcdefghijkmnopqrstuvwxyzABCDEFGHIJKLMNPQRSTUVWXYZ23456789'
SET @PoolLength = LEN(@CharPool)
SET @LoopCount = 0
SET @RandomString = ''
WHILE (@LoopCount < 500)
BEGIN
INSERT INTO #jam (Name)
SELECT SUBSTRING(@Charpool, CONVERT(int, RAND() * @PoolLength), 5)
SET @LoopCount = @LoopCount + 1
END
-- Insert Rows into Second Temp Table
INSERT INTO #wat( Name, Den )
SELECT TOP 50 Name, GETDATE()
FROM #jam
-- Create Indexes
--DROP INDEX IX_jedna ON #jam
--DROP INDEX IX_dva ON #wat
CREATE INDEX IX_jedna ON #jam (Name) INCLUDE (ID);
CREATE INDEX IX_dva ON #wat (Name) INCLUDE (ID, Den);
-- Select
SELECT *
FROM #jam j
JOIN #wat w
ON w.Name = j.Name
Execution Plan:
There are several ways for the optimizer to do joins: nested loops, hash match, or merge join (your case), and maybe others.
Depending on your data (row counts, existing indexes, and statistics) it decides which one is better.
In your example the optimizer assumes a many-to-many relation, and you have both tables sorted (indexed) on this field.
Why a merge join? Because it is logical to move through both tables in parallel, and the server only has to do that once.
To do a seek as you want, the server would have to move through the first table once and perform a seek into the second table many times, and since all records have matches in the other table, it would end up reading all of the records anyway. There is no benefit to seeking in that case (1,000 seeks are more expensive than one simple pass through 1,000 records).
If you want a seek, add some records with no matches and a WHERE clause to your query.
UPD
even adding a simple
where j.ID = 1
gives you a seek.
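For completeness, here is a sketch of what that looks like with the temp tables above (the predicate value is just an example; the exact plan still depends on statistics):
-- Filtering #jam first makes it cheap to seek the matching Name values
-- in IX_dva on #wat via nested loops instead of scanning both indexes.
SELECT *
FROM #jam j
JOIN #wat w
ON w.Name = j.Name
WHERE j.ID = 1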

SQL Server: How can I create a column with the repeating sequence 1-12?

Long story short, I would like to create a column that repeats the pattern 1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,...etc. for (12 * 460343 =) 5524116 rows. Any wisdom on how I could complete this? Thank you!
Insert, say, 48 rows, then INSERT ... SELECT from the table into itself several times. You will get there really fast; it is surprisingly quicker than one would think.
If you create the table with an auto-incrementing id column, then at the end:
delete from table where id>5524116
Edit: here you go.
create table idFix
( id bigint identity(1,1) primary key,
num int not null
)
-- prime it
insert into idFix(num) values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12);
-- this is pretty fast, don't laugh
-- run the following line 19 times (each run doubles the row count: 12 * 2^19 = 6,291,456)
insert into idFix(num) select num from idFix;
-- you now have 6.2m rows (6,291,456)
select count(*) from idFix
delete from idFix where id>5524116;
select count(*) from idFix;
select min(num),max(num) from idFix;
Takes 3 minutes max
Use your helper table, then for the love of Pete drop it!
Use a loop and do some mod division with your counter.
DECLARE @LoopCounter bigint
SET @LoopCounter = 0
CREATE TABLE #YourValues
(
YourValue_Key int NOT NULL identity (1,1) PRIMARY KEY,
YourValue_OneThrough12Repating int
)
WHILE @LoopCounter < 5524116
BEGIN
INSERT INTO #YourValues (YourValue_OneThrough12Repating) VALUES ((@LoopCounter % 12) + 1)
SET @LoopCounter = @LoopCounter + 1
END
SELECT * FROM #YourValues
DROP TABLE #YourValues
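If the row-by-row loop is too slow at 5.5 million iterations, a set-based sketch along the same lines is possible (the table and column names below are illustrative, not from the question): generate sequential numbers from a small cross join and derive the repeating 1-12 value with the same modulo arithmetic.
-- Build 5,524,116 sequential numbers from a cross join of a 10-row source,
-- then derive the repeating 1-12 pattern with modulo arithmetic.
;WITH digits AS (
SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(d)
),
nums AS (
SELECT TOP (5524116) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
FROM digits a1, digits a2, digits a3, digits a4, digits a5, digits a6, digits a7 -- 10^7 combinations
)
SELECT n AS YourValue_Key,
((n - 1) % 12) + 1 AS YourValue_OneThrough12Repeating
INTO #YourValuesSetBased
FROM nums
ORDER BY n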

Insert data into several tables

Let us say I have a table (everything is very much simplified):
create table OriginalData (
ItemName NVARCHAR(255) not null
)
And I would like to insert its data (set based!) into two tables which model inheritance
create table Statements (
Id int IDENTITY NOT NULL,
ProposalDateTime DATETIME null
)
create table Items (
StatementFk INT not null,
ItemName NVARCHAR(255) null,
primary key (StatementFk)
)
Statements is the parent table and Items is the child table. I have no problem doing this with one row, which involves the use of IDENT_CURRENT, but I have no idea how to do this set-based (i.e. insert several rows into both tables).
Thanks.
Best wishes,
Christian
Another possible method, which avoids the use of cursors (generally not a best practice in SQL), is listed below. It uses the OUTPUT clause to capture the results of the insert into the one table so they can be used in the insert into the second table.
Note this example makes one assumption: I moved your IDENTITY column to the Items table. I believe that is acceptable, at least based on your original table layout, since the primary key of that table is the StatementFk column.
Note this example code was tested on SQL Server 2005.
IF OBJECT_ID('tempdb..#OriginalData') IS NOT NULL
DROP TABLE #OriginalData
IF OBJECT_ID('tempdb..#Statements') IS NOT NULL
DROP TABLE #Statements
IF OBJECT_ID('tempdb..#Items') IS NOT NULL
DROP TABLE #Items
create table #OriginalData
( ItemName NVARCHAR(255) not null )
create table #Statements
( Id int NOT NULL,
ProposalDateTime DATETIME null )
create table #Items
( StatementFk INT IDENTITY not null,
ItemName NVARCHAR(255) null,
primary key (StatementFk) )
INSERT INTO #OriginalData
( ItemName )
SELECT 'Shirt'
UNION ALL SELECT 'Pants'
UNION ALL SELECT 'Socks'
UNION ALL SELECT 'Shoes'
UNION ALL SELECT 'Hat'
DECLARE @myTableVar table
( StatementFk int,
ItemName nvarchar(255) )
INSERT INTO #Items
( ItemName )
OUTPUT INSERTED.StatementFk, INSERTED.ItemName
INTO @myTableVar
SELECT ItemName
FROM #OriginalData
INSERT INTO #Statements
( ID, ProposalDateTime )
SELECT
StatementFK, getdate()
FROM @myTableVar
You will need to write an ETL process to do this. You may want to look into SSIS.
This can also be done with T-SQL and possibly temp tables. You may need to store a unique key from OriginalData in the Statements table, and then, when you insert into Items, join OriginalData with Statements on that unique key to get the ID.
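A rough sketch of that idea, under the assumption that ItemName is unique in OriginalData and can serve as the correlating key (the added Statements.ItemName column is temporary and purely illustrative):
-- Hypothetical correlation column, dropped again afterwards
ALTER TABLE Statements ADD ItemName NVARCHAR(255) NULL
GO
INSERT INTO Statements (ProposalDateTime, ItemName)
SELECT GETDATE(), ItemName
FROM OriginalData
-- The generated identity values are now looked up via the stored key instead of IDENT_CURRENT
INSERT INTO Items (StatementFk, ItemName)
SELECT s.Id, o.ItemName
FROM Statements AS s
INNER JOIN OriginalData AS o ON o.ItemName = s.ItemName
GO
ALTER TABLE Statements DROP COLUMN ItemName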
I don't think you can do it in one chunk, but you can certainly do it with a cursor loop.
DECLARE @bla NVARCHAR(255)
DECLARE @ID int
DECLARE c1 CURSOR
FOR
SELECT ItemName
FROM OriginalData
OPEN c1
FETCH NEXT FROM c1
INTO @bla
WHILE @@FETCH_STATUS = 0
BEGIN
INSERT INTO Statements(ProposalDateTime) VALUES(GETDATE())
SET @ID = SCOPE_IDENTITY()
INSERT INTO Items(StatementFk, ItemName) VALUES(@ID, @bla)
FETCH NEXT FROM c1
INTO @bla
END
CLOSE c1
DEALLOCATE c1