Optimal solution for lookup on a large SQL Server table - VB.NET

I have a large table of user ids (table 1) and another table of user records (table 2) which associates user posts with user ids. Whenever a new feed post is retrieved, I request an id from the user id table that is still marked active (I have that ACTIVE field because another process continuously creates these ids and inserts them into table 1); when an id is handed out, it is marked inactive.
Then I check whether the user already exists in the user table (table 2) and, if so, return the user id associated with that user.
I was told that I can speed up this process by creating a hash table to do the lookup on table 2. I am not sure how to even start this, so any links or samples would be appreciated.
I also need to run a separate process that cleans table 1 and removes all inactive user ids.
When I call the procedure to insert into table 2, I pass the user id retrieved from table 1.
CREATE TABLE [dbo].[userforums]
(
[userid] [VARCHAR](16) NOT NULL CONSTRAINT [PK_forumssiteid] PRIMARY KEY CLUSTERED ,
[forumname] [VARCHAR](500) NOT NULL,
[exported] [INT] NULL,
[lastcrawled] [DATETIME] NULL,
[priority] [INT] NULL,
[origin] [VARCHAR](50) NULL,
[queryid] [VARCHAR](25) NULL,
[dateinserted] [DATETIME] NULL DEFAULT (getdate())
)
second table
CREATE TABLE [dbo].[userids]
(
[userid] [NVARCHAR](20) NOT NULL CONSTRAINT [PK_userids] PRIMARY KEY CLUSTERED,
[active] [NVARCHAR](20) NULL CONSTRAINT [IX_userids] UNIQUE NONCLUSTERED
)
get user id stored procedure
BEGIN TRANSACTION
SELECT TOP 1 @id = userid
FROM userids WITH (UPDLOCK, HOLDLOCK)
WHERE active = 'Y'
OR active IS NULL
UPDATE userids
SET active = 'N'
WHERE userid = @id
COMMIT TRANSACTION
check if userid exists
CREATE PROC Foo @forumname VARCHAR(500),
@userid VARCHAR(16),
@origin VARCHAR(50),
@queryid VARCHAR(25)
AS
SET NOCOUNT ON;
DECLARE @cnt INT
DECLARE @serverip VARCHAR(16)
DECLARE @mincnt INT
DECLARE @siteservercnt INT
SELECT @cnt = COUNT(*)
FROM userforums
WHERE forumname = @forumname
IF @cnt = 0
BEGIN
INSERT INTO userforums
(forumname,
userid,
exported,
origin,
queryid)
VALUES (@forumname,
@userid,
1,
@origin,
@queryid)
SELECT @siteservercnt = COUNT(*)
FROM siteserverip
WHERE userid = @userid
IF @siteservercnt = 0
BEGIN
SELECT TOP 1 @mincnt = COUNT(*),
@serverip = serverip
FROM siteserverip
GROUP BY serverip
ORDER BY COUNT(*)
SELECT TOP 1 @mincnt = sitecount,
@serverip = serverip
FROM serveripcounts
ORDER BY sitecount
INSERT INTO siteserverip
VALUES (@userid,
@serverip)
UPDATE serveripcounts
SET sitecount = sitecount + 1
WHERE serverip = @serverip
END
END
SELECT userid
FROM userforums
WHERE forumname = @forumname
RETURN

Your existing dequeue query can be improved. Instead of
DECLARE @id NVARCHAR(20)
SELECT TOP 1 @id = userid
FROM userids WITH (UPDLOCK, HOLDLOCK)
WHERE active = 'Y'
OR active IS NULL
UPDATE userids
SET active = 'N'
WHERE userid = @id
which is two operations (a clustered index scan followed by an index seek), you can do
UPDATE TOP (1) userids
WITH (ROWLOCK, READPAST)
SET active = 'N'
OUTPUT INSERTED.userid
WHERE active <> 'N'
which is one operation and gives a plan with two range seeks.
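Putting the atomic version together, the whole dequeue can be a single short procedure. This is only a sketch; the procedure name GetNextUserId is made up here:

```sql
-- Sketch: atomic dequeue as a stored procedure (hypothetical name).
CREATE PROC GetNextUserId
AS
SET NOCOUNT ON;
-- One atomic statement: claims a row and returns it in the same operation.
-- READPAST lets concurrent callers skip rows already locked by each other.
UPDATE TOP (1) userids
WITH (ROWLOCK, READPAST)
SET active = 'N'
OUTPUT INSERTED.userid
WHERE active <> 'N';
```

One caveat: `WHERE active <> 'N'` does not match rows where active IS NULL (a NULL comparison is unknown, not true). If NULL should mean "available", as in the original query, use `WHERE active = 'Y' OR active IS NULL` instead.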

A hash table #TableName is a temporary object in tempdb that functions as a table. They are generally called 'temp tables'. I would NOT use one as a first solution for retrieving data on the fly if this is a common occurrence. Instead, I would create an index and see if that satisfies your needs. Generally, temp tables are used for intense operations where you want to grab a set of rows that may or may not be indexed, relate it to something else, and keep it separate from the main table.
I would create an index, and that should improve speed. Also, if a lookup is slow, a temp table won't speed that part up; it will just put a copy of that data into a separate source to reuse, apart from the main table.
CREATE INDEX IX_[YourTableName]_[Column(s)] ON [TableName]([Column(s)])
I would not create more objects unless necessary. Generally, if your user ids are valid ints you can search on them quite fast.
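For the tables in the question, the lookup that matters is the existence check on userforums.forumname. A minimal sketch of such an index (the index name is my own choice):

```sql
-- forumname is the search key of the existence check in proc Foo;
-- userid is INCLUDEd so the final SELECT is covered by the index alone.
CREATE NONCLUSTERED INDEX IX_userforums_forumname
ON dbo.userforums (forumname)
INCLUDE (userid);
```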

Related

If a table has an unindexed column with a 1 to many relationship to an indexed column, how to optimize a query for the unindexed column?

Assume there is a two-column table MyTable with enough records that query optimization is relevant.
CorporationID int (unindexed)
BatchID int (indexed)
And let's assume there is always a 1-to-many relationship between CorporationID and BatchID. In other words, for each BatchID there will be only one CorporationID, but for each CorporationID there will be many BatchID values.
We need to get all BatchID values where corporationID = 1.
I know the simplest solution may be to just add an index to CorporationID, but assuming that is not allowed, is there some other way to inform SQL that each BatchID corresponds to only 1 CorporationID, through a query or otherwise?
select distinct batchid from MyTable where corporationID = 1
This does not seem effective.
select batchid from (select min(corporationid) corporationid, batchid
from MyTable group by batchid) subselect where corporationid = 1
This is also not effective, I assume because SQL needlessly iterates through all values of CorporationID. (Does an aggregate function exist to select any() value, which would not have the overhead of min(), max(), sum(), or avg()?)
select batchid
from (
select corporationid, batchid
from (
select *, ROW_NUMBER() OVER (PARTITION BY batchid ORDER BY(SELECT NULL)) AS RowNumber
from mytable
) subselect
where RowNumber = 1
) subselect2
where corporationid = 1
Would this work? By arbitrarily selecting the corporationid related to row number 1 after partitioning by batchid with no order?
"assuming it is not allowed to create an index" - this is a highly unlikely assumption. Of course, you should create the index.
The most direct answer to the alternate questions within your question is "no". There is no function, subquery, view, or other "read" action you can perform to get a list of the batches for a given CorpID without touching the CorpID data. None of your sample queries work, because at some point they NEED to access the CorpIDs to know which rows to gather BatchIDs from. Any summary or "rollup" function that might exist would still NEED to access all the pages of data to "see" them. The reading of the pages cannot be avoided.
Without changes to your architecture, it's not physically possible to optimize your query further.
However, with some changes, you could have some options (but I'd guess they are much uglier than just adding the index). For instance, you could modify the structure of your BatchID to include data for both the BatchID and the CorpID. Something like "8888899999999"... the 9's are the BatchID and the 8's are the CorpID. This doesn't win you much though; you're not saving any index space, but at least you don't have to index the CorpID field :) Things like this could be done, but I won't share any others. I don't want the really experienced people here to see this stuff and get ill. :)
You need an index on CorpID if you want to improve performance.
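A minimal sketch of that index for the table described in the question (BatchID is INCLUDEd so the lookup is covered; names are taken from the question):

```sql
-- CorporationID becomes the seek key; INCLUDE (BatchID) makes the
-- index covering, so the query never touches the base table.
CREATE NONCLUSTERED INDEX IX_MyTable_CorporationID
ON MyTable (CorporationID)
INCLUDE (BatchID);

-- The original query can then be satisfied with an index seek:
SELECT DISTINCT BatchID FROM MyTable WHERE CorporationID = 1;
```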
If you don't have a lot of data, I suggest putting an index on the CorporationID column. If you have a great deal of data, you can instead define a filtered index for each CorporationID value.
Part 01=>
/*01Create DB*/
IF DB_ID('Test01')>0
BEGIN
ALTER DATABASE Test01 SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DROP DATABASE Test01
END
GO
CREATE DATABASE Test01
GO
USE Test01
Go
Part 02=>
/*02Create table*/
CREATE TABLE Table01(
ID INT PRIMARY KEY IDENTITY,
Title NVARCHAR(100),
CreationDate DATETIME,
CorporationID INT ,
MyID INT ,
[GuidId1] [uniqueidentifier] NOT NULL,
[GuidId2] [uniqueidentifier] NOT NULL,
[Code] [nvarchar](50) NULL
)
ALTER TABLE [dbo].[Table01] ADD DEFAULT (GETDATE()) FOR [CreationDate]
GO
ALTER TABLE [dbo].[Table01] ADD DEFAULT (NEWSEQUENTIALID()) FOR [GuidId1]
GO
ALTER TABLE [dbo].[Table01] ADD DEFAULT (NEWID()) FOR [GuidId2]
GO
CREATE TABLE Table02(
ID INT PRIMARY KEY IDENTITY,
Title NVARCHAR(100),
CreationDate DATETIME,
CorporationID INT ,
MyID INT ,
[GuidId1] [uniqueidentifier] NOT NULL,
[GuidId2] [uniqueidentifier] NOT NULL,
[Code] [nvarchar](50) NULL
)
ALTER TABLE [dbo].[Table02] ADD DEFAULT (GETDATE()) FOR [CreationDate]
GO
ALTER TABLE [dbo].[Table02] ADD DEFAULT (NEWSEQUENTIALID()) FOR [GuidId1]
GO
ALTER TABLE [dbo].[Table02] ADD DEFAULT (NEWID()) FOR [GuidId2]
GO
Part 03=>
/*03Add Data*/
DECLARE @I INT = 1
WHILE @I < 1000000
BEGIN
DECLARE @Title NVARCHAR(100) = 'TITLE '+ CAST(@I AS NVARCHAR(10)),
@CorporationID INT = CAST((RAND()*20) + 1 AS INT),
@Code NVARCHAR(50) = 'CODE '+ CAST(@I AS NVARCHAR(10)) ,
@MyID INT = CAST((RAND()*50) + 1 AS INT)
INSERT INTO Table01 (Title , CorporationID , Code , MyID )
VALUES ( @Title , @CorporationID , @Code , @MyID)
SET @I += 1
END
INSERT INTO Table02 ([Title], [CreationDate], [CorporationID], [MyID], [GuidId1], [GuidId2], [Code])
SELECT [Title], [CreationDate], [CorporationID], [MyID], [GuidId1], [GuidId2], [Code] FROM Table01
Part 04=>
/*04 CREATE INDEX*/
CREATE NONCLUSTERED INDEX IX_Table01_ALL
ON Table01 (CorporationID) INCLUDE (MyID) ;
DECLARE @QUERY NVARCHAR(MAX) = ''
DECLARE @J INT = 1
WHILE @J < 21
BEGIN
SET @QUERY += '
CREATE NONCLUSTERED INDEX IX_Table02_'+CAST(@J AS NVARCHAR(5))+'
ON Table02 (CorporationID) INCLUDE (MyID) WHERE CorporationID = '+CAST(@J AS NVARCHAR(5))+';'
SET @J += 1
END
EXEC (@QUERY)
Part 05=>
/*05 READ DATA => press Ctrl+M (Include Actual Execution Plan) */
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT * FROM [dbo].[Table01] WHERE CorporationID = 10 AND MyID = 25
SELECT * FROM [dbo].[Table01] WITH(INDEX(IX_Table01_ALL)) WHERE CorporationID = 10 AND MyID = 25
SELECT * FROM [dbo].[Table02] WITH(INDEX(IX_Table02_10)) WHERE CorporationID = 10 AND MyID = 25
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
Notice the IO and TIME statistics and the execution plans.
Good luck!

Why Optimizer Does Not Use Index Seek on Join

I wonder why the following SELECT statement (below) does not use Index Seek, but Index Scan. Is it just because the number of rows is too small or am I missing something?
Test data:
-- Init Tables
IF OBJECT_ID ( 'tempdb..#wat' ) IS NOT NULL
DROP TABLE #wat;
IF OBJECT_ID ( 'tempdb..#jam' ) IS NOT NULL
DROP TABLE #jam;
CREATE TABLE #wat (
ID INT IDENTITY(1,1) NOT NULL,
Name VARCHAR(15) NOT NULL,
Den DATETIME NOT NULL
)
CREATE TABLE #jam (
ID INT IDENTITY(1,1) NOT NULL,
Name VARCHAR(15) NOT NULL
)
-- Populate Temp Tables with Random Data
DECLARE @Length INT
,@CharPool VARCHAR(255)
,@PoolLength INT
,@RandomString VARCHAR(255)
,@LoopCount INT
SET @Length = RAND() * 5 + 8
SET @CharPool = 'abcdefghijkmnopqrstuvwxyzABCDEFGHIJKLMNPQRSTUVWXYZ23456789'
SET @PoolLength = LEN(@CharPool)
SET @LoopCount = 0
SET @RandomString = ''
WHILE (@LoopCount < 500)
BEGIN
INSERT INTO #jam (Name)
SELECT SUBSTRING(@CharPool, CONVERT(int, RAND() * @PoolLength), 5)
SET @LoopCount = @LoopCount + 1
END
-- Insert Rows into Second Temp Table
INSERT INTO #wat( Name, Den )
SELECT TOP 50 Name, GETDATE()
FROM #jam
-- Create Indexes
--DROP INDEX IX_jedna ON #jam
--DROP INDEX IX_dva ON #wat
CREATE INDEX IX_jedna ON #jam (Name) INCLUDE (ID);
CREATE INDEX IX_dva ON #wat (Name) INCLUDE (ID, Den);
-- Select
SELECT *
FROM #jam j
JOIN #wat w
ON w.Name = j.Name
Execution Plan:
There are several ways for the optimiser to do joins: nested loops, hash match, or merge join (your case), and perhaps others.
Depending on your data (row counts, existing indexes, and statistics) it decides which one is better.
In your example, the optimiser assumes there is a many-to-many relation, and you have both tables sorted (indexed) by this field.
Why a merge join? It is logical: move through both tables in parallel, and the server only has to do that once.
To produce the seek you want, the server would have to move through the first table once and then seek into the second table many times, since every record has a match in the other table. The server would read all the records anyway when seeking, so there is no profit in a seek (1,000 seeks are even more costly than one simple pass through 1,000 records).
If you want a seek, add some records with no matches, and a WHERE clause to your query.
Update: even adding a simple
where j.ID = 1
gives you a seek.

Multi Parent-Child Insertion

I'm trying to write a stored procedure to pull information from an XML string and use it to create multiple parent-child relationships. I am trying to push this XML into actual database tables. Basically, the local client will send an XML file to the database, which stores it as a string. I then need to pull the information out of that string and update the appropriate tables. If this was just Table-A to Table-B, it wouldn't be so difficult. The problem I'm running into is that it needs to go from Table-A to Table-B to Table-C to Table-D where applicable. Below is a sample XML:
<RunRecordFile>
<Competition>
<Name>Daily</Name>
<StartDate>11/9/2015 12:40:07 AM</StartDate>
<Runs>
<Id>123</Id>
<Name>Daily Run</Name>
<RunDate>11/9/2015 12:40:07 AM</RunDate>
<CompetitionId>1</CompetitionId>
<RunRecords>
<Id>001</Id>
<Number>007</Number>
<ElapsedTime>23.007</ElapsedTime>
<RunId>123</RunId>
</RunRecords>
</Runs>
<Runs>
<Id>456</Id>
<Name>Daily Run</Name>
<RunDate>11/9/2015 12:47:07 AM</RunDate>
<CompetitionId>1</CompetitionId>
<RunRecords>
<Id>002</Id>
<Number>700</Number>
<ElapsedTime>23.707</ElapsedTime>
<RunId>456</RunId>
<RunRecordSpecialty>
<Id>1</Id>
<Handicap>17</Handicap>
<TeamPoints>50000</TeamPoints>
<RunRecordId>002</RunRecordId>
</RunRecordSpecialty>
</RunRecords>
</Runs>
</Competition>
</RunRecordFile>
I've attempted to use a declared table variable to hold each of the created primary keys, gathered via the SQL OUTPUT clause. When I run my SQL I get (0) rows updated. Here's what I've tried:
CREATE PROC [dbo].[RaceFilePush]
AS
DECLARE @CompetitionIdMapping TABLE ( CompetitionId bigint )
DECLARE @RunIdMapping TABLE ( RunId bigint )
DECLARE @RunRecordIdMapping TABLE ( RunRecordId bigint )
BEGIN
DECLARE @rrXML AS XML
DECLARE @rrfId AS BIGINT
SET @rrfId = (SELECT TOP 1 Id FROM RunRecordFile WHERE Submitted IS NULL)
SET @rrXML = (SELECT TOP 1 RaceFile FROM RunRecordFile WHERE Id = @rrfId)
BEGIN TRAN Competitions
BEGIN TRY
INSERT INTO Competition (
Name
,StartDate
)
OUTPUT INSERTED.Id INTO @CompetitionIdMapping(CompetitionId)
SELECT
xCompetition.value('(Name)[1]', 'varchar(225)') AS Name
,xCompetition.value('(StartDate)[1]', 'datetime') AS StartDate
,@rrfId AS RunRecordFileId
FROM
@rrXML.nodes('/RunRecordFile/Competition') AS E(xCompetition)
INSERT INTO Run (
Name
,RunDate
,CompetitionId
)
OUTPUT INSERTED.Id INTO @RunIdMapping(RunId)
SELECT
xRuns.value('(Name)[1]','varchar(80)') AS Name
,xRuns.value('(RunDate)[1]','datetime') AS RunDate
,(SELECT CompetitionId FROM @CompetitionIdMapping)
FROM
@rrXML.nodes('/RunRecordFile/Competition/Runs') AS E(xRuns)
INSERT INTO RunRecord (
Number
,ElapsedTime
,RunId
)
OUTPUT INSERTED.Id INTO @RunRecordIdMapping(RunRecordId)
SELECT
xRunRecords.value('(Number)[1]','varchar(10)') AS Number
,xRunRecords.value('(ElapsedTime)[1]','numeric(10,5)') AS ElapsedTime
,(SELECT RunId FROM @RunIdMapping)
FROM
@rrXML.nodes('/RunRecordFile/Competition/Runs/RunRecords') AS E(xRunRecords)
INSERT INTO RunRecordSpecialty (
Handicap
,TeamPoints
,RunRecordId
)
SELECT
xRunRecordSpecialty.value('(Handicap)[1]','numeric(10,5)') AS Handicap
,xRunRecordSpecialty.value('(TeamPoints)[1]','numeric(10,5)') AS TeamPoints
,(SELECT RunRecordId FROM @RunRecordIdMapping)
FROM
@rrXML.nodes('/RunRecordFile/Competition/Runs/RunRecordSpecialty') AS E(xRunRecordSpecialty)
UPDATE RunRecordFile SET Submitted = GETDATE() WHERE Id = @rrfId
COMMIT TRAN Competitions
END TRY
BEGIN CATCH
ROLLBACK TRAN Competitions
END CATCH
END
With this SQL you get the whole thing into a flat declared table @tbl.
Remark: I placed the XML from your question into a variable called @xml. Adapt this to your needs...
DECLARE @tbl TABLE (
[Competition_Name] [varchar](max) NULL,
[Competition_StartDate] [datetime] NULL,
[Run_Id] [int] NULL,
[Run_Name] [varchar](max) NULL,
[Run_RunDate] [datetime] NULL,
[Run_CompetitionId] [int] NULL,
[RunRecords_Id] [int] NULL,
[RunRecords_Number] [int] NULL,
[RunRecords_ElapsedTime] [float] NULL,
[RunRecords_RunId] [int] NULL,
[RunRecordSpecialty_Id] [int] NULL,
[RunRecordSpecialty_Handicap] [int] NULL,
[RunRecordSpecialty_TeamPoints] [int] NULL,
[RunRecordSpecialty_RunRecordId] [int] NULL
);
INSERT INTO @tbl
SELECT Competition.value('Name[1]','varchar(max)') AS Competition_Name
,Competition.value('StartDate[1]','datetime') AS Competition_StartDate
,Run.value('Id[1]','int') AS Run_Id
,Run.value('Name[1]','varchar(max)') AS Run_Name
,Run.value('RunDate[1]','datetime') AS Run_RunDate
,Run.value('CompetitionId[1]','int') AS Run_CompetitionId
,RunRecords.value('Id[1]','int') AS RunRecords_Id
,RunRecords.value('Number[1]','int') AS RunRecords_Number
,RunRecords.value('ElapsedTime[1]','float') AS RunRecords_ElapsedTime
,RunRecords.value('RunId[1]','int') AS RunRecords_RunId
,RunRecordSpecialty.value('Id[1]','int') AS RunRecordSpecialty_Id
,RunRecordSpecialty.value('Handicap[1]','int') AS RunRecordSpecialty_Handicap
,RunRecordSpecialty.value('TeamPoints[1]','int') AS RunRecordSpecialty_TeamPoints
,RunRecordSpecialty.value('RunRecordId[1]','int') AS RunRecordSpecialty_RunRecordId
FROM @xml.nodes('/RunRecordFile/Competition') AS A(Competition)
OUTER APPLY Competition.nodes('Runs') AS B(Run)
OUTER APPLY Run.nodes('RunRecords') AS C(RunRecords)
OUTER APPLY RunRecords.nodes('RunRecordSpecialty') AS D(RunRecordSpecialty)
;
SELECT * FROM @tbl
If you need generated IDs, you just add the columns to the flat table and write them there, either on the fly or afterwards with an UPDATE statement.
It should be easy to work through this flat table: select just the needed data level with DISTINCT and insert the rows, then the next level, and so on...
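As a sketch of that level-by-level pass (assuming the flat table layout above, an IDENTITY Id on Competition, and that Name plus StartDate form a usable natural key; table and column names are taken from the question):

```sql
-- First level: one row per distinct competition from the flat table.
INSERT INTO Competition (Name, StartDate)
SELECT DISTINCT Competition_Name, Competition_StartDate
FROM @tbl;

-- Second level: runs, joined back to the generated Competition ids
-- through the natural key (here: name + start date).
INSERT INTO Run (Name, RunDate, CompetitionId)
SELECT DISTINCT t.Run_Name, t.Run_RunDate, c.Id
FROM @tbl AS t
JOIN Competition AS c
  ON c.Name = t.Competition_Name
 AND c.StartDate = t.Competition_StartDate;
-- ...and so on for RunRecord and RunRecordSpecialty.
```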
Good luck!

Check CONSTRAINT Not work in SQL Server

I need to prevent duplicate data from being inserted into my table, subject to a condition.
Here is the SQL Server table:
CREATE TABLE [dbo].[temptbl](
[id] [numeric](18, 0) IDENTITY(1,1) NOT NULL,
[DSGN] [varchar](500) NULL,
[RecordType] [varchar](1000) NULL
)
I want to put a condition on RecordType: if RecordType is 'SA', then check the constraint. That means if a row with DSGN = 0 and RecordType = 'SA' exists, I don't want to insert that data;
if a row with DSGN = 1 and RecordType = 'SA' does not exist, I do want to insert that data.
If RecordType is anything other than 'SA', insert any data.
For that I created a constraint, but it does not work:
ALTER TABLE temptbl WITH CHECK ADD CONSTRAINT chk_Stuff CHECK (([dbo].[chk_Ints]([DSGN],[RecordType])=(0)))
GO
ALTER FUNCTION [dbo].[chk_Ints](@Int_1 int,@Int_2 varchar(20))
RETURNS int
AS
BEGIN
DECLARE @Result INT
BEGIN
IF @Int_2 = 'SA'
BEGIN
IF NOT EXISTS (SELECT * FROM [temptbl] WHERE DSGN = @Int_1 AND RecordType = @Int_2)
BEGIN
SET @Result = 0
END
ELSE
BEGIN
SET @Result = 1
END
END
ELSE
BEGIN
SET @Result = 0
END
END
RETURN @Result
END
But it is not working. Please advise.
Ditch the function and the check constraint:
CREATE UNIQUE INDEX IX_temptbl_SA ON temptbl (DSGN) WHERE RecordType='SA'
This is known as a filtered index.
Your check constraint wasn't working as you thought it would because when a check constraint is evaluated for any particular row, that row is already visible within the table (within the context of that transaction) and so each row was effectively blocking its own insertion.
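With that filtered index in place, behaviour on the table from the question looks like this (a sketch; DSGN values are strings per the table definition):

```sql
INSERT INTO temptbl (DSGN, RecordType) VALUES ('0', 'SA');  -- succeeds
INSERT INTO temptbl (DSGN, RecordType) VALUES ('0', 'SA');  -- fails: duplicate key in IX_temptbl_SA
INSERT INTO temptbl (DSGN, RecordType) VALUES ('0', 'XX');  -- succeeds: filter WHERE RecordType='SA' not matched
```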
Use WITH NOCHECK if you want the constraint enforced only against new data. Here is the documentation link:
https://technet.microsoft.com/en-us/library/ms179491(v=sql.105).aspx

How to insert a record and make sure the entire row is unique

I want to insert multiple values into a row, but I want to make sure that the row is unique, i.e. no duplicate rows.
I am unsure how to do this (it is fairly easy if there is only a single value to check for, as in: SQL Server - How to insert a record and make sure it is unique).
This is my code, but it won't let me insert unique rows, because it tests each column separately rather than the combination of columns.
CREATE TABLE myCities (
UserID int null,
CityID int null
)
DECLARE @UserID int, @CityID int
SET @UserID = 1
SET @CityID = 1
INSERT INTO myCities (UserID,CityID)
SELECT @UserID,@CityID
WHERE
@UserID NOT IN ( SELECT UserID FROM myCities WHERE UserID = @UserID )
AND
@CityID NOT IN ( SELECT CityID FROM myCities WHERE CityID = @CityID )
The only sure way is to put the check in the database. In this case, create a unique constraint across both columns:
-- syntax for MS/Sybase at least is
ALTER TABLE myCities
ADD CONSTRAINT uc_myCities UNIQUE (UserID,CityID)
Then, when you insert a duplicate, you will get an error, and your code will have to deal with it.
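Dealing with that error server-side can be sketched like this (error 2627 signals a unique constraint violation, 2601 a unique index violation; THROW needs SQL Server 2012 or later):

```sql
BEGIN TRY
    INSERT INTO myCities (UserID, CityID) VALUES (1, 1);
END TRY
BEGIN CATCH
    -- 2627 = unique constraint violation, 2601 = unique index violation
    IF ERROR_NUMBER() IN (2627, 2601)
        PRINT 'Duplicate row ignored';
    ELSE
        THROW;  -- re-raise anything unexpected
END CATCH
```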
Sometimes the obvious is right at hand. I solved it by using NOT EXISTS, like this:
INSERT INTO myCities (UserID,CityID)
SELECT @UserID,@CityID
WHERE NOT EXISTS (
SELECT UserID FROM myCities
WHERE
UserID = @UserID and
CityID = @CityID
)