Subquery Performance - Non Unique Column in Where Clause

I have two tables
Table Jobs
[ID] [int] IDENTITY(1,1) NOT NULL,
[title] [varchar](150) NULL,
[description] [text] NULL
Table JobSkills
[id] [int] IDENTITY(1,1) NOT NULL,
[jobId] [int] NULL,
[skill] [varchar](150) NULL
The above is a partial list of columns.
For table JobSkills, the jobId column is indexed and the skill column is full-text indexed.
I have a stored procedure to get the list of jobs, roughly like this:
Select totalItems
,Id,title
from
(
Select Row_Number() over(Order By
CASE WHEN @sortBy Is Not Null AND @sortBy='relevance'
THEN
SkillMatchRank
END DESC
,CASE WHEN @sortBy Is Not Null AND @sortBy='date' THEN CreateDate END DESC
) As rowNumber
,COUNT(*) OVER() as totalItems
,ID,createDate,title
from Jobs J
OUTER APPLY dbo.GetJobSkillMatchRank(J.ID,@searchKey) As SkillMatchRank
Where
--where conditions here
) tempData
where
rowNumber>=CASE WHEN @startIndex>0 AND @endIndex>0 THEN @startIndex ELSE rowNumber END
AND rowNumber<=CASE WHEN @startIndex>0 AND @endIndex>0 THEN @endIndex ELSE rowNumber END
I have created an inline table-valued function to get the skill matching rank.
CREATE FUNCTION [dbo].[GetJobSkillMatchRank]
(
@jobId int,
@searchKey varchar(150)
)
RETURNS TABLE
AS
RETURN
(
select SUM(ISNULL(JS2.[Rank],0)) as rank
from FREETEXTTABLE(JobSkills,skill,@searchKey) JS2
Where JS2.[Key] in (Select ID from JobSkills Where jobId=@jobId)
)
GO
Problem
The query runs very slowly: more than a minute.
My observation
If I hard-code jobId=1 inside the table-valued function (a job with id=1 does exist), it performs very fast, as desired.
I understand that jobId is not a unique column in the JobSkills table.
In this case, how can I improve the performance?

UDFs are great in certain cases, but the execution plans don't get cached like sprocs do. If you try using another derived table instead of a function, the query might perform better.
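The "derived table instead of a function" suggestion could look roughly like the sketch below. It is only an illustration, not a tested replacement: it assumes that the full-text key of JobSkills is its id column (as the original function implies), and the @searchKey value is a made-up example. The idea is to run the full-text search once and total the rank per job, instead of calling the function once per row of Jobs.

```sql
-- Sketch only: one set-based derived table instead of a per-row OUTER APPLY.
DECLARE @searchKey varchar(150) = 'sql server';  -- hypothetical search term

SELECT J.ID,
       J.title,
       ISNULL(R.SkillMatchRank, 0) AS SkillMatchRank
FROM Jobs AS J
LEFT JOIN
(
    -- Run the full-text search once, then sum the rank per job.
    SELECT JS.jobId,
           SUM(FT.[RANK]) AS SkillMatchRank
    FROM FREETEXTTABLE(JobSkills, skill, @searchKey) AS FT
    INNER JOIN JobSkills AS JS
        ON JS.id = FT.[KEY]   -- assumes id is the full-text key column
    GROUP BY JS.jobId
) AS R
    ON R.jobId = J.ID;
```

The LEFT JOIN keeps jobs with no matching skill, as OUTER APPLY does in the original query. Whether this is actually faster depends on the data and the plan, so compare the execution plans of both versions.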

Related

Searching 13 million records using full text search with additional conditions

I have a performance issue when running a SQL Server full-text search with additional conditions (SQL Server 2012).
I am trying to filter the data based on a list of search filters (a table-valued parameter). The query should return all matching records for each filter, plus a single record for each filter that has no match in the tables.
A full-text index already exists on table Names for column SNAME.
In the stored procedure, a table-type parameter, SearchFilter, is used to pass a list of name and address info.
Both tables have more than 14 million records. When we execute the procedure with 1000 unique records in the filter list, it takes around 7 minutes to return the result (1400 records).
The filter criteria are: CONTAINS on the name, plus exact matches on street address, city, state, and zip.
Is there any alternative that avoids the WHILE loop, given that SQL Server's CONTAINS function requires a string literal or a variable?
CREATE TABLE [dbo].[Names]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[UIN] [varchar](9) NULL,
[SNAME] [varchar](500) NULL,
CONSTRAINT [PK_Names]
PRIMARY KEY CLUSTERED ([ID] ASC)
)
CREATE TABLE [dbo].[ADDRESSES]
(
[UIN] [varchar](9) NULL,
[STREET1] [varchar](100) NULL,
[STREET2] [varchar](50) NULL,
[CITY] [varchar](30) NULL,
[STATE] [varchar](2) NULL,
[ZIP] [varchar](10) NULL
) ON [PRIMARY]
CREATE TYPE [dbo].[SearchFilter] AS TABLE
(
[UIN] [varchar](40) NULL,
[SNAME] [varchar](max) NULL,
[StreetAddress] [varchar](max) NULL,
[City] [varchar](max) NULL,
[State] [varchar](50) NULL,
[Zip] [varchar](20) NULL
)
-- Stored procedure logic
DECLARE @filterList AS [dbo].[SearchFilter]
DECLARE @NoOfRows INT, @counter INT = 0
SET @NoOfRows = (SELECT COUNT(1) FROM @filterList)
DECLARE @result TABLE (UIN varchar(40),
NAME varchar(500),
StreetAddress varchar(1000),
Zipcode varchar(20),
State varchar(20),
City varchar(1000),
IsRecordFound varchar(50)
);
WHILE (@NoOfRows > @counter)
BEGIN
DECLARE @SearchName VARCHAR(4000)
SET @SearchName = (SELECT '"'+SNAME+'"' FROM @filterList ORDER BY SNAME OFFSET @counter ROWS FETCH NEXT 1 ROWS ONLY)
--Start: Process to Select Records
;WITH Filter_CTE AS
(
SELECT
SNAME, StreetAddress, City, State, ZipCode
FROM
@filterList
ORDER BY
SNAME
OFFSET @counter ROWS FETCH NEXT 1 ROWS ONLY
)
INSERT INTO @result (UIN, NAME, STREETADDRESS, CITY, STATE, ZIPCODE, PHONE, IsRecordFound)
SELECT DISTINCT
en.UIN, ISNULL(en.SNAME, Filter_CTE.SNAME),
Filter_CTE.StreetAddress, Filter_CTE.ZipCode,
Filter_CTE.state, Filter_CTE.City,
IIF(en.UIN IS NULL, 'Not Found', 'Found') AS IsRecordFound
FROM
dbo.Names en
INNER JOIN
dbo.ADDRESSES ea ON en.UIN = ea.UIN
RIGHT JOIN
Filter_CTE ON ea.ZIP = Filter_CTE.Zip
AND ea.STATE = Filter_CTE.State
AND ea.CITY = Filter_CTE.City
AND (ISNULL(ea.STREET1, '') + ' ' + ISNULL(ea.STREET2, '')) = Filter_CTE.StreetAddress
AND CONTAINS(en.SNAME,@SearchName)
--END
SET @counter += 1
END
SELECT
UIN, NAME, STREETADDRESS, CITY, STATE, ZIPCODE, PHONE
FROM
@result

Currently it is not possible to use column names as the search condition in CONTAINS or CONTAINSTABLE. So you cannot do a direct JOIN between the data table and the SearchFilter table with FTS predicates applied.
The current solution found in other questions and forums is to loop through the filter list and feed CONTAINS a search condition held in a variable, just as you do. So you won't get rid of this loop.
However, looking at your query, I see a number of other problems which may affect performance:
1. The DISTINCT clause in INSERT INTO @result ... SELECT DISTINCT .... It is applied at the level where you JOIN to tables with millions of records. Though I understand that the final result may contain only a few thousand rows, it's better to move DISTINCT to this line:
SELECT DISTINCT
UIN, NAME, STREETADDRESS, CITY, STATE, ZIPCODE, PHONE
FROM
@result
2. The condition AND (ISNULL(ea.STREET1, '') + ' ' + ISNULL(ea.STREET2, '')) = Filter_CTE.StreetAddress is certainly NOT SARGable. You use concatenation and a function (ISNULL()), which prevents SQL Server from using existing indexes on the dbo.ADDRESSES ea table. Check this question: What makes a SQL statement sargable? to see how to construct JOIN / WHERE conditions in a way that allows the use of indexes.
In this particular case it's better to add a computed column to the dbo.ADDRESSES table and then build an index over it (or add it to an existing index):
CREATE TABLE [dbo].[ADDRESSES]
(
...
STREET as (ISNULL(STREET1, '') + ' ' + ISNULL(STREET2, '')),
...
)
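Since dbo.ADDRESSES already exists, the computed column could also be added in place rather than through CREATE TABLE. The following is a minimal sketch under assumptions: the index name is hypothetical, and the choice of key columns simply follows the equality predicates in the question's join, so it has not been tuned against the real data.

```sql
-- Sketch: add the computed column to the existing table.
ALTER TABLE dbo.ADDRESSES
    ADD STREET AS (ISNULL(STREET1, '') + ' ' + ISNULL(STREET2, ''));
GO

-- Hypothetical index name; key columns mirror the join's equality predicates.
CREATE NONCLUSTERED INDEX IX_ADDRESSES_Zip_State_City_Street
    ON dbo.ADDRESSES (ZIP, [STATE], CITY, STREET)
    INCLUDE (UIN);
GO
```

With this in place, the join condition could compare ea.STREET = Filter_CTE.StreetAddress directly instead of rebuilding the concatenation for every row.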
So fix points 1 and 2 above, then comment out the AND CONTAINS(en.SNAME,@SearchName) condition in the RIGHT JOIN and note the execution time. Afterwards, uncomment the CONTAINS condition and see how much delay it adds. This way you will know for sure whether the FTS engine is to blame for the delay or whether your main query itself needs improvement.
To be able to advise more, we need to see the execution plans for your procedure. You can share your query execution plan using this page: https://www.brentozar.com/pastetheplan/ .
HTH

Ambiguous column name... but only sometimes?

So apparently this has Ambiguous column name 'LocationID':
DECLARE @temptable table (LocationID int)
INSERT INTO @temptable SELECT LocationID FROM inserted;
INSERT INTO dbo.LocationsPlants (LocationID, PlantID)
(SELECT LocationID, PlantID FROM dbo.Plants CROSS JOIN @temptable)
Which I can fix by altering the bottom line to:
(SELECT T.LocationID, PlantID FROM dbo.Plants CROSS JOIN @temptable AS T)
But this identical query on another table does NOT have ambiguous column IncotermsID:
DECLARE @temptable table (IncotermsID int)
INSERT INTO @temptable SELECT IncotermsID FROM inserted;
INSERT INTO dbo.IncotermsPlants (IncotermsID, PlantID)
(SELECT IncotermsID,PlantID FROM dbo.Plants CROSS JOIN @temptable)
I'm puzzled. Table structures:
dbo.Locations:
[LocationID] [int] NOT NULL,
[LocationTypeID] [int] NOT NULL,
[Title] [varchar](100) NULL
dbo.Incoterms:
[IncotermsID] [int] IDENTITY(1,1) NOT NULL,
[Incoterm] [varchar](20) NOT NULL

The problem is most likely a column named LocationID in the Plants table, so the query cannot tell which LocationID column to return: the one from Plants or the one from @temptable.
As @GordonLinoff mentioned, it's good practice (I'd say best practice) to always alias the tables used in joins or correlated queries, and to qualify their columns with those aliases as well.
The reason this only happens "sometimes" is that in your second query, IncotermsID exists in only one of the two tables used in your CROSS JOIN.
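As an illustration of the aliasing advice, here is a sketch of the first query with every table aliased and every column qualified. The aliases i, P and T are arbitrary choices, and the code is assumed to run inside the trigger, where the inserted pseudo-table is available.

```sql
-- Sketch: every table has an alias and every column names its source.
DECLARE @temptable table (LocationID int);

INSERT INTO @temptable (LocationID)
SELECT i.LocationID
FROM inserted AS i;

INSERT INTO dbo.LocationsPlants (LocationID, PlantID)
SELECT T.LocationID,   -- unambiguous even if dbo.Plants also has a LocationID
       P.PlantID
FROM dbo.Plants AS P
CROSS JOIN @temptable AS T;
```

Written this way, the statement keeps working if a column with the same name is later added to either table.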

SQL loop executes but new old values are over written

As my question title says, my program loops, but all of the values I updated are being overwritten. The code is posted below. Say minRownum is 1 and max is 12: I see the loop execute 12 times correctly, and min is incremented by 1 each time. But in the end result, only the final row, whose RowNum is 12, has any value.
I'm not exactly sure why the overwriting is occurring, since I'm saying "update it where the row number = minrownumber" and then I increment minrownum.
Can anyone point out what I am doing wrong? Thanks
WHILE (@MinRownum <= @MaxRownum)
BEGIN
print ' here'
UPDATE #usp_sec
set amount=(
SELECT sum(amount) as amount
FROM dbo.coverage
inner join dbo.owner
on coverage.api=owner.api
where RowNum=@MinRownum
);
SET @MinRownum = @MinRownum + 1
END
PS: I edited this statement as shown below, and now every row has the same wrong number (the values are not distinct; one value is duplicated across all rows).
set amount = (SELECT sum(amount) as amount
FROM dbo.coverage
INNER JOIN dbo.owner ON coverage.api = owner.api
where RowNum=@MinRownum
) WHERE RowNum = @MinRownum;
Tables:
CREATE TABLE dbo. #usp_sec
(
RowNum int,
amount numeric(20,2),
discount numeric(3,2)
)
CREATE TABLE [dbo].[handler](
[recordid] [int] IDENTITY(1,1) NOT NULL,
[covid] [varchar](25) NULL,
[ownerid] [char](10) NULL
)
CREATE TABLE [dbo].[coverage](
[covid] [varchar](25) NULL,
[api] [char](12) NULL,
[owncovid] [numeric](12, 0) NULL,
[amount] [numeric](14, 2) NULL,
[knote] [char](10) NULL
)
CREATE TABLE [dbo].[owner](
[api] [char](12) NOT NULL,
[owncovid] [numeric](12, 0) NULL,
[ownerid] [char](10) NOT NULL,
[officer] [char](20) NOT NULL,
[appldate] [date] NOT NULL
)

Your UPDATE statement needs its own WHERE clause. Otherwise, each UPDATE will update every row in the table.
And the way you have this written, your subquery still needs its WHERE clause too. In fact, you need to definitively correlate the subquery to your table's (#usp_sec) rows. We cannot tell you how that should be done without more information such as your table definitions.
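To show what a correlated update looks like in general, here is a sketch built on a loudly stated assumption: that #usp_sec carries an api column linking each of its rows to dbo.coverage. The posted definition of #usp_sec has no such column, so api on #usp_sec is purely hypothetical and must be replaced with whatever really ties the rows together.

```sql
-- Assumption: #usp_sec has a hypothetical api column (not in the posted definition)
-- that identifies which coverage rows belong to each #usp_sec row.
UPDATE s
SET    s.amount = t.total_amount
FROM   #usp_sec AS s
INNER JOIN
(
    -- One total per api, computed once for the whole table.
    SELECT c.api,
           SUM(c.amount) AS total_amount
    FROM dbo.coverage AS c
    INNER JOIN dbo.owner AS o
        ON c.api = o.api
    GROUP BY c.api
) AS t
    ON t.api = s.api;   -- the correlation between the totals and the target rows
```

Because each target row is matched to its own total through the join, a set-based statement of this shape would not need the WHILE loop at all.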

How to write trigger to generate sequence number for repeated value

I need to generate sequence number for repeated value in a SQL Server table.
Here's my table
CREATE TABLE [dbo].[tbl_all_purple_flag_level](
[Sno] [int] IDENTITY(1,1) NOT NULL,
[Id] [varchar](50) NULL,
[Name] [varchar](50) NULL,
[Score] [varchar](50) NULL,
[Disability_Level] [varchar](50) NULL,
[visited_count] [varchar](50) NULL,
[Date] [varchar](50) NULL
)
If Id has repeated values, I need visited_count to be a sequence number (1, 2, 3...) within each Id.
I tried this code
SELECT
id,
RIGHT('0'+ CAST((ROW_NUMBER() OVER (Partition By id Order By id)) as varchar(2)), 2) as duplicate_id
FROM
tbl_all_purple_flag_level
It works fine, but I don't know how to do this in a trigger. Can anyone help me? Thanks!

ALTER TRIGGER tr_After_Update_Inser
ON [dbo].[tbl_all_purple_flag_level]
FOR INSERT, UPDATE
AS
BEGIN
SET NOCOUNT ON;
WITH Updateable
AS
(
SELECT [Sno], id, [visited_count],
RIGHT('0'+ CAST((ROW_NUMBER() OVER (Partition By id Order By id, [Date] DESC)) as varchar(2)), 2) as duplicate_id
FROM tbl_all_purple_flag_level
)
UPDATE Updateable
SET [visited_count] = duplicate_id
END
One thing I would like to point out: SQL Server has data types for almost every kind of data. I can see you are storing dates, IDs, and scores all as varchar, even though SQL Server has a date data type for date values. I don't know why you would use varchar for every column, but you should consider using appropriate data types instead.
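The trigger above renumbers every row in the table each time it fires. A possible variation, sketched here as an assumption rather than as part of the original answer, limits the work to the Id values touched by the current statement by filtering on the inserted pseudo-table.

```sql
ALTER TRIGGER tr_After_Update_Inser
ON [dbo].[tbl_all_purple_flag_level]
FOR INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    WITH Updateable AS
    (
        SELECT [visited_count],
               RIGHT('0' + CAST(ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Id, [Date] DESC) AS varchar(2)), 2) AS duplicate_id
        FROM dbo.tbl_all_purple_flag_level
        -- Assumption: only Ids present in this statement's inserted rows need renumbering.
        WHERE Id IN (SELECT i.Id FROM inserted AS i)
    )
    UPDATE Updateable
    SET [visited_count] = duplicate_id;
END
```

One caveat of this variation: if an UPDATE changes a row's Id, the rows that still carry the old Id are not renumbered, because only the new values appear in inserted.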

Query is very very slow for processing 200000 plus records

I have 200,000 rows in the Patient and Person tables, and the query shown takes 30 seconds to execute.
I have defined the primary key (and clustered index) on PersonId in the Person table and on PatientId in the Patient table. What else can I do here to improve the performance of my procedure?
I am new to database development and know only basic SQL. I am also not sure whether SQL Server can handle 200,000 rows quickly.
You can see the whole dynamic procedure at https://github.com/Padayappa/SQLProblem/blob/master/Performance
Has anyone dealt with a large number of rows like this? How do I improve performance here?
DECLARE @return_value int,
@unitRows bigint,
@unitPages int,
@TenantId int,
@unitItems int,
@page int
SET @TenantId = 1
SET @unitItems = 20
SET @page = 1
DECLARE @PatientSearch TABLE(
[PatientId] [bigint] NOT NULL,
[PatientIdentifier] [nvarchar](50) NULL,
[PersonNumber] [nvarchar](20) NULL,
[FirstName] [nvarchar](100) NOT NULL,
[LastName] [nvarchar](100) NOT NULL,
[ResFirstName] [nvarchar](100) NOT NULL,
[ResLastName] [nvarchar](100) NOT NULL,
[AddFirstName] [nvarchar](100) NOT NULL,
[AddLastName] [nvarchar](100) NOT NULL,
[Address] [nvarchar](255) NULL,
[City] [nvarchar](50) NULL,
[State] [nvarchar](50) NULL,
[ZipCode] [nvarchar](20) NULL,
[Country] [nvarchar](50) NULL,
[RowNumber] [bigint] NULL
)
INSERT INTO @PatientSearch SELECT PAT.PatientId
,PAT.PatientIdentifier
,PER.PersonNumber
,PER.FirstName
,PER.LastName
,RES_PER.FirstName AS ResFirstName
,RES_PER.LastName AS ResLastName
,ADD_PER.FirstName AS AddFirstName
,ADD_PER.LastName AS AddLastName
,PER.Address
,PER.City
,PER.State
,PER.ZipCode
,PER.Country
,ROW_NUMBER() OVER (ORDER BY PAT.PatientId DESC) AS RowNumber
FROM dbo.Patient AS PAT
INNER JOIN dbo.Person AS PER
ON PAT.PersonId = PER.PersonId
INNER JOIN dbo.Person AS RES_PER
ON PAT.ResponsiblePersonId = RES_PER.PersonId
INNER JOIN dbo.Person AS ADD_PER
ON PAT.AddedBy = ADD_PER.PersonId
INNER JOIN dbo.Booking AS B
ON PAT.PatientId = B.PatientId
WHERE PAT.TenantId = @TenantId AND B.CategoryId = @CategoryId
GROUP BY PAT.PatientId
,PAT.PatientIdentifier
,PER.PersonNumber
,PER.FirstName
,PER.LastName
,RES_PER.FirstName
,RES_PER.LastName
,ADD_PER.FirstName
,ADD_PER.LastName
,PER.Address
,PER.City
,PER.State
,PER.ZipCode
,PER.Country
;
SELECT @unitRows = @@ROWCOUNT
,@unitPages = (@unitRows / @unitItems) + 1;
SELECT *
FROM @PatientSearch AS IT
WHERE RowNumber BETWEEN (@page - 1) * @unitItems + 1 AND @unitItems * @page

Well, unless I am missing something (like duplicate rows?) you should be able to remove the GROUP BY
GROUP BY PAT.PatientId
,PAT.PatientIdentifier
,PER.PersonNumber
,PER.FirstName
,PER.LastName
,RES_PER.FirstName
,RES_PER.LastName
,ADD_PER.FirstName
,ADD_PER.LastName
,PER.Address
,PER.City
,PER.State
,PER.ZipCode
,PER.Country
as you are grouping by all fields in the select list, and ROW_NUMBER() is already ordered by PAT.PatientId.
Further to that, you should create indexes on the tables, containing the columns that you join or filter on.
For instance, I would create an index on table Patient with columns (TenantId, PersonId, ResponsiblePersonId, AddedBy) and included columns (PatientId, PatientIdentifier).
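Expressed as a statement, that index could look like the sketch below. The index name is a hypothetical choice, and the column order simply follows the suggestion above, so check it against the actual execution plan before relying on it.

```sql
-- Sketch of the suggested index; IX_Patient_Tenant_Persons is a made-up name.
CREATE NONCLUSTERED INDEX IX_Patient_Tenant_Persons
    ON dbo.Patient (TenantId, PersonId, ResponsiblePersonId, AddedBy)
    INCLUDE (PatientId, PatientIdentifier);
```

TenantId comes first because the query filters on it with an equality predicate. The remaining key columns are the ones joined to dbo.Person, and the INCLUDE columns cover the Patient columns that the select list reads.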

Frankly speaking, 200,000 rows is nothing to SQL Server. Please first remove the redundant logic: since you have a primary key, why still group by so many columns, and why do you need to join the same table (Person) three times? After that, you need to create at least some composite or covering (INCLUDE) indexes. Get the execution plan (Ctrl+L for the estimated plan, or Ctrl+M to include the actual plan) to see which indexes are missing. If you need further help, please post your table schema with a few rows of sample data.
