SQL Query performance with if exists - sql

I have this SQL query:
IF NOT EXISTS (SELECT TOP 1 RowId
FROM dbo.Cache AS C
WHERE StringSearched = @pcpnpi
AND colName = 'pcpnpi'
AND ModifiedAt > (SELECT ModifiedAt
FROM dbo.Patients AS p
WHERE P.RowID = C.RowID))
BEGIN
SELECT @constVal = FunctionWeight
FROM dbo.FunctionWeights
WHERE FunctionWeights.FunctionId = 33;
INSERT INTO #Temp2
(RowNumber,ValFromUser,ColumnName,ValFromFunc,
FuncWeight,percentage)
SELECT RowNumber,@pcpnpi,'pcpnpi',PercentMatch,
@constVal,PercentMatch * @constVal
FROM dbo.Matchpcpnpi (@pcpnpi);
END
ELSE
BEGIN
INSERT INTO #Temp2
(RowNumber,ValFromUser,ColumnName,Percentage)
SELECT RowId,StringSearched,ColName,PercentMatch
FROM dbo.Cache AS C
WHERE StringSearched = @pcpnpi
AND colName = 'pcpnpi'
AND ModifiedAt > (SELECT ModifiedAt
FROM dbo.Patients AS p
WHERE P.RowID = C.RowID)
END
The IF statement above is meant to avoid unnecessary lookups for strings that have already been searched and whose MatchPercent has already been calculated. In that case, the value is retrieved directly from the Cache table.
The query above covers one particular column; the same kind of query, with only the column name and its value changing, is repeated for many other columns in the procedure.
The IF NOT EXISTS check was meant to improve query performance, but performance has gone down instead, probably because of the extra checks.
The Cache table is meant to improve performance, but the extra checks have undone that benefit.
Is there a way to simplify the query above, please? Any pointers would help.
Thanks

First insert into #Temp2 from the Cache table, using the same condition as the EXISTS check. If that insert affects zero rows, run the other insert. Try this:
INSERT INTO #Temp2
(RowNumber,ValFromUser,ColumnName,Percentage)
SELECT RowId,StringSearched,ColName,PercentMatch
FROM dbo.Cache AS C
WHERE StringSearched = @pcpnpi
AND colName = 'pcpnpi'
AND ModifiedAt > (SELECT ModifiedAt
FROM dbo.Patients AS p
WHERE P.RowID = C.RowID)
IF @@ROWCOUNT = 0
BEGIN
SELECT @constVal = FunctionWeight
FROM dbo.FunctionWeights
WHERE FunctionWeights.FunctionId = 33;
INSERT INTO #Temp2
(RowNumber,ValFromUser,ColumnName,ValFromFunc,
FuncWeight,percentage)
SELECT RowNumber,@pcpnpi,'pcpnpi',PercentMatch,
@constVal,PercentMatch * @constVal
FROM dbo.Matchpcpnpi (@pcpnpi)
END

First, consider this query in the exists:
select Top 1 RowId
from dbo.Cache as C
where StringSearched = @pcpnpi and
colName = 'pcpnpi' and
ModifiedAt > ( Select ModifiedAt FROM dbo.Patients p WHERE P.RowID = C.RowID)
For performance, you want indexes on cache(StringSearched, colName, ModifiedAt, RowId) and Patients(RowId).
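As a rough sketch, the two indexes suggested above could be created as follows. The index names are made up for illustration, and the INCLUDE (ModifiedAt) on the Patients index is an extra assumption so that the correlated subquery can be answered from the index alone. If Patients.RowID is already the primary key, the second index may be unnecessary.

```sql
-- Hypothetical index names; the key columns follow the suggestion above.
CREATE NONCLUSTERED INDEX IX_Cache_StringSearched_colName
    ON dbo.Cache (StringSearched, colName, ModifiedAt, RowId);

-- INCLUDE (ModifiedAt) is an assumption, not part of the original suggestion.
CREATE NONCLUSTERED INDEX IX_Patients_RowID
    ON dbo.Patients (RowID)
    INCLUDE (ModifiedAt);
```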
However, you are running this query twice. I would suggest a structure more like:
declare @RowId . . . ; -- I don't know the type
select Top 1 @RowId = RowId
from dbo.Cache as C
where StringSearched = @pcpnpi and
colName = 'pcpnpi' and
ModifiedAt > ( Select ModifiedAt FROM dbo.Patients p WHERE P.RowID = C.RowID);
if @RowId is null . . .
else . . .

Related

SQL Loop through tables and columns to find which columns are NOT empty

I created a temp table #test containing 3 fields: ColumnName, TableName, and Id.
I would like to see which rows in the #test table (i.e. columns in their respective tables) are not empty. That is, for every column name in the ColumnName field and the corresponding table in the TableName field, I would like to see whether the column is empty or not. I tried some things (see below) but didn't get anywhere. Help, please.
declare @LoopCounter INT = 1, @maxloopcounter int, @test varchar(100),
@test2 varchar(100), @check int
set @maxloopcounter = (select count(TableName) from #test)
while @LoopCounter <= @maxloopcounter
begin
DECLARE @PropIDs TABLE (tablename varchar(max), id int )
Insert into @PropIDs (tablename, id)
SELECT [tableName], id FROM #test
where id = @LoopCounter
set @test2 = (select columnname from #test where id = @LoopCounter)
declare @sss varchar(max)
set @sss = (select tablename from @PropIDs where id = @LoopCounter)
set @check = (select count(@test2)
from (select tablename
from @PropIDs
where id = @LoopCounter) A
)
print @test2
print @sss
print @check
set @LoopCounter = @LoopCounter + 1
end
To use variables as column and table names in your @check query, you will need dynamic SQL.
There is most likely a better way to do this, but I can't think of one offhand. Here is what I would do.
Declare a cursor over the select rather than using a while loop as you have it. That way you don't have to rely on sequential ids. The cursor would fetch the columnname, id and tablename fields.
In the loop, build a dynamic SQL statement:
Set @Sql = 'Select Count(*) Cnt Into #Temp2 From ' + @tablename + ' Where ' + @columnname + ' Is not null And ' + @columnname + ' <> '''''
Exec(@Sql)
Then check #Temp2 for a value greater than 0, and if that is what you want, use the @id that was fetched to update your temp table (#test). Putting the result into a scalar variable rather than a temp table would be preferable, but I can't remember the best way to do that, and using a temp table allows you to use an update join, so it would work well in my opinion.
https://www.mssqltips.com/sqlservertip/1599/sql-server-cursor-example/
http://www.sommarskog.se/dynamic_sql.html
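For the scalar-variable variant mentioned above, one option is sp_executesql with an OUTPUT parameter. This also sidesteps a scoping issue: a temp table created by SELECT ... INTO inside EXEC() is dropped when the dynamic batch finishes, so the calling code cannot read it afterwards. A minimal sketch, assuming @tablename and @columnname hold the values fetched by the cursor (the literal values here are placeholders) and checking only for NULLs:

```sql
-- Placeholder values; in the loop these would come from the cursor FETCH.
DECLARE @tablename  sysname = N'SomeTable';
DECLARE @columnname sysname = N'SomeColumn';

DECLARE @sql nvarchar(max);
DECLARE @cnt int;

-- QUOTENAME wraps the identifiers in brackets so unusual names do not break the statement.
SET @sql = N'SELECT @cnt = COUNT(*) FROM ' + QUOTENAME(@tablename)
         + N' WHERE ' + QUOTENAME(@columnname) + N' IS NOT NULL;';

-- The second argument declares the parameters used inside the dynamic statement.
EXEC sys.sp_executesql @sql, N'@cnt int OUTPUT', @cnt = @cnt OUTPUT;

IF @cnt > 0
    PRINT @columnname + N' in ' + @tablename + N' is not empty';
```

For string columns, a further condition comparing the column to '' could be appended to the dynamic statement to exclude empty strings as well.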
Found a way to extract all non-empty tables from the schema, then just joined with the initial temp table that I had created.
select A.tablename, B.[row_count]
from (select * from #test) A
left join
(SELECT r.table_name, r.row_count, r.[object_id]
FROM sys.tables t
INNER JOIN (
SELECT OBJECT_NAME(s.[object_id]) table_name, SUM(s.row_count) row_count, s.[object_id]
FROM sys.dm_db_partition_stats s
WHERE s.index_id in (0,1)
GROUP BY s.[object_id]
) r on t.[object_id] = r.[object_id]
WHERE r.row_count > 0 ) B
on A.[TableName] = B.[table_name]
WHERE ROW_COUNT > 0
order by b.row_count desc
How about this one: a bitmask computed column that checks for NULLs. Each bit in the mask tells you whether the corresponding column is NULL or not, counting in base 2 (1, 2, 4, ...).
CREATE TABLE FindNullComputedMask
(ID int
,val int
,valstr varchar(3)
,NotEmpty as
CASE WHEN ID IS NULL THEN 0 ELSE 1 END
|
CASE WHEN val IS NULL THEN 0 ELSE 2 END
|
CASE WHEN valstr IS NULL THEN 0 ELSE 4 END
)
INSERT FindNullComputedMask
SELECT 1,1,NULL
INSERT FindNullComputedMask
SELECT NULL,2,NULL
INSERT FindNullComputedMask
SELECT 2,NULL, NULL
INSERT FindNullComputedMask
SELECT 3,3,3
SELECT *
FROM FindNullComputedMask
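To read the mask back, individual bits can be tested with the bitwise AND operator. A small sketch against the table above; the output column aliases are made up for illustration. With the four sample rows, NotEmpty comes out as 3, 2, 1 and 7 respectively.

```sql
-- Decode each bit of the computed mask (aliases are illustrative only).
SELECT ID, val, valstr, NotEmpty,
       CASE WHEN (NotEmpty & 1) = 1 THEN 'filled' ELSE 'NULL' END AS IdState,
       CASE WHEN (NotEmpty & 2) = 2 THEN 'filled' ELSE 'NULL' END AS ValState,
       CASE WHEN (NotEmpty & 4) = 4 THEN 'filled' ELSE 'NULL' END AS ValstrState
FROM FindNullComputedMask;

-- Rows in which all three columns are filled have every bit set (1 + 2 + 4).
SELECT *
FROM FindNullComputedMask
WHERE NotEmpty = 7;
```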

Add/Skip WHERE CLAUSE based on Condition

I have the below query that takes a TagId list from table variable and returns the list.
But I need to add the CategoryId WHERE condition only if @Tags has records.
Is it possible to add a WHERE condition only if my table variable has records, and otherwise run the same query with 1=1 (always true) and skip the category filter?
DECLARE @TagIdList NVARCHAR(100) = '22,25,47'
DECLARE @Tags TABLE (TagId INT);
WITH CSVtoTable
AS (
SELECT CAST('<XMLRoot><RowData>' + REPLACE(t.val, ',', '</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
FROM (
SELECT @TagIdList
) AS t(val)
)
INSERT INTO @Tags (TagId)
SELECT m.n.value('.[1]', 'varchar(8000)') AS TagId
FROM CSVtoTable
CROSS APPLY x.nodes('/XMLRoot/RowData') m(n)
SELECT BookingId
,C.CategoryName
FROM Booking B
INNER JOIN Category C ON C.CategoryId = B.CategoryId
WHERE (
b.IsDeleted = 0
OR b.IsDeleted IS NULL
)
-- Add the below where condition only if @Tags has records, else use 1=1
AND C.CategoryId IN (
SELECT DISTINCT CategoryId
FROM CategoryXTag con
WHERE TagId IN (
SELECT TagId
FROM @Tags
)
)
Ultimately you only need to change the end of your query. It is technically possible to squeeze the logic into a single query, as below, but that form generally doesn't optimize as well; if performance is an issue, consider using the two branches of an IF block, one for each case.
AND
(
C.CategoryId IN (
SELECT CategoryId
FROM CategoryXTag
WHERE TagId IN (
SELECT TagId
FROM @Tags
)
)
OR
(SELECT COUNT(*) FROM @Tags) = 0
)
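The two-branch alternative mentioned above could look roughly like this. It is only a sketch: it assumes @Tags has been declared and filled from @TagIdList as in the question, and it simply repeats the question's query with and without the category filter.

```sql
DECLARE @Tags TABLE (TagId INT);  -- assumed to be populated as in the question

IF EXISTS (SELECT 1 FROM @Tags)
BEGIN
    -- Tags were supplied: apply the category filter.
    SELECT B.BookingId, C.CategoryName
    FROM Booking B
    INNER JOIN Category C ON C.CategoryId = B.CategoryId
    WHERE (B.IsDeleted = 0 OR B.IsDeleted IS NULL)
      AND C.CategoryId IN (
            SELECT con.CategoryId
            FROM CategoryXTag con
            WHERE con.TagId IN (SELECT TagId FROM @Tags)
          );
END
ELSE
BEGIN
    -- No tags supplied: skip the category filter entirely.
    SELECT B.BookingId, C.CategoryName
    FROM Booking B
    INNER JOIN Category C ON C.CategoryId = B.CategoryId
    WHERE (B.IsDeleted = 0 OR B.IsDeleted IS NULL);
END
```

Each branch is a plain query, which leaves the optimizer free to plan the two cases separately, at the cost of some duplicated SQL.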
declare @tagcount int = (select count(*) from @Tags);
SELECT BookingId, C.CategoryName
FROM Booking B
INNER JOIN Category C
ON C.CategoryId = B.CategoryId
AND isnull(b.IsDeleted, 0) = 0
INNER JOIN CategoryXTag con
ON C.CategoryId = con.CategoryId
INNER JOIN @Tags tags
ON tags.TagID = con.TagID
OR @tagcount = 0;
If @Tags is empty, the inner join returns no rows, so you might need to put one record in it with a value that would never be used, and then OR against that value:
if(@tagcount = 0) insert into @Tags values (-100);
or tags.TagID = -100;
You don't need to modify your where clause. Instead, you can achieve the same logic by filling @Tags with every TagId from CategoryXTag if @Tags is empty after the initial insert, before running your final query:
if ((select count(*) from @Tags) = 0)
insert into @Tags
select distinct TagId
from CategoryXTag;
I'd declare a variable that records whether the @Tags table has any rows:
declare @needTagsFilter bit
set @needTagsFilter = case when exists(select 1 from @Tags) then 1 else 0 end
and change the where clause like this:
AND (
(@needTagsFilter = 0) OR
(C.CategoryId IN (
SELECT DISTINCT CategoryId
FROM CategoryXTag con
WHERE TagId IN (
SELECT TagId
FROM @Tags
)
)
)
)
COUNT(*) is slower than EXISTS, since EXISTS can stop at the first row it finds. The downside of adding the count/exists check directly to your original query is that SQL Server might execute it for every row.

writing a correlated subquery in sql [duplicate]

I have two tables: Table A and Table B
Table A and Table B both have RowId column.
Table A and Table B both have ModifiedAt column.
Also Table A has a column called Key.
Check conditions :
Retrieve RowId's from table A if table A 'Key' = someconstant
Take those retrieved row Id's from Table A and check if ModifiedAt field of those rows is > ModifiedAT field of Table B with same rowId's.
Table B has no repetition of RowId's but Table A does.
What I tried on my own :
select *
from dbo.ResultsStored rs
WHERE HashedKey = hashbytes('MD5', @StringConcat)
and
rs.ModifiedAT > (select Max(ModifiedAt)
from dbo.Patients P
where P.RowId = rs.RowId)
Note :
Also , what surprises me is if I replace rs.RowId with hardcoded value say '1', it works but not this way.
Results when I hardcode rs.RowId :
if not exists (select * from dbo.ResultsStored RS where RS.HashedKey = 0xBBE4D4DC92C713756E6683ADD671F7DA and ModifiedAt > (select ModifiedAt from dbo.Patients where RowId = 1))
begin
print'not exists'
end
else
begin
print 'exists'
end
OUTPUT : not exists
if not exists (select * from dbo.ResultsStored RS where RS.HashedKey = 0xBBE4D4DC92C713756E6683ADD671F7DA and ModifiedAt > (select ModifiedAt from dbo.Patients where RowId = rs.RowId))
begin
print'not exists'
end
else
begin
print 'exists'
end
OUTPUT : exists
Expected output : not exists
Can I please get some help on this ?
The problem is in your data.
If I understand correctly, you want to know whether there are rows in ResultsStored whose date is greater than the Patients date. If no such row is found, then everything is OK.
If so, your query looks correct. You can select the offending data directly with:
SELECT *
FROM Patients p
CROSS APPLY ( SELECT MAX(ModifiedAt) AS ModifiedAt
FROM ResultsStored rs
WHERE p.RowId = rs.RowId
) a
WHERE a.ModifiedAt > p.ModifiedAt
DECLARE @RowId INT
DECLARE CurRowId CURSOR FOR SELECT RowId FROM Patients
OPEN CurRowId
FETCH NEXT FROM CurRowId INTO @RowId
WHILE @@FETCH_STATUS = 0
BEGIN
if not exists (select * from dbo.ResultsStored where ModifiedAt >
(select ModifiedAt from dbo.Patients where RowId =
@RowId))
begin
print'not exists'
end
else
begin
print 'exists'
END
FETCH NEXT FROM CurRowId INTO @RowId
END
CLOSE CurRowId
DEALLOCATE CurRowId

Quicker way to update all rows in a SQL Server table

Is there a more efficient way to write this code? Or with less code?
SELECT *
INTO #Temp
FROM testtemplate
Declare @id INT
Declare @name VARCHAR(127)
WHILE (SELECT Count(*) FROM #Temp) > 0
BEGIN
SELECT TOP 1 @id = testtemplateid FROM #Temp
SELECT TOP 1 @name = name FROM #Temp
UPDATE testtemplate
SET testtemplate.vendortestcode = (SELECT test_code FROM test_code_lookup WHERE test_name = @name)
WHERE testtemplateid = @id
--finish processing
DELETE #Temp Where testtemplateid = @id
END
DROP TABLE #Temp
You can do this in a single UPDATE with no need to loop.
UPDATE tt
SET vendortestcode = tcl.test_code
FROM testtemplate tt
INNER JOIN test_code_lookup tcl
ON tt.name = tcl.test_name
You could try a single update like this:
UPDATE A
SET A.vendortestcode = B.test_code
FROM testtemplate A
INNER JOIN test_code_lookup B
ON A.name = B.test_name
Also, the way you are doing it now is wrong: you take a TOP 1 id and a TOP 1 name in two separate queries without an ORDER BY, so there is no guarantee that you get the name that belongs to that id.
You could write a function to update vendortestcode. Then your code reduces to one SQL statement:
update testtemplate set vendortestcode = dbo.get_test_code_from_name(name)
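A sketch of what such a function might look like. The function name comes from the statement above; the VARCHAR(50) return type is an assumption (the real type of test_code is not shown), and TOP (1) is only there to keep the result scalar if test_name is not unique.

```sql
CREATE FUNCTION dbo.get_test_code_from_name (@name VARCHAR(127))
RETURNS VARCHAR(50)   -- assumed type of test_code
AS
BEGIN
    -- Without an ORDER BY, TOP (1) picks an arbitrary match if there are several.
    RETURN (SELECT TOP (1) test_code
            FROM test_code_lookup
            WHERE test_name = @name);
END
```

A scalar function like this is typically evaluated once per row, so it shortens the statement but is unlikely to be faster than the join-based UPDATE shown earlier.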

How can I efficiently do a database massive update?

I have a table with some duplicate entries. I have to discard all but one, and then update this latest one. I've tried with a temporary table and a while statement, in this way:
CREATE TABLE #tmp_ImportedData_GenericData
(
Id int identity(1,1),
tmpCode varchar(255) NULL,
tmpAlpha3Code varchar(50) NULL,
tmpRelatedYear int NOT NULL,
tmpPreviousValue varchar(255) NULL,
tmpGrowthRate varchar(255) NULL
)
INSERT INTO #tmp_ImportedData_GenericData
SELECT
MCS_ImportedData_GenericData.Code,
MCS_ImportedData_GenericData.Alpha3Code,
MCS_ImportedData_GenericData.RelatedYear,
MCS_ImportedData_GenericData.PreviousValue,
MCS_ImportedData_GenericData.GrowthRate
FROM MCS_ImportedData_GenericData
INNER JOIN
(
SELECT CODE, ALPHA3CODE, RELATEDYEAR, COUNT(*) AS NUMROWS
FROM MCS_ImportedData_GenericData AS M
GROUP BY M.CODE, M.ALPHA3CODE, M.RELATEDYEAR
HAVING count(*) > 1
) AS M2 ON MCS_ImportedData_GenericData.CODE = M2.CODE
AND MCS_ImportedData_GenericData.ALPHA3CODE = M2.ALPHA3CODE
AND MCS_ImportedData_GenericData.RELATEDYEAR = M2.RELATEDYEAR
WHERE
(MCS_ImportedData_GenericData.PreviousValue <> 'INDEFINITO')
-- SELECT * from #tmp_ImportedData_GenericData
-- DROP TABLE #tmp_ImportedData_GenericData
DECLARE @counter int
DECLARE @rowsCount int
SET @counter = 1
SELECT @rowsCount = count(*) from #tmp_ImportedData_GenericData
-- PRINT @rowsCount
WHILE @counter < @rowsCount
BEGIN
SELECT
@Code = tmpCode,
@Alpha3Code = tmpAlpha3Code,
@RelatedYear = tmpRelatedYear,
@OldValue = tmpPreviousValue,
@GrowthRate = tmpGrowthRate
FROM
#tmp_ImportedData_GenericData
WHERE
Id = @counter
DELETE FROM MCS_ImportedData_GenericData
WHERE
Code = @Code
AND Alpha3Code = @Alpha3Code
AND RelatedYear = @RelatedYear
AND PreviousValue <> 'INDEFINITO' OR PreviousValue IS NULL
UPDATE
MCS_ImportedData_GenericData
SET
PreviousValue = @OldValue, GrowthRate = @GrowthRate
WHERE
Code = @Code
AND Alpha3Code = @Alpha3Code
AND RelatedYear = @RelatedYear
AND MCS_ImportedData_GenericData.PreviousValue ='INDEFINITO'
SET @counter = @counter + 1
END
but it takes too long, even though there are just 20000 - 30000 rows to process.
Does anyone have suggestions to improve performance?
Thanks in advance!
WITH q AS (
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY CODE, ALPHA3CODE, RELATEDYEAR ORDER BY CASE WHEN PreviousValue = 'INDEFINITO' THEN 1 ELSE 0 END) AS rn
FROM MCS_ImportedData_GenericData m
WHERE PreviousValue <> 'INDEFINITO'
)
DELETE
FROM q
WHERE rn > 1
Quassnoi's answer uses SQL Server 2005+ syntax, so I thought I'd put in my tuppence worth using something more generic...
First, to delete all the duplicates, but not the "original", you need a way of differentiating the duplicate records from each other. (The ROW_NUMBER() part of Quassnoi's answer)
It would appear that in your case the source data has no identity column (you create one in the temp table). If that is the case, there are two choices that come to my mind:
1. Add the identity column to the data, then remove the duplicates
2. Create a "de-duped" set of data, delete everything from the original, and insert the de-duped data back into the original
Option 1 could be something like...
(With the newly created ID field)
DELETE
[data]
FROM
MCS_ImportedData_GenericData AS [data]
WHERE
id > (
SELECT
MIN(id)
FROM
MCS_ImportedData_GenericData
WHERE
CODE = [data].CODE
AND ALPHA3CODE = [data].ALPHA3CODE
AND RELATEDYEAR = [data].RELATEDYEAR
)
OR...
DELETE
[data]
FROM
MCS_ImportedData_GenericData AS [data]
INNER JOIN
(
SELECT
MIN(id) AS [id],
CODE,
ALPHA3CODE,
RELATEDYEAR
FROM
MCS_ImportedData_GenericData
GROUP BY
CODE,
ALPHA3CODE,
RELATEDYEAR
)
AS [original]
ON [original].CODE = [data].CODE
AND [original].ALPHA3CODE = [data].ALPHA3CODE
AND [original].RELATEDYEAR = [data].RELATEDYEAR
AND [original].id <> [data].id
I don't understand the syntax used well enough to post an exact answer, but here's an approach.
Identify the rows you want to preserve (e.g. select value, ... from ... where ...)
Apply the update logic while selecting them (e.g. select value + 1 ... from ... where ...)
Do an insert-select into a new table.
Drop the original, rename the new table to the original name, and recreate all grants/synonyms/triggers/indexes/FKs/... (or truncate the original and insert-select from the new table)
Obviously this has a pretty big overhead, but if you want to update/clear millions of rows, it will likely be the fastest way.