How do I remove a nested SELECT statement?


I have a Name table with the columns
NameID
Name
TypeID
and the following SQL:
SELECT [NameID]
FROM [Name]
WHERE [TypeID] = @TypeID
AND NameID >= (SELECT MIN([NameID])
FROM [Name]
WHERE [Name]='Billy' AND [TypeID]=@TypeID)
I've been asked to convert this to an INNER JOIN without using any nested SELECT, but I'm not sure how to.
Thanks for your help!

Originally I didn't think you needed a join at all:
;WITH n AS
(
SELECT
NameID,
rn = ROW_NUMBER() OVER (ORDER BY NameID)
FROM [Name]
WHERE TypeID = @TypeID
AND [Name] = 'Billy'
)
SELECT NameID
FROM n
WHERE rn > 1;
Then again, maybe I do not have the requirements clear. What is the purpose of this query?
SELECT n1.NameID
FROM [Name] AS n1
INNER JOIN
(
SELECT NameID = MIN(NameID)
FROM [Name]
WHERE TypeID = @TypeID
AND [Name] = 'Billy'
) AS n2
ON n1.NameID >= n2.NameID
WHERE n1.TypeID = @TypeID;
I agree with Lukas, I am not sure why the person who is telling you to change this thinks an inner join will be better than your original.
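To check that the derived-table join returns the same rows as the original query, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The sample rows and the type value 1 are made up for illustration, and the T-SQL variable is passed as a `?` placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Name (NameID INTEGER PRIMARY KEY, Name TEXT, TypeID INTEGER)")
# Hypothetical sample data: Billy first appears at NameID 2 for type 1.
conn.executemany(
    "INSERT INTO Name VALUES (?, ?, ?)",
    [(1, "Alice", 1), (2, "Billy", 1), (3, "Carol", 1), (4, "Billy", 2), (5, "Billy", 1)],
)

# The original query with the nested SELECT in the WHERE clause.
original = """
    SELECT NameID FROM Name
    WHERE TypeID = ?
      AND NameID >= (SELECT MIN(NameID) FROM Name WHERE Name = 'Billy' AND TypeID = ?)
    ORDER BY NameID
"""
# The rewrite that joins against a one-row derived table.
joined = """
    SELECT n1.NameID
    FROM Name AS n1
    INNER JOIN (
        SELECT MIN(NameID) AS NameID FROM Name WHERE TypeID = ? AND Name = 'Billy'
    ) AS n2 ON n1.NameID >= n2.NameID
    WHERE n1.TypeID = ?
    ORDER BY n1.NameID
"""
original_ids = [row[0] for row in conn.execute(original, (1, 1))]
joined_ids = [row[0] for row in conn.execute(joined, (1, 1))]
print(original_ids, joined_ids)
```

Both lists should match on this data, which is consistent with the join being a restatement of the subquery rather than a different query.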

You could remove the nested part like this:
DECLARE @NameID int
SELECT @NameID = (SELECT MIN([NameID])
FROM [Name] WHERE [Name]='Billy' AND
[TypeID]=@TypeID)
SELECT [NameID] FROM [Name] WHERE [TypeID]
= @TypeID AND NameID >= @NameID
But as stated already, this does not provide any performance benefit as the subquery would only be evaluated once in your version, the same as in this.

Well, it looks like just moving the condition [Name]='Billy' should produce the same result for this specific query. So convert your original:
SELECT [NameID]
FROM [Name]
WHERE [TypeID] = @TypeID
AND NameID >= (SELECT MIN([NameID])
FROM [Name]
WHERE [Name]='Billy' AND [TypeID]=@TypeID)
to:
SELECT [NameID]
FROM [Name]
WHERE [TypeID] = @TypeID
AND [Name]='Billy'
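A quick check on made-up data suggests the two queries agree only when every row of that type, from Billy's first NameID onward, is also named Billy. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server; the rows and the type value 1 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Name (NameID INTEGER PRIMARY KEY, Name TEXT, TypeID INTEGER)")
# Hypothetical sample data: Carol sits between two Billy rows.
conn.executemany(
    "INSERT INTO Name VALUES (?, ?, ?)",
    [(1, "Alice", 1), (2, "Billy", 1), (3, "Carol", 1), (4, "Billy", 1)],
)

with_subquery = """
    SELECT NameID FROM Name
    WHERE TypeID = :type_id
      AND NameID >= (SELECT MIN(NameID) FROM Name
                     WHERE Name = 'Billy' AND TypeID = :type_id)
    ORDER BY NameID
"""
filter_only = """
    SELECT NameID FROM Name
    WHERE TypeID = :type_id AND Name = 'Billy'
    ORDER BY NameID
"""
params = {"type_id": 1}
subquery_ids = [row[0] for row in conn.execute(with_subquery, params)]
filter_ids = [row[0] for row in conn.execute(filter_only, params)]
print(subquery_ids)  # every row of the type from Billy's first NameID onward
print(filter_ids)    # only the rows named Billy
```

On this data the subquery version also returns Carol's row, so it is worth confirming what the query is meant to return before simplifying it this way.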

Related

IF EXISTS in WHERE clause

I ran into a problem recently. In my MS SQL database I have a table of articles (with details such as insert date, insert user and so on) and a table with the bodies of the articles in multiple languages. I want the article body to be in the user's preferred language, but not all articles exist in all languages. So it would be nice to search first for the article in the user's preferred language and, if that does not exist, to get the body in the first language available.
For this, I use a function. The WHERE clause is like this:
WHERE [tblBody].[Language] = @Language AND
[tblTitle].[Language] = @Language
but it would be nice if it were like this:
SELECT (if body is in my preferred language get that body; else give me one (maybe I understood it))
I hope you understand what I want to do and what my problem is.
Thank you in advance!
UPDATE
This is my actual query which needs modifications:
ALTER FUNCTION [fx_GetNews]
(
@MinimumPermission INT,
@UserID UNIQUEIDENTIFIER,
@OwnerID UNIQUEIDENTIFIER = NULL,
@Title NVARCHAR(250) = NULL,
@Body VARCHAR(150) = NULL,
@InsertDateStart DATETIME = NULL,
@InsertDateEnd DATETIME = NULL,
@CreatedByUserID UNIQUEIDENTIFIER = NULL,
@ExpirationDate DATETIME = NULL,
@Language VARCHAR(150)
)
RETURNS @News TABLE
(
[Id] UNIQUEIDENTIFIER
,[OwnerID] UNIQUEIDENTIFIER
,[Title] NVARCHAR(250)
,[TitlePictureUrl] VARCHAR(150)
,[Body] NVARCHAR(max)
,[Visible] BIT
,[InsertDate] DATETIME
,[CreatedByUserId] UNIQUEIDENTIFIER
,[ModifiedDate] DATETIME
,[ModifiedByUserId] UNIQUEIDENTIFIER
,[ModifiedByPerson] VARCHAR(250)
,[ExpirationDate] DATETIME
,[CreatedByPerson] VARCHAR(250)
,[Permission] INT
)
AS
BEGIN
INSERT INTO @News
([Id]
,[OwnerID]
,[Title]
,[TitlePictureUrl]
,[Body]
,[Visible]
,[InsertDate]
,[CreatedByUserId]
,[ModifiedDate]
,[ModifiedByUserId]
,[CreatedByPerson]
,[ModifiedByPerson]
,[ExpirationDate]
,[Permission])
SELECT
[dbo].[tblNews].[Id]
,[dbo].[tblNews].[OwnerID]
,CAST([tblTitle].[Text] AS VARCHAR(150))
,[dbo].[tblNews].[TitlePictureUrl]
,[tblBody].[Text]
,[dbo].[tblNews].[Visible]
,[dbo].[tblNews].[InsertDate]
,[dbo].[tblNews].[CreatedByUserId]
,[dbo].[tblNews].[ModifiedDate]
,[dbo].[tblNews].[ModifiedByUserId]
,[dbo].[tblNews].[CreatedByPerson]
,[dbo].[tblNews].[ModifiedByPerson]
,[dbo].[tblNews].[ExpirationDate]
,[eportofolii].[fx_GetPermissionForObject] (@UserID, [dbo].[tblNews].[ID], 1, 1)
FROM [dbo].[tblNews]
INNER JOIN [dbo].[tblAdmTranslateText] AS [tblBody] ON
[tblBody].[OwnerID] = [dbo].[tblNews].[ID]
INNER JOIN [dbo].[tblAdmTranslateText] AS [tblTitle] ON
[tblTitle].[OwnerID] = [dbo].[tblNews].[ID]
WHERE
[eportofolii].[fx_GetPermissionForObject] (@UserID, [dbo].[tblNews].[ID], 1, 1) >= @MinimumPermission AND
([dbo].[tblNews].[OwnerID] = ISNULL(@OwnerID, [dbo].[tblNews].[OwnerID]) OR [dbo].[tblNews].[OwnerID] IS NULL) AND
[tblTitle].[Text] LIKE '%' + ISNULL(@Title, '') + '%' AND
[tblBody].[Text] LIKE '%' + ISNULL(@Body, '') + '%' AND
([dbo].[tblNews].[InsertDate] BETWEEN ISNULL(@InsertDateStart, ([dbo].[tblNews].[InsertDate] - 7)) AND ISNULL(@InsertDateEnd, [dbo].[tblNews].[InsertDate] + 1)) AND
[dbo].[tblNews].[CreatedByUserID] = ISNULL(@CreatedByUserID, [dbo].[tblNews].[CreatedByUserID]) AND
([dbo].[tblNews].[ExpirationDate] > ISNULL(@ExpirationDate, GETDATE() - 1) OR [dbo].[tblNews].[ExpirationDate] IS NULL) AND
[tblBody].[UDF_2] = 'Body' AND
[tblTitle].[UDF_2] = 'Title' AND
[tblBody].[UDF_1] = 'News' AND
[tblTitle].[UDF_1] = 'News' AND
[tblBody].[Language] = @Language AND
[tblTitle].[Language] = @Language
ORDER BY [InsertDate] DESC
RETURN
END
The problem is here at the end:
[tblBody].[Language] = @Language AND
[tblTitle].[Language] = @Language
Thank you!

;WITH x AS
(
SELECT b.[Language], /* other columns from b */
rn = ROW_NUMBER() OVER (PARTITION BY b.article_id_of_some_kind
ORDER BY CASE WHEN b.[Language] = @Language THEN 1 ELSE 2 END)
FROM dbo.tblBody AS b
INNER JOIN dbo.tblLanguage AS l
ON b.LanguageID = l.LanguageID -- guessing here
)
SELECT /* cols */ FROM x WHERE rn = 1;
This will return an arbitrary language if the preferred language is not available. You can further refine that by modifying the inner ORDER BY.
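The ranking idea can be tried out on a small scale. The following is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server (it assumes the bundled SQLite is 3.25 or newer, which is needed for window functions). The table `Body`, its columns and the sample rows are hypothetical, and the second sort key on `Language` is an addition that makes the fallback choice deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Body (ArticleID INTEGER, Language TEXT, BodyText TEXT)")
# Hypothetical data: article 1 exists in two languages, article 2 only in English.
conn.executemany(
    "INSERT INTO Body VALUES (?, ?, ?)",
    [(1, "en", "Hello"), (1, "ro", "Salut"), (2, "en", "Only English")],
)

query = """
    WITH x AS (
        SELECT ArticleID, Language, BodyText,
               ROW_NUMBER() OVER (
                   PARTITION BY ArticleID
                   -- preferred language sorts first, then alphabetical fallback
                   ORDER BY CASE WHEN Language = ? THEN 1 ELSE 2 END, Language
               ) AS rn
        FROM Body
    )
    SELECT ArticleID, Language, BodyText FROM x WHERE rn = 1 ORDER BY ArticleID
"""
rows = conn.execute(query, ("ro",)).fetchall()
print(rows)
```

With "ro" as the preferred language, article 1 comes back in Romanian and article 2 falls back to its only available language.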

What you want to do is a bit complicated. You want to return articles in the desired language, if present, and in another language if not.
For this, a window function is quite useful:
select t.*
from (select blah, [tblBody].[Language] as BodyLanguage,
max(case when [tblBody].[Language] = @Language then 1 else 0 end) over (partition by article) as HasLanguage
from whatever
where [tblTitle].[Language] = @Language
) t
where (HasLanguage = 1 and BodyLanguage = @Language) or HasLanguage = 0

select B.*
from bodies as B
where exists (
select 1 from (
select b.article, coalesce( u.language, b.language ) as language
from bodies as b
left join users as u
on b.language = u.language
and u.user = @user
where b.article = @article
and b.default_language = 'TRUE' -- or whatever
) as A
where A.article = B.Article
and A.language = B.language
)
You have to choose between two languages: the user's language and the article's language. You could use UNION or COALESCE (because an outer join is a kind of union), and the latter is simpler.

One option is to use the MSSQL "COALESCE" function:
http://msdn.microsoft.com/en-us/library/ms190349.aspx
Another might be to use a CASE block.

SQL select count of all rows when using IF ELSE

I have a situation where I use an IF ELSE statement in SQL.
@searchString nvarchar(50),
@languageId int,
@count int,
@id int
IF(@count IS NOT NULL)
SELECT TOP (@count) Id, value
FROM TABLE1
WHERE (Id IN
(SELECT Id
FROM TABLE2
WHERE (Id = @id))) and languageId = @languageId AND value LIKE '%' + @searchString + '%'
ORDER BY value
ELSE
SELECT Id, value
FROM TABLE1
WHERE (Id IN
(SELECT Id
FROM TABLE2
WHERE (Id = @id))) and languageId = @languageId AND value LIKE '%' + @searchString + '%'
ORDER BY value
I would like to return the total number of rows using
count(*) over() (or something similar)
(as I return only the TOP count records for now), as answered here:
How to return total number of records with TOP * select
BUT: I don't want to return this value on every row; I would like to return the count just once.
Is there a way to do this with one query, or do I have to write a separate query for this?
Any hint would be greatly appreciated!
EDIT: using SQL Server 2008 R2.

I suggest you use INNER JOIN, for example with scripts:
{
SCRIPT
SELECT
l.localite_id,
p.patient_id,
p.patient_date_naissance,
p.patient_gsm, p.patient_tel
FROM
patient p
INNER JOIN localite l
ON l.localite_id = p.localite_id
INNER JOIN ville v
ON v.ville_id = l.ville_id
INNER JOIN gouvernorat g
ON v.gouv_id = g.gouv_id
INNER JOIN pays pays
ON pays.pays_id = g.pays_id
IF NEQ #patient_gsm# -1
WHERE p.patient_gsm like #patient_gsm#
AND p.patient_code like #patient_code#
ENDIF
END
}

Modify FindAll function to a DoesExist function in SQL Server

I have the following recursive function:
ALTER FUNCTION [dbo].[ListAncestors]
(
@Id int
)
RETURNS TABLE
As
RETURN
(
WITH cte As
(
SELECT
UserId,
ManagerId,
Forename,
Surname
FROM
dbo.Users
WHERE
UserId = @Id
UNION ALL
SELECT
T.UserID,
T.ManagerID,
T.Forename,
T.Surname
FROM
cte As C INNER JOIN dbo.Users As T
ON C.UserID = T.ManagerID
)
SELECT
Forename,
Surname
FROM
cte
);
Basically, it returns the names of all users below the specified user (based on their ID). What I would like to do is create another function, based on this one, that checks whether a specific user ID is an ancestor of another.
I imagine the signature would look something like:
CREATE FUNCTION IsAncestor(@Id int, @AncestorId int) RETURNS BIT

How about:
WHILE @Id IS NOT NULL AND @Id <> @AncestorId
BEGIN
SET @Id = (
SELECT ManagerId FROM dbo.Users WHERE UserId = @Id
)
END
RETURN CASE WHEN @Id IS NOT NULL THEN 1 ELSE 0 END
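The same upward walk through the manager chain can also be written as a single recursive query. Below is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server. The `Users` rows and the helper name `is_ancestor` are hypothetical, the data is assumed to contain no cycles, and unlike the loop this version does not count a user as their own ancestor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserId INTEGER PRIMARY KEY, ManagerId INTEGER)")
# Hypothetical hierarchy: 1 manages 2, 2 manages 3; user 4 is unrelated.
conn.executemany("INSERT INTO Users VALUES (?, ?)", [(1, None), (2, 1), (3, 2), (4, None)])

IS_ANCESTOR = """
    WITH RECURSIVE chain(UserId, ManagerId) AS (
        -- anchor: the user we start from
        SELECT UserId, ManagerId FROM Users WHERE UserId = :id
        UNION ALL
        -- step: move up to the current row's manager
        SELECT u.UserId, u.ManagerId
        FROM chain AS c
        JOIN Users AS u ON u.UserId = c.ManagerId
    )
    SELECT EXISTS (SELECT 1 FROM chain WHERE UserId = :ancestor_id AND UserId <> :id)
"""

def is_ancestor(user_id, ancestor_id):
    """Return True if ancestor_id appears above user_id in the manager chain."""
    row = conn.execute(IS_ANCESTOR, {"id": user_id, "ancestor_id": ancestor_id}).fetchone()
    return bool(row[0])

print(is_ancestor(3, 1), is_ancestor(1, 3))
```

In SQL Server the equivalent CTE would omit the RECURSIVE keyword; the structure of anchor and recursive step stays the same.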

If we accept that the initial CTE takes an ID and lists all the 'ancestors' of that ID, I think that the following query tests for this relation.
WITH cte As
(
SELECT
UserId,
Forename,
Surname
FROM
dbo.Users
WHERE
UserId = @Id
UNION ALL
SELECT
T.UserID,
T.Forename,
T.Surname
FROM
cte As C INNER JOIN dbo.Users As T
ON C.UserID = T.ManagerID and C.UserID <> @ancestorID
)
SELECT CAST (COUNT(*) as BIT) FROM cte WHERE UserID = @ancestorID
It's a bit odd though, since given the initial function a person is in the 'ancestor' relation with themselves.
Incidentally, I removed the ManagerID from the select statements in the CTE since it isn't necessary.

Stored procedure optimization

I have a stored procedure which takes a lot of time to execute. Can anyone suggest a better approach so that the same result set is achieved?
ALTER PROCEDURE [dbo].[spFavoriteRecipesGET]
@USERID INT, @PAGENUMBER INT, @PAGESIZE INT, @SORTDIRECTION VARCHAR(4), @SORTORDER VARCHAR(4), @FILTERBY INT
AS
BEGIN
DECLARE
@ROW_START INT
DECLARE
@ROW_END INT
SET
@ROW_START = (@PageNumber-1)* @PageSize+1
SET
@ROW_END = @PageNumber*@PageSize
DECLARE
@RecipeCount INT
DECLARE
@RESULT_SET_TABLE
TABLE
(
Id INT NOT NULL IDENTITY(1,1),
FavoriteRecipeId INT,
RecipeId INT,
DateAdded DATETIME,
Title NVARCHAR(255),
UrlFriendlyTitle NVARCHAR(250),
[Description] NVARCHAR(MAX),
AverageRatingId FLOAT,
SubmittedById INT,
SubmittedBy VARCHAR(250),
RecipeStateId INT,
RecipeRatingId INT,
ReviewCount INT,
TweaksCount INT,
PhotoCount INT,
ImageName NVARCHAR(50)
)
INSERT INTO @RESULT_SET_TABLE
SELECT
FavoriteRecipes.FavoriteRecipeId,
Recipes.RecipeId,
FavoriteRecipes.DateAdded,
Recipes.Title,
Recipes.UrlFriendlyTitle,
Recipes.[Description],
Recipes.AverageRatingId,
Recipes.SubmittedById,
COALESCE(users.DisplayName,users.UserName,Recipes.SubmittedBy) As SubmittedBy,
Recipes.RecipeStateId,
RecipeReviews.RecipeRatingId,
COUNT(RecipeReviews.Review),
COUNT(RecipeTweaks.Tweak),
COUNT(Photos.PhotoId),
dbo.udfGetRecipePhoto(Recipes.RecipeId) AS ImageName
FROM
FavoriteRecipes
INNER JOIN Recipes ON FavoriteRecipes.RecipeId=Recipes.RecipeId AND Recipes.RecipeStateId <> 3
LEFT OUTER JOIN RecipeReviews ON RecipeReviews.RecipeId=Recipes.RecipeId AND RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeRatingId= (
SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId
)
OR RecipeReviews.RecipeRatingId IS NULL
LEFT OUTER JOIN RecipeTweaks ON RecipeTweaks.RecipeId = Recipes.RecipeId AND RecipeTweaks.TweakedById= @UserId
LEFT OUTER JOIN Photos ON Photos.RecipeId = Recipes.RecipeId
AND Photos.UploadedById = @UserId AND Photos.RecipeId = FavoriteRecipes.RecipeId
AND Photos.PhotoTypeId = 1
LEFT OUTER JOIN users ON Recipes.SubmittedById = users.UserId
WHERE
FavoriteRecipes.UserId=@UserId
GROUP BY
FavoriteRecipes.FavoriteRecipeId,
Recipes.RecipeId,
FavoriteRecipes.DateAdded,
Recipes.Title,
Recipes.UrlFriendlyTitle,
Recipes.[Description],
Recipes.AverageRatingId,
Recipes.SubmittedById,
Recipes.SubmittedBy,
Recipes.RecipeStateId,
RecipeReviews.RecipeRatingId,
users.DisplayName,
users.UserName,
Recipes.SubmittedBy;
WITH SortResults
AS
(
SELECT
ROW_NUMBER() OVER (
ORDER BY CASE WHEN @SORTDIRECTION = 't' AND @SORTORDER='a' THEN TITLE END ASC,
CASE WHEN @SORTDIRECTION = 't' AND @SORTORDER='d' THEN TITLE END DESC,
CASE WHEN @SORTDIRECTION = 'r' AND @SORTORDER='a' THEN AverageRatingId END ASC,
CASE WHEN @SORTDIRECTION = 'r' AND @SORTORDER='d' THEN AverageRatingId END DESC,
CASE WHEN @SORTDIRECTION = 'mr' AND @SORTORDER='a' THEN RecipeRatingId END ASC,
CASE WHEN @SORTDIRECTION = 'mr' AND @SORTORDER='d' THEN RecipeRatingId END DESC,
CASE WHEN @SORTDIRECTION = 'd' AND @SORTORDER='a' THEN DateAdded END ASC,
CASE WHEN @SORTDIRECTION = 'd' AND @SORTORDER='d' THEN DateAdded END DESC
) RowNumber,
FavoriteRecipeId,
RecipeId,
DateAdded,
Title,
UrlFriendlyTitle,
[Description],
AverageRatingId,
SubmittedById,
SubmittedBy,
RecipeStateId,
RecipeRatingId,
ReviewCount,
TweaksCount,
PhotoCount,
ImageName
FROM
@RESULT_SET_TABLE
WHERE
((@FILTERBY = 1 AND SubmittedById= @USERID)
OR ( @FILTERBY = 2 AND (SubmittedById <> @USERID OR SubmittedById IS NULL))
OR ( @FILTERBY <> 1 AND @FILTERBY <> 2))
)
SELECT
RowNumber,
FavoriteRecipeId,
RecipeId,
DateAdded,
Title,
UrlFriendlyTitle,
[Description],
AverageRatingId,
SubmittedById,
SubmittedBy,
RecipeStateId,
RecipeRatingId,
ReviewCount,
TweaksCount,
PhotoCount,
ImageName
FROM
SortResults
WHERE
RowNumber BETWEEN @ROW_START AND @ROW_END
print @ROW_START
print @ROW_END
SELECT
@RecipeCount=dbo.udfGetFavRecipesCount(@UserId)
SELECT
@RecipeCount AS RecipeCount
SELECT COUNT(Id) as FilterCount FROM @RESULT_SET_TABLE
WHERE
((@FILTERBY = 1 AND SubmittedById= @USERID)
OR (@FILTERBY = 2 AND (SubmittedById <> @USERID OR SubmittedById IS NULL))
OR (@FILTERBY <> 1 AND @FILTERBY <> 2))
END

You need to look at the execution plan to see where the time is going. It could be indexes, table scans caused by your UDF, or any number of things. As you analyze the plan, try to break up the query into smaller pieces to see if you can make a difference in them.
Then learn about ROW_NUMBER to see if you can do without the local table.

A couple of notes:
Indexing - often when people create procedures which use a temp table or table variable, they fail to realize you can create indexes on those objects, and this can have massive performance implications.
UDF - sometimes the query processor will effectively inline UDF logic and sometimes not; look closely at your query plan and see how this is being handled. Often, if you manually inline this logic in something like a correlated sub-query, you can boost performance a lot.

As others have said, the only way to know is to look at explain plans. Glancing over the code, this part looks kind of fishy:
AND RecipeReviews.RecipeRatingId= (
SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId
)
In general, doing non-trivial stuff in join conditions is a Bad Idea. I would factor that out into a sub-select, and since it's an outer join, you'd probably have to combine that with RecipeReviews somehow.
BUT: All of this is speculation! Explain! Measure!

Well, in addition to the possible poor performance of the UDF, this part of the code concerns me:
LEFT OUTER JOIN RecipeReviews
ON RecipeReviews.RecipeId=Recipes.RecipeId
AND RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeRatingId=
(SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId )
OR RecipeReviews.RecipeRatingId IS NULL
It is generally poor practice to use a subquery as part of a join. I strongly suspect this is not using any indexes you may have. And the OR part doesn't make sense to me at all; the left join should get you this.
Rewrite it to make a derived table instead.
If you have a lot of records a temp table usually performs better than a table variable and can (and probably should) be indexed.

You need to add parentheses around your OR conditions.
LEFT OUTER JOIN RecipeReviews
ON RecipeReviews.RecipeId = Recipes.RecipeId
AND RecipeReviews.ReviewedById = @UserId
AND
-- insert open parenthesis here:
(
RecipeReviews.RecipeRatingId = (... subquery ...)
OR RecipeReviews.RecipeRatingId IS NULL
-- insert close parenthesis here:
)
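The reason the parentheses matter is that AND binds more tightly than OR, so an unparenthesized `a AND b OR c` is read as `(a AND b) OR c`. A minimal sketch of that rule, using Python's sqlite3 module as a stand-in for SQL Server (both follow the same precedence), with 0 and 1 standing in for false and true join predicates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# a = 0 (RecipeId does not match), b = 0 (rating is not the max), c = 1 (rating IS NULL)
without_parens = conn.execute("SELECT 0 AND 0 OR 1").fetchone()[0]
with_parens = conn.execute("SELECT 0 AND (0 OR 1)").fetchone()[0]

print(without_parens)  # the trailing OR alone makes the whole condition true
print(with_parens)     # the failed first predicate keeps the condition false
```

In the join above, this means that without the parentheses any review row with a NULL rating could match regardless of recipe or reviewer.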

The very first, simple thing I would do is move all your DECLARE statements to the top.
DECLARE @ROW_START INT,
@ROW_END INT,
@RecipeCount INT
DECLARE
@RESULT_SET_TABLE
TABLE
(
Id INT NOT NULL IDENTITY(1,1),
)
The next part, which is still rather simple, is stuff like this:
AND Recipes.RecipeStateId <> 3
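AND RecipeTweaks.TweakedById= @UserId
This can be taken out of the join and moved to the WHERE clause. If you can, change the <> to an IN list so that it can utilize an index seek. Be careful with conditions on outer-joined tables, though: moving a filter such as the one on RecipeTweaks into the WHERE clause effectively turns the LEFT JOIN into an inner join.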
AND RecipeReviews.RecipeRatingId=
(
SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId
)
That's just crazy looking and needs to be completely redone.

How can I efficiently do a database massive update?

I have a table with some duplicate entries. I have to discard all but one, and then update this latest one. I've tried with a temporary table and a while statement, in this way:
CREATE TABLE #tmp_ImportedData_GenericData
(
Id int identity(1,1),
tmpCode varchar(255) NULL,
tmpAlpha3Code varchar(50) NULL,
tmpRelatedYear int NOT NULL,
tmpPreviousValue varchar(255) NULL,
tmpGrowthRate varchar(255) NULL
)
INSERT INTO #tmp_ImportedData_GenericData
SELECT
MCS_ImportedData_GenericData.Code,
MCS_ImportedData_GenericData.Alpha3Code,
MCS_ImportedData_GenericData.RelatedYear,
MCS_ImportedData_GenericData.PreviousValue,
MCS_ImportedData_GenericData.GrowthRate
FROM MCS_ImportedData_GenericData
INNER JOIN
(
SELECT CODE, ALPHA3CODE, RELATEDYEAR, COUNT(*) AS NUMROWS
FROM MCS_ImportedData_GenericData AS M
GROUP BY M.CODE, M.ALPHA3CODE, M.RELATEDYEAR
HAVING count(*) > 1
) AS M2 ON MCS_ImportedData_GenericData.CODE = M2.CODE
AND MCS_ImportedData_GenericData.ALPHA3CODE = M2.ALPHA3CODE
AND MCS_ImportedData_GenericData.RELATEDYEAR = M2.RELATEDYEAR
WHERE
(MCS_ImportedData_GenericData.PreviousValue <> 'INDEFINITO')
-- SELECT * from #tmp_ImportedData_GenericData
-- DROP TABLE #tmp_ImportedData_GenericData
DECLARE @counter int
DECLARE @rowsCount int
SET @counter = 1
SELECT @rowsCount = count(*) from #tmp_ImportedData_GenericData
-- PRINT @rowsCount
WHILE @counter < @rowsCount
BEGIN
SELECT
@Code = tmpCode,
@Alpha3Code = tmpAlpha3Code,
@RelatedYear = tmpRelatedYear,
@OldValue = tmpPreviousValue,
@GrowthRate = tmpGrowthRate
FROM
#tmp_ImportedData_GenericData
WHERE
Id = @counter
DELETE FROM MCS_ImportedData_GenericData
WHERE
Code = @Code
AND Alpha3Code = @Alpha3Code
AND RelatedYear = @RelatedYear
AND PreviousValue <> 'INDEFINITO' OR PreviousValue IS NULL
UPDATE
MCS_ImportedData_GenericData
SET
PreviousValue = @OldValue, GrowthRate = @GrowthRate
WHERE
Code = @Code
AND Alpha3Code = @Alpha3Code
AND RelatedYear = @RelatedYear
AND MCS_ImportedData_GenericData.PreviousValue ='INDEFINITO'
SET @counter = @counter + 1
END
END
But it takes too long, even though there are just 20,000 to 30,000 rows to process.
Does anyone have suggestions to improve performance?
Thanks in advance!

WITH q AS (
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY CODE, ALPHA3CODE, RELATEDYEAR ORDER BY CASE WHEN PreviousValue = 'INDEFINITO' THEN 1 ELSE 0 END) AS rn
FROM MCS_ImportedData_GenericData m
WHERE PreviousValue <> 'INDEFINITO'
)
DELETE
FROM q
WHERE rn > 1

Quassnoi's answer uses SQL Server 2005+ syntax, so I thought I'd put in my tuppence worth using something more generic...
First, to delete all the duplicates, but not the "original", you need a way of differentiating the duplicate records from each other. (The ROW_NUMBER() part of Quassnoi's answer)
It would appear that in your case the source data has no identity column (you create one in the temp table). If that is the case, there are two choices that come to my mind:
1. Add the identity column to the data, then remove the duplicates
2. Create a "de-duped" set of data, delete everything from the original, and insert the de-duped data back into the original
Option 1 could be something like...
(With the newly created ID field)
DELETE
[data]
FROM
MCS_ImportedData_GenericData AS [data]
WHERE
id > (
SELECT
MIN(id)
FROM
MCS_ImportedData_GenericData
WHERE
CODE = [data].CODE
AND ALPHA3CODE = [data].ALPHA3CODE
AND RELATEDYEAR = [data].RELATEDYEAR
)
OR...
DELETE
[data]
FROM
MCS_ImportedData_GenericData AS [data]
INNER JOIN
(
SELECT
MIN(id) AS [id],
CODE,
ALPHA3CODE,
RELATEDYEAR
FROM
MCS_ImportedData_GenericData
GROUP BY
CODE,
ALPHA3CODE,
RELATEDYEAR
)
AS [original]
ON [original].CODE = [data].CODE
AND [original].ALPHA3CODE = [data].ALPHA3CODE
AND [original].RELATEDYEAR = [data].RELATEDYEAR
AND [original].id <> [data].id
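To see the first variant of Option 1 in action, here is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server. The table name `ImportedData` and its rows are made up, only the three grouping columns are modelled, and the special handling of 'INDEFINITO' values from the question is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ImportedData ("
    "Id INTEGER PRIMARY KEY AUTOINCREMENT, Code TEXT, Alpha3Code TEXT, RelatedYear INTEGER)"
)
# Hypothetical data: three copies of the ITA row and one FRA row.
conn.executemany(
    "INSERT INTO ImportedData (Code, Alpha3Code, RelatedYear) VALUES (?, ?, ?)",
    [("GDP", "ITA", 2009), ("GDP", "ITA", 2009), ("GDP", "FRA", 2009), ("GDP", "ITA", 2009)],
)

# Delete every row whose Id is larger than the smallest Id in its group,
# so exactly one row per (Code, Alpha3Code, RelatedYear) survives.
conn.execute("""
    DELETE FROM ImportedData
    WHERE Id > (
        SELECT MIN(o.Id) FROM ImportedData AS o
        WHERE o.Code = ImportedData.Code
          AND o.Alpha3Code = ImportedData.Alpha3Code
          AND o.RelatedYear = ImportedData.RelatedYear
    )
""")
remaining = conn.execute("SELECT Id, Alpha3Code FROM ImportedData ORDER BY Id").fetchall()
print(remaining)
```

Because this is a single set-based statement, it avoids the row-by-row loop from the question; an index on the three grouping columns would likely help on larger tables.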

I don't understand the syntax used well enough to post an exact answer, but here's an approach.
1. Identify the rows you want to preserve (e.g. select value, ... from .. where ...)
2. Do the update logic while identifying them (e.g. select value + 1 ... from ... where ...)
3. Do an insert-select into a new table.
4. Drop the original, rename the new table to the original name, and recreate all grants/synonyms/triggers/indexes/FKs/... (or truncate the original and insert-select from the new one)
Obviously this has a pretty big overhead, but if you want to update or clear millions of rows, it is often the fastest way.
