Do I need a where clause in a conditional UPDATE? - sql

We imported a lot of data from another table. Now I'm trying to correct some of them.
UPDATE [x10ddata].[dbo].[ResourceTest]
SET [Country] = (CASE
WHEN [Country] IN ('Aezerbaijan', 'AZERBIJAN') THEN 'Azerbaijan'
WHEN [Country] = 'Belgique' THEN 'Belgium'
WHEN [Country] = 'China (RPC)' THEN 'China'
WHEN [Country] = 'Columbia' THEN 'Colombia'
WHEN [Country] = 'Croatia (Local Name: Hrvatska)' THEN 'Croatia'
.....//...
WHEN [Country] IN ('U.S.', 'U.S.A', 'U.S.A.', 'US', 'USA',
'USA - Maryland', 'USAQ') THEN 'United States'
END)
GO
I didn't use ELSE because many rows already have a valid country. My question is whether I need a WHERE clause to filter the rows that will be affected?
The reason I'm asking is that I selected the data into a test table and tried the script. According to the output, all rows were affected, but when I checked closely, not all of them actually changed. It's confusing.
Thanks for helping

The CASE expression will return NULL if none of the WHEN clauses are met. You can verify this with this simple SQL:
declare @i int
set @i = 2
select case when @i = 1 then 'A' end AS Column1
This will return NULL since @i is not 1.
To fix this in your case, you can either add the where clause like you said, or the simpler option might be to add ELSE [Country] after all of your WHEN clauses. This would mean "If I don't need to change the country field, then just use the same value that was there before."
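For completeness, the WHERE-clause alternative would look something like the sketch below. Note that it forces you to repeat every value you handle in the CASE expression inside the WHERE clause as well, which is why ELSE [Country] is usually the simpler fix (the abbreviated value list here is illustrative):

```sql
UPDATE [x10ddata].[dbo].[ResourceTest]
SET [Country] = (CASE
    WHEN [Country] IN ('Aezerbaijan', 'AZERBIJAN') THEN 'Azerbaijan'
    WHEN [Country] = 'Belgique' THEN 'Belgium'
    END)
-- The WHERE clause must list every value the CASE expression handles,
-- otherwise unmatched rows are still set to NULL
WHERE [Country] IN ('Aezerbaijan', 'AZERBIJAN', 'Belgique');
```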

You won't need a WHERE clause, but the ELSE clause is needed. Change your statement to:
UPDATE [x10ddata].[dbo].[ResourceTest]
SET [Country] = (CASE
WHEN [Country] IN ('Aezerbaijan', 'AZERBIJAN') THEN 'Azerbaijan'
WHEN [Country] = 'Belgique' THEN 'Belgium'
WHEN [Country] = 'China (RPC)' THEN 'China'
WHEN [Country] = 'Columbia' THEN 'Colombia'
WHEN [Country] = 'Croatia (Local Name: Hrvatska)' THEN 'Croatia'
.....//...
WHEN [Country] IN ('U.S.', 'U.S.A', 'U.S.A.', 'US', 'USA',
'USA - Maryland', 'USAQ') THEN 'United States'
ELSE [Country]
END)

alternatively,
Make a conversion table,
DECLARE @conversion TABLE
(
[Before] NVARCHAR(250) NOT NULL,
[After] NVARCHAR(250) NOT NULL
);
INSERT @conversion
VALUES
('Aezerbaijan', 'Azerbaijan'),
...
('USAQ', 'United States');
Then do,
UPDATE [x10ddata].[dbo].[ResourceTest]
SET [Country] = [C].[After]
FROM
[x10ddata].[dbo].[ResourceTest] [R]
JOIN
@conversion [C]
ON [C].[Before] = [R].[Country];
This has a number of potential performance benefits over the extended CASE approach, among which is only affecting rows that actually need to change.
It's probably worth using a temporary table instead of a table variable and creating an index on [Before] to optimize the join.
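A minimal sketch of that temp-table variant (the index name and abbreviated value list are illustrative):

```sql
-- Temp table version of the conversion table, with a clustered index
-- on [Before] to support the join
CREATE TABLE #conversion
(
    [Before] NVARCHAR(250) NOT NULL,
    [After]  NVARCHAR(250) NOT NULL
);

CREATE CLUSTERED INDEX IX_conversion_Before ON #conversion ([Before]);

INSERT #conversion ([Before], [After])
VALUES ('Aezerbaijan', 'Azerbaijan'),
       ('USAQ', 'United States');

UPDATE [R]
SET [Country] = [C].[After]
FROM [x10ddata].[dbo].[ResourceTest] [R]
JOIN #conversion [C]
    ON [C].[Before] = [R].[Country];

DROP TABLE #conversion;
```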

No, you don't need a WHERE clause, because your CASE expression contains your logic.
Note: if a value doesn't match any of your WHEN branches, the expression returns NULL. So I recommend adding ELSE [Country] at the end. Here's an example that demonstrates what I'm saying:
SELECT * INTO #yourTable
FROM
(
SELECT 1 ID, CAST('OldValue' AS VARCHAR(25)) val
UNION ALL
SELECT 2 , 'OldValue'
UNION ALL
SELECT 3,'Doesnt need to be updated'
) A
SELECT *
FROM #yourTable;
Results:
ID val
----------- -------------------------
1 OldValue
2 OldValue
3 Doesnt need to be updated
Now update:
UPDATE #yourTable
SET val =
CASE
WHEN ID = 1 THEN 'NewValue1'
WHEN ID = 2 THEN 'NewValue2'
--Add this so you leave values alone if they don't match your case statements
ELSE val
END
FROM #yourTable
SELECT *
FROM #yourTable
Results:
ID val
----------- -------------------------
1 NewValue1
2 NewValue2
3 Doesnt need to be updated

No, you don't NEED it. Aside from the performance cost that may be incurred through additional (unnecessary) writes to disk and locking (blocking other sessions), the end result would be the same.
One could argue that you SHOULD use a WHERE clause, not only for performance reasons, but to better capture and convey intentions.

Related

Dynamic Select Query based on Query Column | SQL Server 2012

May I ask how we can execute a dynamic SELECT query?
Basically, what I want to achieve is a SELECT that chooses its column dynamically, where the dynamic column name exists in a table.
Example Table
Table Name: AppleBox
Table Columns: Apple101, Apple102, Apple103
Table Row:
Apple101 = 1,2,3,4,5
Apple102 = 1,2,
Apple103 = 1
Suppose I run a query based on the example:
SELECT apple+'$applecode' FROM AppleBox
with $applecode coming from an external source. If $applecode = 101, my expected query would be:
SELECT apple101 FROM AppleBox
Is there a simple way to do this?
Please check the code below.
CREATE TABLE [dbo].[AppleBox](
[Apple101] [varchar](50) NULL,
[Apple102] [varchar](50) NULL,
[Apple103] [varchar](50) NULL
) ON [PRIMARY]
GO
INSERT [dbo].[AppleBox] ([Apple101], [Apple102], [Apple103]) VALUES (N'1,2', N'1', N'1')
INSERT [dbo].[AppleBox] ([Apple101], [Apple102], [Apple103]) VALUES (N'3,4,5', N'1,2,', N'2')
INSERT [dbo].[AppleBox] ([Apple101], [Apple102], [Apple103]) VALUES (N'1,2,3,4,', N'1,2,3', N'3')
INSERT [dbo].[AppleBox] ([Apple101], [Apple102], [Apple103]) VALUES (N'1,2,3,4,5', N'1', N'4')
GO
DECLARE @query NVARCHAR(500)
DECLARE @exparam NVARCHAR(50)
SET @exparam='101'
SET @query='SELECT Apple'+@exparam+' FROM dbo.AppleBox'
EXECUTE sp_executesql @query
You can use a case expression:
select (case when $applecode = 101 then apple101
when $applecode = 102 then apple102
when $applecode = 103 then apple103
end) as apple
from t;
You would only need dynamic SQL (in this case) if your query could return a variable number of columns or if you wanted to set the name of the column dynamically. Neither seems necessary here.
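If the dynamic-SQL route is used with an externally supplied value, it's worth validating the column name rather than concatenating raw input. A sketch; the whitelist check against sys.columns and the QUOTENAME wrapping are additions, not part of the original answer:

```sql
DECLARE @applecode NVARCHAR(10) = N'101';   -- externally supplied value
DECLARE @column SYSNAME = N'Apple' + @applecode;
DECLARE @query NVARCHAR(500);

-- Build the query only if the column really exists on dbo.AppleBox,
-- which prevents SQL injection through @applecode
IF EXISTS (SELECT 1 FROM sys.columns
           WHERE object_id = OBJECT_ID(N'dbo.AppleBox')
             AND name = @column)
BEGIN
    SET @query = N'SELECT ' + QUOTENAME(@column) + N' FROM dbo.AppleBox';
    EXECUTE sp_executesql @query;
END
```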

SQL Server pivot values with different data types

I am trying to pivot values of different data types in MSSQL 2016, and I could not find a way to pivot different data types.
The first table shows the initial form/structure. The second table is the desired shape.
I was trying the following SQL code to pivot my values:
SELECT
[id] AS [id],
FIRSTNAME,
LASTNAME,
BIRTHDATE,
ADDRESS,
FLAG,
NUMBER
FROM (
SELECT
[cm].[key] AS [id],
[cm].[column] AS [column],
[cm].[string] AS [string],
[cm].[bit] AS [bit],
[cm].[xml] AS [xml],
[cm].[number] AS [number],
[cm].[date] AS [date]
FROM [cmaster] AS [cm]
) AS [t]
PIVOT (
MAX([string]) --!?!?
FOR [column] IN (
FIRSTNAME,
LASTNAME,
BIRTHDATE,
ADDRESS,
FLAG,
NUMBER
)
) AS [p]
I think your best bet is to use conditional aggregation, e.g.
SELECT cm.id,
FIRSTNAME = MAX(CASE WHEN cm.[property] = 'firstname' THEN cm.[string] END),
LASTNAME = MAX(CASE WHEN cm.[property] = 'lastname' THEN cm.[string] END),
BIRTHDATE = MAX(CASE WHEN cm.[property] = 'birthdate' THEN cm.[date] END),
FLAG = CONVERT(BIT, MAX(CASE WHEN cm.[property] = 'flag' THEN CONVERT(TINYINT, cm.[boolean]) END)),
NUMBER = MAX(CASE WHEN cm.[property] = 'number' THEN cm.[integer] END)
FROM cmaster AS cm
GROUP BY cm.id;
Although, as you can see, your query becomes very tightly coupled to your EAV model, which is why EAV is considered a SQL antipattern. The alternative is to create a single value column in your subquery and pivot on that, but then you have to convert everything to a single data type and lose a bit of type safety:
SELECT id, FIRSTNAME, LASTNAME, BIRTHDATE, ADDRESS, FLAG, NUMBER
FROM (
SELECT id = cm.[key],
[column] = cm.[column],
Value = CASE cm.type
WHEN 'NVARCHAR' THEN cm.string
WHEN 'DATETIME' THEN CONVERT(NVARCHAR(MAX), cm.date, 112)
WHEN 'XML' THEN CONVERT(NVARCHAR(MAX), cm.xml)
WHEN 'BIT' THEN CONVERT(NVARCHAR(MAX), cm.boolean)
WHEN 'INT' THEN CONVERT(NVARCHAR(MAX), cm.integer)
END
FROM cmaster AS cm
) AS t
PIVOT
(
MAX(Value)
FOR [column] IN (FIRSTNAME, LASTNAME, BIRTHDATE, ADDRESS, FLAG, NUMBER)
) AS p;
In order to get the result you asked for, the first thing is to bring the data into one format that is compatible with all the data types; VARCHAR is ideal for that. Then prepare the base table using a simple SELECT query, and PIVOT the result.
In the final projection, if you want, you can convert the data back into its original format.
This query can also be written dynamically so it keeps working as records are added. Here I provide the static answer for your data; if you need a more generic dynamic answer, let me know and I can post it.
--data insert scripts I used:
CREATE TABLE First_Table
(
[id] int,
[column] VARCHAR(10),
[string] VARCHAR(20),
[bit] BIT,
[xml] [xml],
[number] INT,
[date] DATE
)
INSERT INTO First_Table VALUES(1, 'FIRST NAME', 'JOHN' , NULL, NULL, NULL, NULL)
INSERT INTO First_Table VALUES(1, 'LAST NAME', 'DOE' , NULL, NULL, NULL, NULL)
INSERT INTO First_Table VALUES(1, 'BIRTH DATE', NULL , NULL, NULL, NULL, '1985-02-25')
INSERT INTO First_Table VALUES(1, 'ADDRESS', NULL , NULL, 'SDFJDGJOKGDGKPDGKPDKGPDKGGKGKG', NULL, NULL)
INSERT INTO First_Table VALUES(1, 'FLAG', NULL , 1, NULL, NULL, NULL)
INSERT INTO First_Table VALUES(1, 'NUMBER', NULL , NULL, NULL, 20, NULL)
SELECT
PIVOTED.* FROM
(
--MAKING THE BASE TABLE FOR PIVOT
SELECT
[id]
,[column] AS [COLUMN]
, CASE WHEN [column] = 'FIRST NAME' then [string]
WHEN [column] = 'LAST NAME' then [string]
WHEN [column] = 'BIRTH DATE' then CAST([date] AS VARCHAR(100))
WHEN [column] = 'ADDRESS' then CAst([xml] as VARCHAR(100))
WHEN [column] = 'FLAG' then CAST([bit] AS VARCHAR(100))
else CAST([number] AS VARCHAR(100)) END AS [VALUE]
FROM First_Table
) AS [P]
PIVOT
(
MIN ([P].[VALUE])
FOR [column] in ([FIRST NAME],[LAST NAME],[BIRTH DATE],[ADDRESS],[FLAG],[NUMBER])
) AS PIVOTED
RESULT:
SQL:
SELECT
    ID,
    FIRSTNAME,
    ...,
    FLAG = CAST(FLAG AS INT),
    ...
FROM
(
    SELECT *
    FROM
    (
        SELECT
            f.ID,
            f.PROPERTY,
            f.STRING + f."INTEGER" + f.DATETIME + f.BOLLEAN + f.XML AS COLS
        FROM FIRSTTBL f
    )
    PIVOT(
        min(COLS) FOR PROPERTY IN
        (
            'firstname' AS firstname,
            'lastname' AS lastname,
            'birthdate' AS birthdate,
            'address' AS address,
            'flag' AS flag,
            'number' AS "NUMBER"
        )
    )
)
According to the original table, there is one and only one non-null value among the STRING, INTEGER, DATETIME, BOLLEAN and XML columns for any row, so we just need to take the first non-null value and assign it to the corresponding new column. It is not difficult to perform the transposition using the PIVOT function, except that we need to handle the different data types according to the SQL rule that each column must have a consistent type. For this task, we first convert the combined column values into a string, perform the row-to-column transposition, and then convert the strings back to the proper types. When there are many columns, the SQL statement can get tricky, and dynamic requirements are even harder to achieve.
Yet it is easy to write the code using the open-source esProc SPL:
A1: =connect("MSSQL")
A2: =A1.query#x("SELECT * FROM FIRSTTBL")
A3: =A2.pivot(ID;PROPERTY,~.array().m(4:).ifn();"firstname":"FIRSTNAME", "lastname":"LASTANME","birthdate":"BIRTHDAY","address":"ADDRESS","flag":"FLAG","number":"NUMBER")
SPL does not require that data in the same column have a consistent type, so it is easy to keep the original data types while performing the transposition.

performance impact of making where clause dummy - SQL Server

I need to know about the performance impact of the below method of writing the query.
Assume there is an employee table. Requirement is to get a list of employees under a particular department and optionally the user can filter the result set by providing the city/location.
declare @dept varchar(10) = 'ABC', @city varchar(10)
select * from employee where dept = @dept and city = isnull(@city, city)
Is this fine, or do we need to use traditional IF logic to check whether the user provided a city as input?
Thanks,
Sabarish.
I remember reading somewhere that the following syntax is quicker than calling ISNULL():
select * from employee where dept = @dept and (@city IS NULL OR @city = city)
It was something to do with the SQL compiler effectively knowing that it can ignore the expression in brackets if @city is null.
Sorry but no idea where I read this (it was some time ago), otherwise I would cite it properly.
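Another option commonly recommended for this kind of optional-filter ("catch-all") query is OPTION (RECOMPILE), which lets the optimizer build a plan for the actual parameter values at each execution. A sketch; it trades recompilation CPU for a better plan, an assumption worth testing on your own workload:

```sql
DECLARE @dept VARCHAR(10) = 'ABC', @city VARCHAR(10) = NULL;

SELECT *
FROM employee
WHERE dept = @dept
  AND (@city IS NULL OR city = @city)
-- The plan is recompiled with the current parameter values,
-- so when @city is NULL the optimizer can discard that branch entirely
OPTION (RECOMPILE);
```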
The most powerful approach to performance problems with NULLs is to avoid NULLs by using default values. In your case, something like this should work:
declare @dept varchar(10) = 'ABC', @city varchar(10) = 'unknown'
SELECT *
FROM employee
WHERE dept = @dept AND
@city = 'unknown'
UNION
SELECT *
FROM employee
WHERE dept = @dept AND
city = @city AND
@city != 'unknown'
Why?
The cardinality estimator cannot estimate the correct number of rows the query returns, which can lead to a bad execution plan for this particular query. Avoid NULLs and everything will be great B-)
The answer provided by @Jonathan will certainly improve performance if the City column has a separate nonclustered index on it. If not, both execution plans will lead to a SCAN. If you do have a nonclustered index, Jonathan's approach will do a SEEK instead of a SCAN, which is good in terms of performance.
Let me try to explain why with a sample, using the table below. For ease of use I did not consider the two predicates dept and city; I consider only city.
Consider below Employee table:
CREATE TABLE [dbo].[Employee](
[EmployeeId] [int] NULL,
[EmployeeName] [varchar](20) NULL,
[Dept] [varchar](15) NULL,
[city] [varchar](15) NULL
) ON [PRIMARY]
GO
--Creating Clustered Index on Id
CREATE CLUSTERED INDEX [CI_Employee_EmployeeId] ON [dbo].[Employee] ( [EmployeeId] ASC)
--Loading sample data
Insert into Employee
Select top (10000) EmployeeId = Row_Number() over (order by (Select NULL))
,EmployeeName = Concat ('Name ',Row_Number() over (order by (Select NULL)))
,Dept = Concat ('Dept ',(Row_Number() over (order by (Select NULL))) % 50)
,City = Concat ('City ',Row_Number() over (order by (Select NULL)))
from master..spt_values s1, master..spt_values s2
Now Executing simple query with normal predicate:
Declare @city varchar(15) = 'City 1500'
Select * from Employee where city = @city
--It Does Clustered Index Scan
Now creating a nonclustered index on city:
--Now adding Index on City
Create NonClustered Index NCI_Employee_City on dbo.Employee (city)
Declare @city varchar(15) = 'City 1500'
Select * from Employee where city = @city
--It Does Index Seek
Now coming to your ISNULL function.
Since it forces the function to be evaluated against each row's city, it uses a SCAN, as below:
Declare @city varchar(15) = 'City 1500'
Select * from Employee where city = isnull(@city, City)
go
Declare @city varchar(15) = 'City 1500'
Select * from Employee where city is null or city = @city
If you look at the overall percentage, the ISNULL version costs more.
So if you have an index, all of this will be helpful; otherwise it is going to be a scan anyway.

Replacing Postcodes with numbers if in

I am looking to write an SQL query in one statement which pulls through a postcode and, if it matches a given postcode, replaces it with the wording "test".
For example:
SELECT [Postcode]
FROM [TABLE]
WHERE user = 'user'
AND [Postcode] IN ('000 000')
SELECT REPLACE
('{[postcode]}'
,'000 000','test data')
I am thinking about using nested SQL. Is there any way I can join the two statements above into one?
I think you are looking for this:
SELECT --[Postcode],
case when [PostCode] = '000 000' then REPLACE([Postcode],'000 000','test data') else [PostCode] end as [Postcode]
FROM [TABLE]
WHERE user = 'user'
--AND [Postcode] IN ('000 000')

Add empty row to query results if no results found

I'm writing stored procs that are being called by a legacy system. One of the constraints of the legacy system is that there must be at least one row in the single result set returned from the stored proc. The standard is to return a zero in the first column (yes, I know!).
The obvious way to achieve this is create a temp table, put the results into it, test for any rows in the temp table and either return the results from the temp table or the single empty result.
Another way might be to do an EXISTS against the same where clause that's in the main query before the main query is executed.
Neither of these are very satisfying. Can anyone think of a better way. I was thinking down the lines of a UNION kind of like this (I'm aware this doesn't work):
--create table #test
--(
-- id int identity,
-- category varchar(10)
--)
--go
--insert #test values ('A')
--insert #test values ('B')
--insert #test values ('C')
declare @category varchar(10)
set @category = 'D'
select
id, category
from #test
where category = @category
union
select
0, ''
from #test
where @@rowcount = 0
Very few options, I'm afraid.
You always have to touch the table twice, whether via COUNT, EXISTS before, EXISTS in a UNION, a TOP clause, etc.
select
id, category
from mytable
where category = @category
union all --edit, of course it's quicker
select
0, ''
where NOT EXISTS (SELECT * FROM mytable where category = @category)
An EXISTS solution is better than COUNT because it stops as soon as it finds a row. COUNT traverses all matching rows to actually count them.
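The difference shows up in the two guard styles below; a sketch against the same hypothetical mytable, where EXISTS can short-circuit on the first match but COUNT must visit every matching row:

```sql
-- EXISTS: the scan can stop at the first matching row
IF EXISTS (SELECT * FROM mytable WHERE category = @category)
    PRINT 'has rows';

-- COUNT: counts every matching row even though we only care whether any exist
IF (SELECT COUNT(*) FROM mytable WHERE category = @category) > 0
    PRINT 'has rows';
```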
It's an old question, but I had the same problem.
The solution is really simple, WITHOUT a double select:
select top(1) WITH TIES * FROM (
select
id, category, 1 as orderdummy
from #test
where category = @category
union select 0, '', 2) as t ORDER BY orderdummy
Because of the "WITH TIES" you get ALL rows (they all have 1 as "orderdummy", so they are all ties), or, if there is no result, you get your default row.
You can use a full outer join. Something to the effect of ...
declare @category varchar(10)
set @category = 'D'
select t.id, ISNULL(t.category, @category) as category from (
select
id, category
from #test
where category = @category
) as t
FULL OUTER JOIN (Select @category as CategoryHelper ) as EmptyHelper on 1=1
I am currently performance testing this scenario myself, so I'm not sure what kind of impact it has, but it will give you a blank row with category populated.
This is @swe's answer, just reformatted:
CREATE FUNCTION [mail].[f_GetRecipients]
(
@MailContentCode VARCHAR(50)
)
RETURNS TABLE
AS
RETURN
(
SELECT TOP 1 WITH TIES -- Returns either all Priority 1 rows or, if none exist, all Priority 2 rows
[To],
CC,
BCC
FROM (
SELECT
[To],
CC,
BCC,
1 AS Priority
FROM mail.Recipients
WHERE 1 = 1
AND IsActive = 1
AND MailContentCode = @MailContentCode
UNION ALL
SELECT
*,
2 AS Priority
FROM (VALUES
(N'system@company.com', NULL, NULL),
(N'author@company.com', NULL, NULL)
) defaults([To], CC, BCC)
) emails
ORDER BY Priority
)
I guess you could try:
Declare @count int
set @count = 0
Begin
Select @count = Count([Column])
From -- your query
if(@count = 0)
select 0
else -- run your query
The downside is that you're effectively running your query twice; the upside is that you're skipping the temp table.
To avoid duplicating the select query, how about a temp table to store the query result first? Based on that temp table, return the default row if it is empty, or return the temp table's rows when it has results.
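A sketch of that temp-table approach; the table and column names follow the #test example from the question:

```sql
DECLARE @category VARCHAR(10) = 'D';

-- Run the real query once, capturing the result
SELECT id, category
INTO #results
FROM #test
WHERE category = @category;

-- Return either the captured rows or the single default row
IF EXISTS (SELECT 1 FROM #results)
    SELECT id, category FROM #results
ELSE
    SELECT 0 AS id, '' AS category;  -- the one row the legacy system expects

DROP TABLE #results;
```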