Conditional update across multiple fields - sql

I'm new to SQL, so I'm still learning all there is to offer.
I'm bringing in data from multiple sources and building a unique identifier table.
Several fields must be populated in order of precedence (i.e., the first data source is preferred, then the second, and so on).
Here is what I'm trying to do:
UPDATE
TABLE1 AS DIMTABLE
SET
FIRSTNAME = ifnull( FIRSTNAME, RAWTABLE.FIRSTNAME )
, DIMTABLE.MIDDLENAME = ifnull( DIMTABLE.MIDDLENAME, RAWTABLE.MIDDLENAME )
, DIMTABLE.LASTNAME = ifnull( DIMTABLE.LASTNAME, RAWTABLE.LASTNAME )
, DIMTABLE.GENDER = IFNULL( DIMTABLE.GENDER, RAWTABLE.GENDER )
, DIMTABLE.DOB = IFNULL( DIMTABLE.DOB, RAWTABLE.DOB )
, DIMTABLE.PHONE1 = IFNULL( DIMTABLE.PHONE1, RAWTABLE.PHONE1 )
, DIMTABLE.PHONE2 = IFNULL( DIMTABLE.PHONE2, RAWTABLE.PHONE2 )
, DIMTABLE.EMAIL = IFNULL( DIMTABLE.EMAIL, RAWTABLE.EMAIL )
, DIMTABLE.FAX = IFNULL( DIMTABLE.FAX, RAWTABLE.FAX )
FROM
TABLE2 AS RAWTABLE
WHERE
RAWTABLE.ID_SOURCE_ID = 9
AND DIMTABLE.UID = RAWTABLE.UID
I have a sequence of these statements, one for each RAWTABLE.ID_SOURCE_ID = 10, 11, 15, ...
The result is that the NULL fields I'm trying to update remain NULL.
I was hoping something like this would let me avoid multiple passes over the table.
I'm struggling with this approach, which usually means there must be a better way.

One option is to prepare the rows first, keeping the first non-NULL value of each column per UID, in order of source precedence:
CREATE OR REPLACE TEMPORARY TABLE RAWTABLE_SINGLE_UID
AS
SELECT DISTINCT
UID
,FIRST_VALUE(FIRSTNAME) IGNORE NULLS
OVER(PARTITION BY UID ORDER BY ID_SOURCE_ID) AS FIRSTNAME
,FIRST_VALUE(LASTNAME) IGNORE NULLS
OVER(PARTITION BY UID ORDER BY ID_SOURCE_ID) AS LASTNAME
,...
FROM TABLE2
WHERE ID_SOURCE_ID IN (9,10,11,15);
Warning: the QUALIFY alternative below keeps the entire first row per UID, so it does not guarantee the first non-NULL value of each column:
CREATE OR REPLACE TEMPORARY TABLE RAWTABLE_SINGLE_UID
AS
SELECT *
FROM TABLE2
WHERE ID_SOURCE_ID IN (9,10,11,15)
QUALIFY ROW_NUMBER() OVER(PARTITION BY UID ORDER BY ID_SOURCE_ID) = 1;
Then perform the update:
UPDATE TABLE1 AS DIMTABLE
SET FIRSTNAME = IFNULL( DIMTABLE.FIRSTNAME, RAWTABLE.FIRSTNAME )
, DIMTABLE.MIDDLENAME = IFNULL( DIMTABLE.MIDDLENAME, RAWTABLE.MIDDLENAME )
, DIMTABLE.LASTNAME = IFNULL( DIMTABLE.LASTNAME, RAWTABLE.LASTNAME )
, DIMTABLE.GENDER = IFNULL( DIMTABLE.GENDER, RAWTABLE.GENDER )
, DIMTABLE.DOB = IFNULL( DIMTABLE.DOB, RAWTABLE.DOB )
, DIMTABLE.PHONE1 = IFNULL( DIMTABLE.PHONE1, RAWTABLE.PHONE1 )
, DIMTABLE.PHONE2 = IFNULL( DIMTABLE.PHONE2, RAWTABLE.PHONE2 )
, DIMTABLE.EMAIL = IFNULL( DIMTABLE.EMAIL, RAWTABLE.EMAIL )
, DIMTABLE.FAX = IFNULL( DIMTABLE.FAX, RAWTABLE.FAX )
FROM RAWTABLE_SINGLE_UID AS RAWTABLE
WHERE DIMTABLE.UID = RAWTABLE.UID;
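The prepare-then-update idea can be checked on a toy dataset. The sketch below uses Python's built-in sqlite3 purely so it runs anywhere; the tables, columns and data are hypothetical stand-ins for TABLE1/TABLE2. SQLite does not support FIRST_VALUE ... IGNORE NULLS, so a correlated subquery (first non-NULL value ordered by id_source_id) is swapped in for it. It also reproduces the QUALIFY/ROW_NUMBER whole-row pick to show why that variant can leave gaps. Window functions assume SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical miniature TABLE1 (dimension) and TABLE2 (raw sources).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (uid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE table2 (uid INTEGER, id_source_id INTEGER, firstname TEXT, lastname TEXT);
INSERT INTO table1 VALUES (1, NULL, NULL), (2, 'Ann', NULL);
INSERT INTO table2 VALUES
  (1, 9,  NULL,   'Smith'),
  (1, 10, 'John', 'Smyth'),
  (2, 9,  'Anna', NULL),
  (2, 15, NULL,   'Lee');
""")

# Step 1: one row per UID with the first non-NULL value of each column,
# lowest id_source_id first (stand-in for FIRST_VALUE ... IGNORE NULLS).
conn.executescript("""
CREATE TEMP TABLE rawtable_single_uid AS
SELECT u.uid,
       (SELECT r.firstname FROM table2 AS r
         WHERE r.uid = u.uid AND r.firstname IS NOT NULL
           AND r.id_source_id IN (9, 10, 11, 15)
         ORDER BY r.id_source_id LIMIT 1) AS firstname,
       (SELECT r.lastname FROM table2 AS r
         WHERE r.uid = u.uid AND r.lastname IS NOT NULL
           AND r.id_source_id IN (9, 10, 11, 15)
         ORDER BY r.id_source_id LIMIT 1) AS lastname
FROM (SELECT DISTINCT uid FROM table2) AS u;
""")

# Step 2: a single UPDATE pass; IFNULL keeps values that are already filled.
conn.execute("""
UPDATE table1
SET firstname = IFNULL(firstname,
        (SELECT s.firstname FROM rawtable_single_uid AS s WHERE s.uid = table1.uid)),
    lastname  = IFNULL(lastname,
        (SELECT s.lastname FROM rawtable_single_uid AS s WHERE s.uid = table1.uid))
""")
merged = conn.execute("SELECT uid, firstname, lastname FROM table1 ORDER BY uid").fetchall()

# The QUALIFY-style pick: the whole first row per UID, NULLs included.
whole_row = conn.execute("""
SELECT uid, firstname, lastname FROM (
  SELECT uid, firstname, lastname,
         ROW_NUMBER() OVER (PARTITION BY uid ORDER BY id_source_id) AS rn
  FROM table2
  WHERE id_source_id IN (9, 10, 11, 15)
) AS ranked
WHERE rn = 1
ORDER BY uid
""").fetchall()
```

Here UID 1 gets 'John' from source 10 even though source 9 is preferred, because source 9 has no first name; the whole-row pick leaves that first name NULL.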

If I understand correctly, you can use LEFT JOIN for this purpose with COALESCE():
UPDATE TABLE1 main
SET FIRSTNAME = COALESCE(T1.FIRSTNAME, T2.FIRSTNAME, T3.FIRSTNAME),
. . .
FROM TABLE1 T1 LEFT JOIN
TABLE2 T2
ON T2.UID = T1.UID LEFT JOIN
TABLE3 T3
ON T3.UID = T1.UID
WHERE T1.ID_SOURCE_ID = 9 AND
T1.UID = main.UID;
This assumes the following:
You want to update the rows with ID_SOURCE_ID = 9 in the main table.
UID is unique in each of the tables.
The secondary tables do not need to contain every UID.
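A minimal sketch of the COALESCE-over-LEFT-JOINs idea, run in Python's sqlite3 with hypothetical tables and data. COALESCE returns its first non-NULL argument, so the argument order is the precedence order, and a UID missing from a secondary table simply contributes NULL. To stay portable to SQLite versions older than 3.33 (which lack UPDATE ... FROM), the merged values are computed with a SELECT and written back with a parameterised UPDATE; that write-back step is an assumption of this sketch, not part of the answer.

```python
import sqlite3

# Hypothetical data: table1 holds the row to fill; table2/table3 are
# lower-precedence sources that do not contain every UID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (uid INTEGER PRIMARY KEY, id_source_id INTEGER, firstname TEXT);
CREATE TABLE table2 (uid INTEGER PRIMARY KEY, firstname TEXT);
CREATE TABLE table3 (uid INTEGER PRIMARY KEY, firstname TEXT);
INSERT INTO table1 VALUES (1, 9, NULL), (2, 9, 'Ann'), (3, 9, NULL), (4, 10, NULL);
INSERT INTO table2 VALUES (1, NULL), (3, 'Cara');
INSERT INTO table3 VALUES (1, 'Bob'), (3, 'Carol'), (4, 'Dan');
""")

# First non-NULL value wins: table1, then table2, then table3.
merged = conn.execute("""
SELECT t1.uid, COALESCE(t1.firstname, t2.firstname, t3.firstname)
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t2.uid = t1.uid
LEFT JOIN table3 AS t3 ON t3.uid = t1.uid
WHERE t1.id_source_id = 9
ORDER BY t1.uid
""").fetchall()

# Write the merged values back; rows outside id_source_id = 9 are untouched.
conn.executemany("UPDATE table1 SET firstname = ? WHERE uid = ?",
                 [(name, uid) for uid, name in merged])
rows = conn.execute("SELECT uid, firstname FROM table1 ORDER BY uid").fetchall()
```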


Remove multiple rows with same ID

I've done some looking around and wasn't able to find quite what I was looking for. I have two tables:
1. A table where general user information is stored.
2. A table where a status is generated and stored.
The problem is that there are multiple rows for the same user, so querying them returns multiple rows per user. I can't just merge them because they don't all have the same status. I need only the newest status from that table.
Here is the query:
SELECT DISTINCT
TOP(50) cam.UserID AS PatientID,
mppi.DisplayName AS Surgeon,
ISNULL(sci.IOPStatus, 'N/A') AS Status,
tkstat.TrackerStatusID AS Stat_2
FROM
Main AS cam
INNER JOIN
Providers AS rap
ON cam.VisitID = rap.VisitID
INNER JOIN
ProviderInfo AS mppi
ON rap.UnvUserID = mppi.UnvUserID
LEFT OUTER JOIN
Inop AS sci
ON cam.CwsID = sci.CwsID
LEFT OUTER JOIN
TrackerStatus AS tkstat
ON cam.CwsID = tkstat.CwsID
WHERE
(
cam.Location_ID IN
(
'SURG'
)
)
AND
(
rap.IsAttending = 'Y'
)
AND
(
cam.DateTime BETWEEN CONCAT(CAST(GETDATE() AS DATE), ' 00:00:00') AND CONCAT(CAST(GETDATE() AS DATE), ' 23:59:59')
)
AND
(
cam.Status_StatusID != 'Cancelled'
)
ORDER BY
cam.UserID ASC
So I need only the newest Stat_2 for each ID, so that multiple rows aren't returned. Each Stat_2 also has an update time I can sort by; that column is StatusDateTime.
One way to handle this is to compute a ROW_NUMBER() for the table where you need the newest record.
The easiest way to do that is to change your tkstat join to a derived table containing the ROW_NUMBER() calculation, and then add rn = 1 to the join condition:
SELECT DISTINCT TOP (50)
cam.UserID AS PatientID, mppi.DisplayName AS Surgeon, ISNULL(sci.IOPStatus, 'N/A') AS Status, tkstat.TrackerStatusID AS Stat_2
FROM Main AS cam
INNER JOIN Providers AS rap ON cam.VisitID = rap.VisitID
INNER JOIN ProviderInfo AS mppi ON rap.UnvUserID = mppi.UnvUserID
LEFT OUTER JOIN Inop AS sci ON cam.CwsID = sci.CwsID
LEFT OUTER JOIN (
SELECT tk.CwsID, tk.TrackerStatusID,
ROW_NUMBER() OVER (PARTITION BY tk.CwsID ORDER BY tk.StatusDateTime DESC) AS rn
FROM TrackerStatus AS tk
) AS tkstat ON cam.CwsID = tkstat.CwsID
AND tkstat.rn = 1
WHERE (cam.Location_ID IN ( 'SURG' )) AND (rap.IsAttending = 'Y')
AND (cam.DateTime BETWEEN CONCAT(CAST(GETDATE() AS DATE), ' 00:00:00') AND CONCAT(CAST(GETDATE() AS DATE), ' 23:59:59'))
AND (cam.Status_StatusID != 'Cancelled')
ORDER BY cam.UserID ASC;
Note that you need a column that defines which status is the "newest". The question names it StatusDateTime, so that column is used above; substitute the correct column name if yours differs:
ROW_NUMBER() OVER (PARTITION BY tk.CwsID ORDER BY tk.StatusDateTime DESC) AS rn
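The "newest row per key" join can be tried on toy data. This sketch uses Python's sqlite3 (window functions need SQLite 3.25 or newer) instead of SQL Server; the Visits table and all rows are hypothetical stand-ins for Main and TrackerStatus. Note that rn = 1 sits in the ON clause, so a visit with no status at all still appears.

```python
import sqlite3

# Hypothetical stand-ins for Main (here: Visits) and TrackerStatus.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Visits (UserID INTEGER, CwsID INTEGER);
CREATE TABLE TrackerStatus (CwsID INTEGER, TrackerStatusID TEXT, StatusDateTime TEXT);
INSERT INTO Visits VALUES (101, 1), (102, 2), (103, 3);
INSERT INTO TrackerStatus VALUES
  (1, 'Scheduled',  '2024-01-01 08:00'),
  (1, 'In Surgery', '2024-01-01 10:30'),
  (2, 'Scheduled',  '2024-01-01 09:00');
""")

newest = conn.execute("""
SELECT cam.UserID, tkstat.TrackerStatusID
FROM Visits AS cam
LEFT OUTER JOIN (
    -- rank each CwsID's statuses, newest first
    SELECT tk.CwsID, tk.TrackerStatusID,
           ROW_NUMBER() OVER (PARTITION BY tk.CwsID
                              ORDER BY tk.StatusDateTime DESC) AS rn
    FROM TrackerStatus AS tk
) AS tkstat
  ON cam.CwsID = tkstat.CwsID
 AND tkstat.rn = 1   -- filter in ON, so UserID 103 (no status) is kept
ORDER BY cam.UserID
""").fetchall()
```

Moving tkstat.rn = 1 into the WHERE clause would drop UserID 103, because its rn is NULL.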
SQL Server doesn't offer a FIRST() aggregate function, but you can reproduce that behavior with ROW_NUMBER() like this:
WITH Qry1 AS (
Select <other columns>,
ROW_NUMBER() OVER(
PARTITION BY <group by columns>
ORDER BY <time stamp column*> DESC
) As Seq
From <the rest of your select statement>
)
Select *
From Qry1
Where Seq = 1
* Sorting the time stamp column in descending order puts the "newest" record at Seq = 1.

How to get data where the whole column is NOT NULL?

I am trying to pull data where a specific column is entirely non-NULL: the data should be returned only if ALL of the rows meet that requirement for that column. Simply using IS NOT NULL will not work. In short, I am trying to find contracts where all of the products on the contract have been terminated, and to return only that data.
Here is what I have so far; it's bare-bones:
SELECT
T0.CustomerCode
, T0.CustomerName
, T1.ContractID
, T1.StartDate
, T1.TerminationDate
, T2.ProductRecordID
, T2.ProductSN
, T2.CompanySN
, T2.ProductRecordStatus
FROM
T0
INNER JOIN T1 ON T0.ContractID = T1.ContractID
INNER JOIN T2 ON T1.ProductRecordID = T2.ProductRecordID
WHERE T0.ProductRecordStatus = 'A'
One solution is to count the NULL values in the specified column and check whether that count is zero.
DECLARE @nullCount int = 1;
SELECT
@nullCount = COUNT(1)
FROM
[your_table]
WHERE
[your_column] IS NULL;
IF @nullCount = 0
SELECT
T0.CustomerCode , T0.CustomerName , T1.ContractID , T1.StartDate , T1.TerminationDate ,
T2.ProductRecordID , T2.ProductSN , T2.CompanySN , T2.ProductRecordStatus
FROM T0
INNER JOIN T1 ON T0.ContractID = T1.ContractID INNER JOIN T2 ON T1.ProductRecordID =
T2.ProductRecordID
WHERE T0.ProductRecordStatus = 'A';
This way, if there are one or more NULL values, the query is not run at all.
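The same zero-NULL-count test can also be applied per contract instead of to the whole column, which matches the "contracts where all products are terminated" goal more closely. This is a variation on the answer, not its exact code. The sketch runs in Python's sqlite3 and assumes, hypothetically, that a product counts as terminated when its TerminationDate is not NULL; the ContractProducts table and its rows are invented for illustration.

```python
import sqlite3

# Hypothetical flattened contract/product data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ContractProducts (ContractID INTEGER, ProductRecordID INTEGER, TerminationDate TEXT);
INSERT INTO ContractProducts VALUES
  (1, 10, '2023-05-01'), (1, 11, '2023-06-01'),
  (2, 20, '2023-05-01'), (2, 21, NULL),
  (3, 30, '2023-07-15');
""")

# Whole-column check, as in the answer: one NULL anywhere blocks everything.
null_count = conn.execute(
    "SELECT COUNT(*) FROM ContractProducts WHERE TerminationDate IS NULL"
).fetchone()[0]

# Per-contract check: count the NULLs inside each group and keep groups with zero.
fully_terminated = [row[0] for row in conn.execute("""
SELECT ContractID
FROM ContractProducts
GROUP BY ContractID
HAVING SUM(CASE WHEN TerminationDate IS NULL THEN 1 ELSE 0 END) = 0
ORDER BY ContractID
""")]
```

With this data the whole-column check finds one NULL (contract 2), while the grouped version still returns contracts 1 and 3.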

Display fields from temp table in select

I have the following SQL. The problem is that the output only shows the wp columns (wp.gtid, wp.first_name, etc.). I want to display the matching wpe.gtid, wpe.first_name, etc. as well, so that I can easily compare the fields side by side.
with dups as (
select
wp.GtId
from CORE.WeccoParty wp
where exists (select 1
from CORE.WeccoParty wpe
-- where wp.Tin = wpe.Tin
where wp.FirstName = wpe.LastName
and wp.GtId <> wpe.GtId
)
)
select distinct
wp.GtId,
wp.CrmPartyId,
wp.LegalName,
wp.BusinessClass,
wp.RmFullName,
wp.PbeFullName,
wp.OverallClientStatus,
wp.OverallRpStatus,
wp.FirstName,
wp.LastName,
wp.Tin
from CORE.WeccoParty wp
join dups d on d.GtId = wp.GtId
order by 1,2
K1205, when I checked your code, it seems to identify duplicate entries that have the same FirstName but different GtId values.
So perhaps you can use the ROW_NUMBER() function with a PARTITION BY clause.
Could you please check the following SQL?
select
ROW_NUMBER() over (Partition By wp.FirstName Order By wp.GtId) as rn,
wp.GtId,
wp.CrmPartyId,
wp.LegalName,
wp.BusinessClass,
wp.RmFullName,
wp.PbeFullName,
wp.OverallClientStatus,
wp.OverallRpStatus,
wp.FirstName,
wp.LastName,
wp.Tin
from CORE.WeccoParty wp
order by wp.FirstName, rn
I hope the following CTE query helps.
The data (FirstName and GtId) from each duplicate is added as new columns next to the first entry with the same FirstName.
with rawdata as (
select
ROW_NUMBER() over (Partition By wp.FirstName Order By wp.GtId) as rn,
wp.GtId,
wp.CrmPartyId,
wp.LegalName,
wp.BusinessClass,
wp.RmFullName,
wp.PbeFullName,
wp.OverallClientStatus,
wp.OverallRpStatus,
wp.FirstName,
wp.LastName,
wp.Tin
from CORE.WeccoParty wp
)
select t1.*, t2.GtId AS DupGtId, t2.FirstName AS DupFirstName
from rawdata as t1
inner join rawdata as t2
on t1.FirstName = t2.FirstName and
t1.rn <> t2.rn
where t1.rn = 1
order by t1.FirstName, t2.rn
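The CTE self-join can be exercised on a few hypothetical rows with Python's sqlite3 (SQLite 3.25+ for ROW_NUMBER()). The first entry per FirstName (rn = 1) is paired with every other entry sharing that name, so each duplicate's GtId and LastName appear side by side with the original; names without duplicates drop out because of the inner join.

```python
import sqlite3

# Hypothetical subset of CORE.WeccoParty.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WeccoParty (GtId TEXT, FirstName TEXT, LastName TEXT);
INSERT INTO WeccoParty VALUES
  ('G1', 'Ann', 'Lee'), ('G2', 'Ann', 'Low'), ('G3', 'Ann', 'Lim'),
  ('G4', 'Bob', 'Ray');
""")

pairs = conn.execute("""
WITH rawdata AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY FirstName ORDER BY GtId) AS rn,
           GtId, FirstName, LastName
    FROM WeccoParty
)
SELECT t1.GtId, t1.FirstName, t2.GtId AS DupGtId, t2.LastName AS DupLastName
FROM rawdata AS t1
JOIN rawdata AS t2
  ON t1.FirstName = t2.FirstName AND t1.rn <> t2.rn
WHERE t1.rn = 1              -- the first entry for each FirstName
ORDER BY t1.FirstName, t2.rn
""").fetchall()
```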

How to improve sql script performance

The following script is very slow when it runs.
I have no idea how to improve its performance.
Even as a view, it takes quite a few minutes.
Please share any ideas.
SELECT DISTINCT
( id )
FROM ( SELECT DISTINCT
ct.id AS id
FROM [Customer].[dbo].[Contact] ct
LEFT JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
WHERE hnci.customer_id IN (
SELECT DISTINCT
( [Customer_ID] )
FROM [Transactions].[dbo].[Transaction_Header]
WHERE actual_transaction_date > '20120218' )
UNION
SELECT DISTINCT
contact_id AS id
FROM [Customer].[dbo].[Restaurant_Attendance]
WHERE ( created > '2012-02-18 00:00:00.000'
OR modified > '2012-02-18 00:00:00.000'
)
AND ( [Fifth_Floor_London] = 1
OR [Fourth_Floor_Leeds] = 1
OR [Second_Floor_Bristol] = 1
)
UNION
SELECT DISTINCT
( ct.id )
FROM [Customer].[dbo].[Contact] ct
INNER JOIN [Customer].[dbo].[Wifinity_Devices] wfd ON ct.wifinity_uniqueID = wfd.[CustomerUniqueID]
AND startconnection > '2012-02-17'
UNION
SELECT DISTINCT
comdt.id AS id
FROM [Customer].[dbo].[Complete_dataset] comdt
LEFT JOIN [Customer].[dbo].[Aggregate_Spend_Counts] agsc ON comdt.id = agsc.contact_id
WHERE agsc.contact_id IS NULL
AND ( opt_out_Mail <> 1
OR opt_out_email <> 1
OR opt_out_SMS <> 1
OR opt_out_Mail IS NULL
OR opt_out_email IS NULL
OR opt_out_SMS IS NULL
)
AND ( address_1 IS NOT NULL
OR email IS NOT NULL
OR mobile IS NOT NULL
)
UNION
SELECT DISTINCT
( contact_id ) AS id
FROM [Customer].[dbo].[VIP_Card_Holders]
WHERE VIP_Card_number IS NOT NULL
) AS tbl
Wow, where to start...
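-- this outer DISTINCT does nothing; UNION already removes duplicates
--SELECT DISTINCT
-- ( id )
--FROM (
-- keep the join through Customer_ids: Transaction_Header holds customer ids, but this query must return contact ids
SELECT ct.id AS id
FROM [Customer].[dbo].[Contact] ct
JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
WHERE EXISTS ( SELECT 1
FROM [Transactions].[dbo].[Transaction_Header] th
WHERE th.[Customer_ID] = hnci.customer_id
AND th.actual_transaction_date > '20120218' )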
UNION
SELECT
contact_id AS id
FROM [Customer].[dbo].[Restaurant_Attendance]
-- not sure you are getting the date range you want. Should these be >= ?
-- if you want everything that occurred on the 18th or later, use >= '2012-02-18 00:00:00.000'
-- if you want everything that occurred on the 19th or later, use >= '2012-02-19 00:00:00.000'
-- the way you have it now, you get everything on the 18th except what happened exactly at midnight
WHERE ( created > '2012-02-18 00:00:00.000'
OR modified > '2012-02-18 00:00:00.000'
)
AND ( [Fifth_Floor_London] = 1
OR [Fourth_Floor_Leeds] = 1
OR [Second_Floor_Bristol] = 1
)
-- keep this branch: the first query only returns ids with recent transactions, not every id in the Contact table
UNION
SELECT
( ct.id )
FROM [Customer].[dbo].[Contact] ct
INNER JOIN [Customer].[dbo].[Wifinity_Devices] wfd ON ct.wifinity_uniqueID = wfd.[CustomerUniqueID]
AND startconnection > '2012-02-17'
UNION
-- cleaned this up with isnull function and coalesce
SELECT
comdt.id AS id
FROM [Customer].[dbo].[Complete_dataset] comdt
LEFT JOIN [Customer].[dbo].[Aggregate_Spend_Counts] agsc ON comdt.id = agsc.contact_id
WHERE agsc.contact_id IS NULL
AND ( isnull(opt_out_Mail,0) <> 1
OR isnull(opt_out_email,0) <> 1
OR isnull(opt_out_SMS,0) <> 1
)
AND coalesce(address_1 , email, mobile) IS NOT NULL
UNION
SELECT
( contact_id ) AS id
FROM [Customer].[dbo].[VIP_Card_Holders]
WHERE VIP_Card_number IS NOT NULL
-- ) AS tbl
WHERE EXISTS can also be faster than IN.
OR conditions can be slow as well; consider splitting them into separate UNION branches instead.
And learn to use LEFT JOINs correctly. If you have a WHERE condition (other than WHERE id IS NULL) on the table on the right side of a LEFT JOIN, it effectively turns the LEFT JOIN into an INNER JOIN. If that is not what you want, your code is currently returning an incorrect result set.
See http://wiki.lessthandot.com/index.php/WHERE_conditions_on_a_LEFT_JOIN for an explanation of how to fix.
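The LEFT JOIN pitfall is easy to see on three hypothetical contacts, only two of which have a Customer_ids row. The sketch uses Python's sqlite3; the names echo the question's tables but the data is invented. A filter on the right-hand table in WHERE discards the unmatched rows (their columns are NULL, and NULL = 100 is not true), while the same filter in ON keeps every contact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contact (id INTEGER PRIMARY KEY);
CREATE TABLE Customer_ids (contact_id INTEGER, customer_id INTEGER);
INSERT INTO Contact VALUES (1), (2), (3);
INSERT INTO Customer_ids VALUES (1, 100), (2, 200);
""")

# Filter in WHERE: behaves like an INNER JOIN.
in_where = conn.execute("""
SELECT ct.id, hnci.customer_id
FROM Contact ct
LEFT JOIN Customer_ids hnci ON ct.id = hnci.contact_id
WHERE hnci.customer_id = 100
ORDER BY ct.id
""").fetchall()

# Filter in ON: a true LEFT JOIN; non-matching contacts get NULLs.
in_on = conn.execute("""
SELECT ct.id, hnci.customer_id
FROM Contact ct
LEFT JOIN Customer_ids hnci
  ON ct.id = hnci.contact_id AND hnci.customer_id = 100
ORDER BY ct.id
""").fetchall()
```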
As stated in a comment, optimize one query at a time: see which one takes the longest and focus on that one.
UNION removes duplicates, so you don't need DISTINCT on the individual queries.
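A tiny check of that UNION behavior, using Python's sqlite3 and two hypothetical single-column tables. UNION removes duplicates both across branches and within a branch, so a per-branch DISTINCT adds nothing; UNION ALL keeps every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER);
CREATE TABLE b (id INTEGER);
INSERT INTO a VALUES (1), (1), (2);   -- duplicate inside one branch
INSERT INTO b VALUES (2), (3);        -- duplicate across branches
""")

union_ids = sorted(r[0] for r in conn.execute(
    "SELECT id FROM a UNION SELECT id FROM b"))
union_all_ids = sorted(r[0] for r in conn.execute(
    "SELECT id FROM a UNION ALL SELECT id FROM b"))
```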
On your first query I would try this:
The LEFT JOIN is defeated by WHERE hnci.customer_id IN (...), so you might as well use a plain JOIN.
The sub-query can also be inefficient, because the optimizer may not be able to use an index for the IN.
The query optimizer does not know what IN ( SELECT ... ) will return, so it may not make the best use of indexes.
SELECT ct.id AS id
FROM [Customer].[dbo].[Contact] ct
JOIN [Customer].[dbo].[Customer_ids] hnci
ON ct.id = hnci.contact_id
JOIN [Transactions].[dbo].[Transaction_Header] th
on hnci.customer_id = th.[Customer_ID]
and th.actual_transaction_date > '20120218'
On that second join, the query optimizer gets to choose which condition to apply first. Let's say [Customer].[dbo].[Customer_ids].[customer_id] and [Transactions].[dbo].[Transaction_Header].[Customer_ID] each have indexes. The query optimizer then has the option of applying that join before the [Transactions].[dbo].[Transaction_Header].[actual_transaction_date] filter.
If [actual_transaction_date] is not indexed, it would certainly do the ID join first.
With your IN ( SELECT ... ), the query optimizer has no choice but to apply actual_transaction_date > '20120218' first. OK, sometimes the query optimizer is smart enough to use an index both inside and outside the IN, but why make it hard for the query optimizer? I have found that it makes better decisions when you make those decisions easier.
A join on a sub-query has the same problem. You take options away from the query optimizer. Give the query optimizer room to breathe.
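Before swapping IN ( SELECT ... ) for a JOIN, it is worth confirming the rewrite returns the same ids. This sketch runs the three forms (original LEFT JOIN + IN, plain JOIN, and EXISTS) in Python's sqlite3 on hypothetical rows; it says nothing about speed, only about results. The plain JOIN repeats a contact once per matching transaction, which is harmless here only because the surrounding UNION (or a DISTINCT) removes the repeat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contact (id INTEGER PRIMARY KEY);
CREATE TABLE Customer_ids (contact_id INTEGER, customer_id INTEGER);
CREATE TABLE Transaction_Header (Customer_ID INTEGER, actual_transaction_date TEXT);
INSERT INTO Contact VALUES (1), (2), (3);
INSERT INTO Customer_ids VALUES (1, 100), (2, 200), (3, 300);
INSERT INTO Transaction_Header VALUES
  (100, '20120301'), (100, '20120305'), (200, '20120101'), (300, '20120219');
""")

def ids(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

in_ids = ids("""
SELECT DISTINCT ct.id
FROM Contact ct LEFT JOIN Customer_ids hnci ON ct.id = hnci.contact_id
WHERE hnci.customer_id IN (SELECT Customer_ID FROM Transaction_Header
                           WHERE actual_transaction_date > '20120218')""")

join_ids = ids("""
SELECT ct.id
FROM Contact ct
JOIN Customer_ids hnci ON ct.id = hnci.contact_id
JOIN Transaction_Header th ON hnci.customer_id = th.Customer_ID
 AND th.actual_transaction_date > '20120218'""")

exists_ids = ids("""
SELECT ct.id
FROM Contact ct
JOIN Customer_ids hnci ON ct.id = hnci.contact_id
WHERE EXISTS (SELECT 1 FROM Transaction_Header th
              WHERE th.Customer_ID = hnci.customer_id
                AND th.actual_transaction_date > '20120218')""")
```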
Try this; a temp table should help:
IF OBJECT_ID('Tempdb..#Temp1') IS NOT NULL
DROP TABLE #Temp1
--Low performance because of WHERE hnci.customer_id IN ( .... ): it forces a loop join,
--and this WHERE condition applies to both tables after the LEFT JOIN,
--so the result is the same as with two inner joins, but with worse performance
--SELECT DISTINCT
-- ct.id AS id
--INTO #temp1
--FROM [Customer].[dbo].[Contact] ct
-- LEFT JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
--WHERE hnci.customer_id IN (
-- SELECT DISTINCT
-- ( [Customer_ID] )
-- FROM [Transactions].[dbo].[Transaction_Header]
-- WHERE actual_transaction_date > '20120218' )
--------------------------------------------------------------------------------
--this gives the same result, but with better performance than the previous one
--------------------------------------------------------------------------------
SELECT DISTINCT
ct.id AS id
INTO #temp1
FROM [Customer].[dbo].[Contact] ct
JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
JOIN ( SELECT DISTINCT
( [Customer_ID] )
FROM [Transactions].[dbo].[Transaction_Header]
WHERE actual_transaction_date > '20120218'
) T ON hnci.customer_id = T.[Customer_ID]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
INSERT INTO #temp1
( id
)
SELECT DISTINCT
contact_id AS id
FROM [Customer].[dbo].[Restaurant_Attendance]
WHERE ( created > '2012-02-18 00:00:00.000'
OR modified > '2012-02-18 00:00:00.000'
)
AND ( [Fifth_Floor_London] = 1
OR [Fourth_Floor_Leeds] = 1
OR [Second_Floor_Bristol] = 1
)
INSERT INTO #temp1
( id
)
SELECT DISTINCT
( ct.id )
FROM [Customer].[dbo].[Contact] ct
INNER JOIN [Customer].[dbo].[Wifinity_Devices] wfd ON ct.wifinity_uniqueID = wfd.[CustomerUniqueID]
AND startconnection > '2012-02-17'
INSERT INTO #temp1
( id
)
SELECT DISTINCT
comdt.id AS id
FROM [Customer].[dbo].[Complete_dataset] comdt
LEFT JOIN [Customer].[dbo].[Aggregate_Spend_Counts] agsc ON comdt.id = agsc.contact_id
WHERE agsc.contact_id IS NULL
AND ( opt_out_Mail <> 1
OR opt_out_email <> 1
OR opt_out_SMS <> 1
OR opt_out_Mail IS NULL
OR opt_out_email IS NULL
OR opt_out_SMS IS NULL
)
AND ( address_1 IS NOT NULL
OR email IS NOT NULL
OR mobile IS NOT NULL
)
INSERT INTO #temp1
( id
)
SELECT DISTINCT
( contact_id ) AS id
FROM [Customer].[dbo].[VIP_Card_Holders]
WHERE VIP_Card_number IS NOT NULL
SELECT DISTINCT
id
FROM #temp1 AS T
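The temp-table approach splits one big UNION into separate INSERT statements, each of which can be run and timed on its own, followed by a single SELECT DISTINCT. This sketch, in Python's sqlite3 with three hypothetical source tables, checks that the accumulated result matches the UNION result. The table names come from a fixed tuple, so building the SQL with an f-string is safe here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_a (id INTEGER);
CREATE TABLE source_b (id INTEGER);
CREATE TABLE source_c (id INTEGER);
INSERT INTO source_a VALUES (1), (2);
INSERT INTO source_b VALUES (2), (3);
INSERT INTO source_c VALUES (3), (4);
""")

# Accumulate ids from each source into a temp table, one statement per source.
conn.execute("CREATE TEMP TABLE temp1 (id INTEGER)")
for source in ("source_a", "source_b", "source_c"):
    conn.execute(f"INSERT INTO temp1 (id) SELECT id FROM {source}")
temp_ids = [r[0] for r in conn.execute("SELECT DISTINCT id FROM temp1 ORDER BY id")]

# The single-statement equivalent.
union_ids = [r[0] for r in conn.execute(
    "SELECT id FROM source_a UNION SELECT id FROM source_b "
    "UNION SELECT id FROM source_c ORDER BY id")]
```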

ordering by sql on fields not in projection

Is it possible to order the results of an SQL query, on a field that is not in the projection itself?
See the example below: I am selecting the distinct IDs from a product table, but I want them ordered by title. I don't want to include the title because I am using NHibernate to generate the query and page the results. I then use this distinct ID result set to load the actual results.
SELECT
DISTINCT this_.`ID` AS y0
FROM
`Product` this_
LEFT OUTER JOIN
`Brand` brand3_
ON this_.BrandId=brand3_.ID
INNER JOIN
`Product_CultureInfo` productcul2_
ON this_.ID=productcul2_.ProductID
AND (
(
(
productcul2_.`Deleted` = 0
OR productcul2_.`Deleted` IS NULL
)
AND (
productcul2_.`_Temporary_Flag` = 0
OR productcul2_.`_Temporary_Flag` IS NULL
)
)
)
INNER JOIN
`ProductCategory` aliasprodu1_
ON this_.ID=aliasprodu1_.ProductID
AND (
(
(
aliasprodu1_.`Deleted` = 0
OR aliasprodu1_.`Deleted` IS NULL
)
AND (
aliasprodu1_.`_Temporary_Flag` = 0
OR aliasprodu1_.`_Temporary_Flag` IS NULL
)
)
)
WHERE
(
this_._Temporary_Flag =FALSE
OR this_._Temporary_Flag IS NULL
)
AND this_.Published = TRUE
AND (
this_.Deleted = FALSE
OR this_.Deleted IS NULL
)
AND (
this_._ComputedDeletedValue = FALSE
OR this_._ComputedDeletedValue IS NULL
)
AND (
(
this_._TestItemSessionGuid IS NULL
OR this_._TestItemSessionGuid = ''
)
)
AND (
productcul2_._ActualTitle LIKE '%silver%'
OR brand3_.Title LIKE '%silver%'
OR aliasprodu1_.CategoryId IN (
47906817 , 47906818 , 47906819 , 47906816 , 7012353 , 44662785
)
)
AND this_.Published = TRUE
AND this_.Published = TRUE
ORDER BY
this_.Priority ASC,
productcul2_._ActualTitle ASC,
this_.Priority ASC LIMIT 25;
I don't know if there's a better solution, but how about a nested SELECT where the outer query excludes the field you're not interested in?
So, something like this on an arbitrary table (the ORDER BY goes in the outer query, because an ORDER BY inside a derived table is not guaranteed to be preserved, and MySQL requires every derived table to have an alias):
SELECT t.a, t.b, t.c FROM (SELECT a, b, c, d FROM myTable) AS t ORDER BY t.d;
Obviously, a direct solution in the language itself would be better, because this way you do two projections and one of them is useless.
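The nested-SELECT idea can be checked with Python's sqlite3 on a hypothetical Product table: the derived table carries Title, the outer query projects only ID, and the outer ORDER BY sorts on the column that is not in the final projection. LIMIT 25 mirrors the paging in the question.

```python
import sqlite3

# Hypothetical products; only ID and Title matter here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ID INTEGER PRIMARY KEY, Title TEXT);
INSERT INTO Product VALUES (1, 'Silver ring'), (2, 'Anklet'), (3, 'Mirror');
""")

ids_by_title = [r[0] for r in conn.execute("""
SELECT t.ID
FROM (SELECT ID, Title FROM Product) AS t   -- Title lives only in the derived table
ORDER BY t.Title                            -- sort in the outer query
LIMIT 25
""")]
```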