I've been trying to add a switch to the following script.
If @IgnoreExclusions = 1 then I do not want to exclude any of the values in Controltb_AssocAccounts_ExcludedSurnameDOB or in Controltb_AssocAccounts_ExcludedDOB.
I've included one of my attempts, but I don't think it is very readable, and I'm unsure whether it works reliably, as NULL could be a value in one of the exclusion lists.
DECLARE @IgnoreExclusions TINYINT = 1;
SELECT ua.UserAccountKey
FROM #Accounts x
INNER JOIN WH.dbo.vw_DimUserAccount ua
ON
( --surname and DOB need to match
x.Surname = ua.Surname AND
x.DOB = ua.DOB
)
AND
x.UserAccountKey <> ua.UserAccountKey
WHERE EXISTS
(
SELECT x.Surname, x.DOB
EXCEPT
SELECT ExcludedSurname,ExcludedDOB
FROM WH.dbo.Controltb_AssocAccounts_ExcludedSurnameDOB
)
AND
EXISTS
(
SELECT x.DOB
--SELECT CASE WHEN @IgnoreExclusions = 1 THEN NULL ELSE x.DOB END --<<<<ATTEMPT
EXCEPT
SELECT ExcludedDOB
FROM WH.dbo.Controltb_AssocAccounts_ExcludedDOB
)
GROUP BY ua.UserAccountKey;
I'm not sure what variant of SQL you're using, but couldn't a simple OR clause do the trick?
DECLARE @IgnoreExclusions TINYINT = 1;
SELECT ua.UserAccountKey
FROM #Accounts x
INNER JOIN WH.dbo.vw_DimUserAccount ua
ON x.Surname = ua.Surname
AND x.DOB = ua.DOB
AND x.UserAccountKey <> ua.UserAccountKey
-- when the switch is on, both exclusion checks are skipped
WHERE
(
@IgnoreExclusions = 1 OR EXISTS
(
SELECT x.Surname, x.DOB
EXCEPT
SELECT ExcludedSurname,ExcludedDOB
FROM WH.dbo.Controltb_AssocAccounts_ExcludedSurnameDOB
)
)
AND
(
@IgnoreExclusions = 1 OR EXISTS
(
SELECT x.DOB
EXCEPT
SELECT ExcludedDOB
FROM WH.dbo.Controltb_AssocAccounts_ExcludedDOB
)
)
GROUP BY ua.UserAccountKey;
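To see how the switch behaves, including the NULL worry from the question, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The tables accounts and excluded_dob and the :ignore parameter are hypothetical stand-ins for #Accounts, the DOB exclusion table and the variable. It relies on EXCEPT treating two NULLs as equal, which SQLite does and which I believe SQL Server does as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical stand-ins for #Accounts and the DOB exclusion list
    CREATE TABLE accounts (surname TEXT, dob TEXT);
    CREATE TABLE excluded_dob (dob TEXT);
    INSERT INTO accounts VALUES
        ('Smith', '1980-01-01'), ('Jones', '1990-05-05'), ('Brown', NULL);
    INSERT INTO excluded_dob VALUES ('1980-01-01'), (NULL);
""")

QUERY = """
    SELECT a.surname
    FROM accounts a
    WHERE :ignore = 1   -- switch on: the exclusion check is skipped
       OR EXISTS (SELECT a.dob EXCEPT SELECT dob FROM excluded_dob)
    ORDER BY a.surname
"""

def surnames(ignore):
    """Run the query with the switch set to 0 or 1."""
    return [row[0] for row in conn.execute(QUERY, {"ignore": ignore})]

with_exclusions = surnames(0)      # Smith and the NULL DOB are excluded
without_exclusions = surnames(1)   # nothing is excluded
print(with_exclusions)
print(without_exclusions)
```

Because the switch sits outside the EXISTS, a NULL in the exclusion list cannot interfere with it, unlike the CASE ... THEN NULL attempt.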
I have this function, which checks whether the first query (on table_one) returns any results.
If there are no results, it should check the second table instead.
Each query works on its own, but when I combine them only the first one works and the second does not.
CREATE OR REPLACE FUNCTION get_data(id INT)
RETURNS TABLE(
id INT,
created_at TIMESTAMP,
attempts INT,
status VARCHAR
)
language plpgsql
AS
$$
DECLARE
_SENT VARCHAR := 'SENT';
BEGIN
RETURN QUERY
WITH r AS (
SELECT p_i.id, a_r.created_at, a_r.attempts,
CASE a_r.status
WHEN 'PENDING' THEN _SENT
END AS status
FROM table_one p_i
LEFT JOIN (
SELECT a_r.table_one_id, max(a_r.id) id
FROM awa_req a_r
GROUP BY a_r.table_one_id
) last_md on last_md.table_one_id = p_i.id
LEFT JOIN awa_req a_r on a_r.table_one_id = last_md.table_one_id and a_r.id = last_md.id
WHERE p_i.user_id = $1
AND p_i.deleted_at IS NULL
)
SELECT * FROM r
UNION ALL
SELECT p_i.id, m_d.created_at, m_d.attempts,
CASE
WHEN m_d.confirmed_at IS NULL THEN _SENT
END AS status
FROM pay_ins p_i
LEFT JOIN (
SELECT max(t.id) AS id, t.pay_ins_id
FROM table_two t
GROUP BY t.pay_ins_id
) last_md on last_md.pay_ins_id = p_i.id
LEFT JOIN table_two m_d on m_d.pay_ins_id = last_md.pay_ins_id and m_d.id = last_md.id
AND NOT EXISTS (
SELECT * FROM r
);
END;
$$;
This condition is not correlated with the outer query, so it is false for every row in the second branch of the UNION as soon as r contains any row at all:
AND NOT EXISTS (
SELECT * FROM r
);
It should instead be something like:
AND NOT EXISTS (
SELECT FROM r WHERE r.id = p_i.id
)
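The difference between the two forms can be sketched with Python's built-in sqlite3 module. The tables first_source and second_source are hypothetical stand-ins for r and the second branch, and the condition is placed in a WHERE clause here, which is an assumption about the intent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical stand-ins for r and the second branch of the UNION
    CREATE TABLE first_source (id INTEGER);
    CREATE TABLE second_source (id INTEGER);
    INSERT INTO first_source VALUES (1);
    INSERT INTO second_source VALUES (1), (2);
""")

# Uncorrelated: one row anywhere in first_source rejects every row.
uncorrelated = conn.execute("""
    SELECT s.id FROM second_source s
    WHERE NOT EXISTS (SELECT 1 FROM first_source)
    ORDER BY s.id
""").fetchall()

# Correlated: a row is rejected only if first_source has the same id.
correlated = conn.execute("""
    SELECT s.id FROM second_source s
    WHERE NOT EXISTS (SELECT 1 FROM first_source f WHERE f.id = s.id)
    ORDER BY s.id
""").fetchall()

print(uncorrelated)
print(correlated)
```

If the goal really is "use the second query only when the first returns nothing at all", the uncorrelated form is the one you want; the correlated form falls back per id.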
I am trying to ignore duplicate records in a CTE, but I am not able to. It seems that a SELECT statement inside a CTE does not allow the ROW_NUMBER() alias numrows to be used in its WHERE clause: doing so raises an Invalid column name 'numrows' error.
SQL Query:
DECLARE @BatchID uniqueidentifier = NEWID();
DECLARE @ClusterID SMALLINT = 1;
DECLARE @BatchSize integer = 20000;
DECLARE @myTableVariable TABLE(EventID BIGINT,HotelID int, BatchStatus varchar(50),BatchID uniqueidentifier);
WITH PendingExtResSvcEventsData_Batch
AS(
SELECT TOP (@BatchSize) t.EventID, t.HotelID, t.BatchStatus, t.BatchID, ROW_NUMBER() OVER (PARTITION BY t.EventID ORDER BY t.EventID) numrows
FROM ExtResSvcPendingMsg t WITH (NOLOCK)
WHERE t.ClusterID = @ClusterID AND numrows = 1 AND NOT EXISTS -- not allowed to use WHERE numrows = 1 here showing *Invalid Column Name*
(select 1 from ExtResSvcPendingMsg t2 where t2.BatchStatus = 'Batched'
and t2.EventID = t.EventID and t2.HotelID = t.HotelID)
)
UPDATE PendingExtResSvcEventsData_Batch
SET BatchStatus='Batched',
BatchID = @BatchID
-- WHERE numrows = 1 (not allowed to use WHERE here because of OUTPUT Clause)
OUTPUT INSERTED.* INTO @myTableVariable
SELECT e.ExtResSvcEventID,e.HotelID,e.ID1,e.ID2,e.ExtResSvcEventType,e.HostID,e.StatusCode,e.ChannelID,e.RequestAtTime,e.ProcessTime,e.DateBegin,e.DateEnd,
e.StatusMsg,em.MsgBodyOut,em.MsgBodyIn,e.ChannelResID
FROM ExtResSvcEvent e WITH (NOLOCK)
INNER JOIN @myTableVariable t ON e.ExtResSvcEventID = t.EventID
INNER JOIN ExtResSvcEventXML em with (nolock) on t.EventID = em.ExtResSvcEventID
ORDER BY e.ExtResSvcEventID
I have also tried to use numrows in the final SELECT, as in INNER JOIN @myTableVariable t ON e.ExtResSvcEventID = t.EventID AND t.numrows = 1, but this gives me the error: The column reference "inserted.numrows" is not allowed because it refers to a base table that is not being modified in this statement.
How do I ignore the duplicate records while using SELECT in a CTE?
You can't refer to the numrows column in the WHERE clause of the same SELECT that defines it, because window functions such as ROW_NUMBER() are evaluated after the WHERE clause has been applied. You need to add a second CTE whose SELECT reads from the first one; there you can filter on numrows:
WITH Base AS (
SELECT TOP (@BatchSize) t.EventID, t.HotelID, t.BatchStatus, t.BatchID, ROW_NUMBER() OVER (PARTITION BY t.EventID ORDER BY t.EventID) numrows
FROM ExtResSvcPendingMsg t WITH (NOLOCK)
WHERE t.ClusterID = @ClusterID
AND NOT EXISTS (select 1 from ExtResSvcPendingMsg t2 where t2.BatchStatus = 'Batched' and t2.EventID = t.EventID and t2.HotelID = t.HotelID)
), PendingExtResSvcEventsData_Batch AS (
SELECT EventID,
HotelID,
BatchStatus,
BatchID
FROM Base -- read from the first CTE, where numrows is already computed
WHERE numrows = 1
)
UPDATE...
I can't vouch for the update statement working as you expect, but PendingExtResSvcEventsData_Batch should now have one row per EventID.
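The two-step pattern (number the rows in one CTE, filter in the next) can be tried out with Python's built-in sqlite3 module. This is only a sketch: the table pending_msg and its data are made up, and it assumes the bundled SQLite is version 3.25 or later, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- made-up data with duplicate event_id values
    CREATE TABLE pending_msg (event_id INTEGER, hotel_id INTEGER);
    INSERT INTO pending_msg VALUES (1, 10), (1, 11), (2, 20), (3, 30), (3, 31);
""")

rows = conn.execute("""
    WITH numbered AS (
        -- the window function is computed here...
        SELECT event_id, hotel_id,
               ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY hotel_id) AS numrows
        FROM pending_msg
    ),
    deduplicated AS (
        -- ...and can only be filtered on one level further out
        SELECT event_id, hotel_id FROM numbered WHERE numrows = 1
    )
    SELECT event_id, hotel_id FROM deduplicated ORDER BY event_id
""").fetchall()

print(rows)
```

Ordering the window by hotel_id is a choice made for this sketch so that the surviving row is deterministic; ordering by the partition column itself leaves the choice of row arbitrary.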
The following script is very slow when it runs.
I have no idea how to improve its performance.
Even as a view it takes quite a few minutes.
If you have any ideas, please share them.
SELECT DISTINCT
( id )
FROM ( SELECT DISTINCT
ct.id AS id
FROM [Customer].[dbo].[Contact] ct
LEFT JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
WHERE hnci.customer_id IN (
SELECT DISTINCT
( [Customer_ID] )
FROM [Transactions].[dbo].[Transaction_Header]
WHERE actual_transaction_date > '20120218' )
UNION
SELECT DISTINCT
contact_id AS id
FROM [Customer].[dbo].[Restaurant_Attendance]
WHERE ( created > '2012-02-18 00:00:00.000'
OR modified > '2012-02-18 00:00:00.000'
)
AND ( [Fifth_Floor_London] = 1
OR [Fourth_Floor_Leeds] = 1
OR [Second_Floor_Bristol] = 1
)
UNION
SELECT DISTINCT
( ct.id )
FROM [Customer].[dbo].[Contact] ct
INNER JOIN [Customer].[dbo].[Wifinity_Devices] wfd ON ct.wifinity_uniqueID = wfd.[CustomerUniqueID]
AND startconnection > '2012-02-17'
UNION
SELECT DISTINCT
comdt.id AS id
FROM [Customer].[dbo].[Complete_dataset] comdt
LEFT JOIN [Customer].[dbo].[Aggregate_Spend_Counts] agsc ON comdt.id = agsc.contact_id
WHERE agsc.contact_id IS NULL
AND ( opt_out_Mail <> 1
OR opt_out_email <> 1
OR opt_out_SMS <> 1
OR opt_out_Mail IS NULL
OR opt_out_email IS NULL
OR opt_out_SMS IS NULL
)
AND ( address_1 IS NOT NULL
OR email IS NOT NULL
OR mobile IS NOT NULL
)
UNION
SELECT DISTINCT
( contact_id ) AS id
FROM [Customer].[dbo].[VIP_Card_Holders]
WHERE VIP_Card_number IS NOT NULL
) AS tbl
Wow, where to start...
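--this outer distinct does nothing. Union is already distinct
--SELECT DISTINCT
-- ( id )
--FROM (
-- inner join plus EXISTS instead of left join plus IN; this keeps the mapping from customer_id to contact id
SELECT ct.id AS id
FROM [Customer].[dbo].[Contact] ct
INNER JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
WHERE EXISTS ( SELECT 1
FROM [Transactions].[dbo].[Transaction_Header] th
WHERE th.[Customer_ID] = hnci.customer_id
AND th.actual_transaction_date > '20120218' )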
UNION
SELECT
contact_id AS id
FROM [Customer].[dbo].[Restaurant_Attendance]
-- not sure that you are getting the date range you want. Should these be >=
-- if you want everything that occurred on the 18th or after you want >= '2012-02-18 00:00:00.000'
-- if you want everything that occurred on the 19th or after you want >= '2012-02-19 00:00:00.000'
-- the way you have it now, you will get everything on the 18th unless it happened exactly at midnight
WHERE ( created > '2012-02-18 00:00:00.000'
OR modified > '2012-02-18 00:00:00.000'
)
AND ( [Fifth_Floor_London] = 1
OR [Fourth_Floor_Leeds] = 1
OR [Second_Floor_Bristol] = 1
)
-- keep this branch: the first query only returns contacts with recent transactions, not every id in the contact table
UNION
SELECT
ct.id
FROM [Customer].[dbo].[Contact] ct
INNER JOIN [Customer].[dbo].[Wifinity_Devices] wfd ON ct.wifinity_uniqueID = wfd.[CustomerUniqueID]
AND startconnection > '2012-02-17'
UNION
-- cleaned this up with isnull function and coalesce
SELECT
comdt.id AS id
FROM [Customer].[dbo].[Complete_dataset] comdt
LEFT JOIN [Customer].[dbo].[Aggregate_Spend_Counts] agsc ON comdt.id = agsc.contact_id
WHERE agsc.contact_id IS NULL
AND ( isnull(opt_out_Mail,0) <> 1
OR isnull(opt_out_email,0) <> 1
OR isnull(opt_out_SMS,0) <> 1
)
AND coalesce(address_1 , email, mobile) IS NOT NULL
UNION
SELECT
( contact_id ) AS id
FROM [Customer].[dbo].[VIP_Card_Holders]
WHERE VIP_Card_number IS NOT NULL
-- ) AS tbl
WHERE EXISTS is generally faster than IN as well.
OR conditions are generally slower too; consider splitting them into additional UNION branches instead.
And learn to use left joins correctly. If you put a WHERE condition (other than WHERE id IS NULL) on the table on the right side of a left join, the join effectively becomes an inner join. If that is not what you want, then your code is currently giving you an incorrect result set.
See http://wiki.lessthandot.com/index.php/WHERE_conditions_on_a_LEFT_JOIN for an explanation of how to fix it.
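The left join point is easy to check on toy data. Here is a small sketch with Python's built-in sqlite3 module; the tables contact and customer_ids are cut-down, hypothetical versions of the ones in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- cut-down, hypothetical versions of Contact and Customer_ids
    CREATE TABLE contact (id INTEGER);
    CREATE TABLE customer_ids (contact_id INTEGER, customer_id INTEGER);
    INSERT INTO contact VALUES (1), (2), (3);
    INSERT INTO customer_ids VALUES (1, 100), (2, 200);
""")

# Filter on the right-hand table in WHERE: unmatched contacts are dropped,
# so the left join behaves like an inner join.
filter_in_where = conn.execute("""
    SELECT c.id, x.customer_id
    FROM contact c
    LEFT JOIN customer_ids x ON x.contact_id = c.id
    WHERE x.customer_id = 100
    ORDER BY c.id
""").fetchall()

# Same filter moved into ON: every contact survives, with NULLs where
# no matching right-hand row exists.
filter_in_on = conn.execute("""
    SELECT c.id, x.customer_id
    FROM contact c
    LEFT JOIN customer_ids x ON x.contact_id = c.id AND x.customer_id = 100
    ORDER BY c.id
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

Which of the two you want depends on whether contacts without a matching row should appear in the result at all.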
As stated in a comment, optimize one query at a time. See which one takes the longest and focus on that one.
UNION removes duplicates, so you don't need the DISTINCT on the individual queries.
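A quick way to convince yourself of the UNION behaviour is a toy run with Python's built-in sqlite3 module. The tables a and b are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- made-up tables with overlapping and repeated ids
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# UNION removes duplicates within and across both branches.
rows_union = [r[0] for r in conn.execute(
    "SELECT id FROM a UNION SELECT id FROM b ORDER BY id")]

# UNION ALL keeps every row as-is.
rows_union_all = [r[0] for r in conn.execute(
    "SELECT id FROM a UNION ALL SELECT id FROM b ORDER BY id")]

print(rows_union)
print(rows_union_all)
```

So a DISTINCT inside each branch, or wrapped around the whole UNION, only adds work without changing the result.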
On your first query I would try this:
The left join is killed by the WHERE hnci.customer_id IN, so you might as well use an inner join.
The sub-query may not be efficient, as the IN may not be able to use an index.
The query optimizer does not always know what IN ( SELECT ... ) will return, so it may not make the best use of indexes.
SELECT ct.id AS id
FROM [Customer].[dbo].[Contact] ct
JOIN [Customer].[dbo].[Customer_ids] hnci
ON ct.id = hnci.contact_id
JOIN [Transactions].[dbo].[Transaction_Header] th
on hnci.customer_id = th.[Customer_ID]
and th.actual_transaction_date > '20120218'
With that second join the query optimizer can choose which condition to apply first. Let's say [Customer].[dbo].[Customer_ids].[customer_id] and [Transactions].[dbo].[Transaction_Header].[Customer_ID] each have indexes. The query optimizer then has the option of applying the ID join before the filter on [Transactions].[dbo].[Transaction_Header].[actual_transaction_date].
If [actual_transaction_date] is not indexed, it would almost certainly do the ID join first.
With your IN ( SELECT ... ) the query optimizer often has little choice but to apply actual_transaction_date > '20120218' first. OK, sometimes the query optimizer is smart enough to use an index from outside the IN inside it, but why make it hard for the query optimizer? I have found that the query optimizer makes better decisions if you make the decisions easier.
A join on a sub-query has the same problem: you take options away from the query optimizer. Give the query optimizer room to breathe.
Try this; a temp table should help you:
IF OBJECT_ID('Tempdb..#Temp1') IS NOT NULL
DROP TABLE #Temp1
--Low performance because of "WHERE hnci.customer_id IN ( .... )" - a loop join is likely,
--and this WHERE condition is applied to the joined rows after the left join,
--so the result is the same as with two inner joins but with worse performance
--SELECT DISTINCT
-- ct.id AS id
--INTO #temp1
--FROM [Customer].[dbo].[Contact] ct
-- LEFT JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
--WHERE hnci.customer_id IN (
-- SELECT DISTINCT
-- ( [Customer_ID] )
-- FROM [Transactions].[dbo].[Transaction_Header]
-- WHERE actual_transaction_date > '20120218' )
--------------------------------------------------------------------------------
--this will give the same result but with better performance than the previous one
--------------------------------------------------------------------------------
SELECT DISTINCT
ct.id AS id
INTO #temp1
FROM [Customer].[dbo].[Contact] ct
JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
JOIN ( SELECT DISTINCT
( [Customer_ID] )
FROM [Transactions].[dbo].[Transaction_Header]
WHERE actual_transaction_date > '20120218'
) T ON hnci.customer_id = T.[Customer_ID]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
INSERT INTO #temp1
( id
)
SELECT DISTINCT
contact_id AS id
FROM [Customer].[dbo].[Restaurant_Attendance]
WHERE ( created > '2012-02-18 00:00:00.000'
OR modified > '2012-02-18 00:00:00.000'
)
AND ( [Fifth_Floor_London] = 1
OR [Fourth_Floor_Leeds] = 1
OR [Second_Floor_Bristol] = 1
)
INSERT INTO #temp1
( id
)
SELECT DISTINCT
( ct.id )
FROM [Customer].[dbo].[Contact] ct
INNER JOIN [Customer].[dbo].[Wifinity_Devices] wfd ON ct.wifinity_uniqueID = wfd.[CustomerUniqueID]
AND startconnection > '2012-02-17'
INSERT INTO #temp1
( id
)
SELECT DISTINCT
comdt.id AS id
FROM [Customer].[dbo].[Complete_dataset] comdt
LEFT JOIN [Customer].[dbo].[Aggregate_Spend_Counts] agsc ON comdt.id = agsc.contact_id
WHERE agsc.contact_id IS NULL
AND ( opt_out_Mail <> 1
OR opt_out_email <> 1
OR opt_out_SMS <> 1
OR opt_out_Mail IS NULL
OR opt_out_email IS NULL
OR opt_out_SMS IS NULL
)
AND ( address_1 IS NOT NULL
OR email IS NOT NULL
OR mobile IS NOT NULL
)
INSERT INTO #temp1
( id
)
SELECT DISTINCT
( contact_id ) AS id
FROM [Customer].[dbo].[VIP_Card_Holders]
WHERE VIP_Card_number IS NOT NULL
SELECT DISTINCT
id
FROM #temp1 AS T
Is it possible to order the results of an SQL query, on a field that is not in the projection itself?
See example below - I am taking the distinct ID of a product table, but I want it ordered by title. I don't want to include the title because I am using NHibernate to generate a query, and page the results. I am then using this distinct ID resultset, to load the actual results.
SELECT
DISTINCT this_.`ID` AS y0
FROM
`Product` this_
LEFT OUTER JOIN
`Brand` brand3_
ON this_.BrandId=brand3_.ID
INNER JOIN
`Product_CultureInfo` productcul2_
ON this_.ID=productcul2_.ProductID
AND (
(
(
productcul2_.`Deleted` = 0
OR productcul2_.`Deleted` IS NULL
)
AND (
productcul2_.`_Temporary_Flag` = 0
OR productcul2_.`_Temporary_Flag` IS NULL
)
)
)
INNER JOIN
`ProductCategory` aliasprodu1_
ON this_.ID=aliasprodu1_.ProductID
AND (
(
(
aliasprodu1_.`Deleted` = 0
OR aliasprodu1_.`Deleted` IS NULL
)
AND (
aliasprodu1_.`_Temporary_Flag` = 0
OR aliasprodu1_.`_Temporary_Flag` IS NULL
)
)
)
WHERE
(
this_._Temporary_Flag =FALSE
OR this_._Temporary_Flag IS NULL
)
AND this_.Published = TRUE
AND (
this_.Deleted = FALSE
OR this_.Deleted IS NULL
)
AND (
this_._ComputedDeletedValue = FALSE
OR this_._ComputedDeletedValue IS NULL
)
AND (
(
this_._TestItemSessionGuid IS NULL
OR this_._TestItemSessionGuid = ''
)
)
AND (
productcul2_._ActualTitle LIKE '%silver%'
OR brand3_.Title LIKE '%silver%'
OR aliasprodu1_.CategoryId IN (
47906817 , 47906818 , 47906819 , 47906816 , 7012353 , 44662785
)
)
AND this_.Published = TRUE
AND this_.Published = TRUE
ORDER BY
this_.Priority ASC,
productcul2_._ActualTitle ASC,
this_.Priority ASC LIMIT 25;
I don't know if there's a better solution, but how about a nested select where the outer query excludes the field that you're not interested in?
So, something like this on an arbitrary table:
SELECT a,b,c FROM (SELECT a,b,c,d FROM myTable ORDER BY d) AS t
Obviously a solution built directly into the language would be better, because this way you have to do two projections, one of which is useless.
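A different technique that avoids the nested select is to replace DISTINCT with GROUP BY and order by an aggregate of the hidden column. I mention it partly because an outer query is not guaranteed to preserve the ordering of an inner one. The sketch below uses Python's built-in sqlite3 module with made-up product and product_title tables; picking MIN(title) as the sort key per product is my assumption, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- made-up tables; a product can have several titles (one per culture)
    CREATE TABLE product (id INTEGER, priority INTEGER);
    CREATE TABLE product_title (product_id INTEGER, title TEXT);
    INSERT INTO product VALUES (1, 1), (2, 1), (3, 0);
    INSERT INTO product_title VALUES
        (1, 'Silver ring'), (1, 'Silver band'),
        (2, 'Amber silver'), (3, 'Zinc silver');
""")

# GROUP BY yields one row per product id (like DISTINCT), while the
# aggregates make the sort key well defined without being projected.
ids = [row[0] for row in conn.execute("""
    SELECT p.id
    FROM product p
    JOIN product_title t ON t.product_id = p.id
    GROUP BY p.id
    ORDER BY MIN(p.priority), MIN(t.title)
    LIMIT 25
""")]

print(ids)
```

Only the id is returned, so the result can still be paged and used to load the full entities afterwards.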
I am trying to generate XML with the following query:
SELECT DISTINCT
P.DOMAIN_ID,
P.SOURCE_SYSTEM_ID
FROM EDW.dbo.DOMAIN_VALUE AS P
WHERE P.ID = 4
AND CURRENT_FLAG = 'Y'
EXCEPT
( SELECT F.DOMAIN_ID,
F.SOURCE_SYSTEM_ID
FROM EDW.dbo.DOMAIN AS F
WHERE F.ID = 4
AND F.CURRENT_FLAG = 'Y'
)
FOR XML PATH('DOMAIN'),
ROOT('DOMAIN_VALUE')
The output appears as XML in the Results tab:
<REFERENCE_DOMAIN_VALUE>
<REFERENCE_DOMAIN>
<REFERENCE_DOMAIN_ID>10799</REFERENCE_DOMAIN_ID>
<REFERENCE_SOURCE_SYSTEM_ID>7452-001</REFERENCE_SOURCE_SYSTEM_ID>
</REFERENCE_DOMAIN>
</REFERENCE_DOMAIN_VALUE>
Now I need to convert this XML output to varchar(max), but the result needs to stay the same.
Just subquery it into a scalar value and convert it. The trick here is that FOR XML in a subquery and EXCEPT on top don't mix, so subquery the EXCEPT part first.
SELECT CONVERT(varchar(max), (
SELECT * FROM (
SELECT DISTINCT P.DOMAIN_ID, P.SOURCE_SYSTEM_ID
FROM EDW.dbo.DOMAIN_VALUE AS P
WHERE P.ID = 4 AND CURRENT_FLAG = 'Y'
EXCEPT (
SELECT F.DOMAIN_ID, F.SOURCE_SYSTEM_ID
FROM EDW.dbo.DOMAIN AS F
WHERE F.ID = 4 AND F.CURRENT_FLAG = 'Y' )
) I
FOR XML PATH('DOMAIN'), ROOT('DOMAIN_VALUE')
))