I have two SQL Server tables:
Invoice (invoice)
Invoice Relations (invoice_relation)
invoice table stores all invoice records with a transaction folio.
invoice_relation table stores any relation between invoices.
This is an example of how invoices can be related between each other:
So the goal is to find the "folio" under invoice table given an invoicenumber and a folio but the folio sometimes won't be the folio that the invoice has, so I need to do a search on all the tree relation in order to find if any invoice match the invoice number but also the folio is part of the relation.
For example I have to find the folio and match invoice number of:
Folio: 1003
Invoice Number: A1122
In my query I would need to first find by folio because it's my invoice table primary key. Then, will try to match A1122 with D1122 that won't match, so then I have to search all the tree structure to find if there is a A1122. The result will be that the invoice A1122 was found in folio 1000.
Any clue on how to do this?
Here is a script of how to create the above example tables with data:
CREATE TABLE [dbo].[invoice](
[folio] [int] NOT NULL,
[invoicenumber] [nvarchar](20) NOT NULL,
[isactive] [bit] NOT NULL,
CONSTRAINT [PK_invoice] PRIMARY KEY CLUSTERED
(
[folio] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[invoice_relation](
[relationid] [int] NOT NULL,
[invoice] [nvarchar](20) NOT NULL,
[parentinvoice] [nvarchar](20) NOT NULL,
CONSTRAINT [PK_invoice_relation_1] PRIMARY KEY CLUSTERED
(
[relationid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT [dbo].[invoice] ([folio], [invoicenumber], [isactive]) VALUES (1000, N'A1122', 1)
GO
INSERT [dbo].[invoice] ([folio], [invoicenumber], [isactive]) VALUES (1001, N'B1122', 1)
GO
INSERT [dbo].[invoice] ([folio], [invoicenumber], [isactive]) VALUES (1002, N'C1122', 1)
GO
INSERT [dbo].[invoice] ([folio], [invoicenumber], [isactive]) VALUES (1003, N'D1122', 1)
GO
INSERT [dbo].[invoice] ([folio], [invoicenumber], [isactive]) VALUES (1004, N'F1122', 1)
GO
INSERT [dbo].[invoice] ([folio], [invoicenumber], [isactive]) VALUES (1005, N'G1122', 1)
GO
INSERT [dbo].[invoice_relation] ([relationid], [invoice], [parentinvoice]) VALUES (1, N'A1122', N'B1122')
GO
INSERT [dbo].[invoice_relation] ([relationid], [invoice], [parentinvoice]) VALUES (2, N'C1122', N'A1122')
GO
INSERT [dbo].[invoice_relation] ([relationid], [invoice], [parentinvoice]) VALUES (3, N'D1122', N'A1122')
GO
INSERT [dbo].[invoice_relation] ([relationid], [invoice], [parentinvoice]) VALUES (4, N'F1122', N'B1122')
GO
INSERT [dbo].[invoice_relation] ([relationid], [invoice], [parentinvoice]) VALUES (5, N'G1122', N'F1122')
GO
I am still not sure what you really want, I had written something similar to JamieD77 which is to find top parent and then walk back down tree but then you get children and granchildren that are not directly related to A1122.....
Here is a way to walk up and down the tree and return all children and parents directly related to an invoicenumber
DECLARE #InvoiceNumber NVARCHAR(20) = 'A1122'
DECLARE #Folio INT = 1003
;WITH cteFindParents AS (
SELECT
i.folio
,i.invoicenumber
,CAST(NULL AS NVARCHAR(20)) as ChildInvoiceNumber
,CAST(NULL AS NVARCHAR(20)) as ParentInvoiceNumber
,0 as Level
FROM
dbo.invoice i
WHERE
i.invoicenumber = #InvoiceNumber
UNION ALL
SELECT
i.folio
,i.invoicenumber
,c.invoicenumber as ChildInvoiceNumber
,i.invoicenumber as ParentInvoiceNumber
,c.Level - 1 as Level
FROM
cteFindParents c
INNER JOIN dbo.invoice_relation r
ON c.invoicenumber = r.invoice
INNER JOIN dbo.invoice i
ON r.parentinvoice = i.invoicenumber
)
, cteFindChildren as (
SELECT *
FROM
cteFindParents
UNION ALL
SELECT
i.folio
,i.invoicenumber
,i.invoicenumber AS ChildInvoiceNumber
,c.invoicenumber AS ParentInvoiceNumber
,Level + 1 as Level
FROM
cteFindChildren c
INNER JOIN dbo.invoice_relation r
ON c.invoicenumber = r.parentinvoice
INNER JOIN dbo.invoice i
ON r.invoice = i.invoicenumber
WHERE
c.Level = 0
)
SELECT *
FROM
cteFindChildren
But Depending on what exactly you are looking for you may actually get a couple of cousins that are not desired.....
--------------Here was a method to find top parent and get the whole tree
DECLARE #InvoiceNumber NVARCHAR(20) = 'A1122'
DECLARE #Folio INT = 1003
;WITH cteFindParents AS (
SELECT
i.folio
,i.invoicenumber
,CAST(NULL AS NVARCHAR(20)) as ChildInvoiceNumber
,0 as Level
FROM
dbo.invoice i
WHERE
i.invoicenumber = #InvoiceNumber
UNION ALL
SELECT
i.folio
,i.invoicenumber
,c.invoicenumber as ChildInvoiceNumber
,c.Level + 1 as Level
FROM
cteFindParents c
INNER JOIN dbo.invoice_relation r
ON c.invoicenumber = r.invoice
INNER JOIN dbo.invoice i
ON r.parentinvoice = i.invoicenumber
)
, cteGetTopParent AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY LEVEL DESC) as RowNum
FROM
cteFindParents
)
, cteGetWholeTree AS (
SELECT
p.folio
,p.invoicenumber
,p.invoicenumber as TopParent
,p.invoicenumber as Parent
,CAST(p.invoicenumber AS NVARCHAR(1000)) as Hierarchy
,0 as Level
FROM
cteGetTopParent p
WHERE
RowNum = 1
UNION ALL
SELECT
i.folio
,i.invoicenumber
,c.TopParent
,c.invoicenumber AS Parent
,CAST(c.TopParent + '|' + (CASE WHEN Level > 0 THEN c.invoicenumber + '|' ELSE '' END) + i.invoicenumber AS NVARCHAR(1000)) as Hierarchy
,Level + 1 as Level
FROM
cteGetWholeTree c
INNER JOIN dbo.invoice_relation r
ON c.invoicenumber = r.parentinvoice
INNER JOIN dbo.invoice i
ON r.invoice = i.invoicenumber
)
SELECT *
FROM
cteGetWholeTree
Your model is broken to begin with. parentinvoice should be in the invoice table. It's a recursive database model....so make the table schema recursive. Have a nullable foreign key in a column that references it's own table. Any time that field (the parent invoice field) is null indicates that it is the primary invoice. Any row having a parent is a piece of invoice.
When you want to find a value in a tree level structure you wrap your initial sql query into a 'SELECT(.....)' statement (creating your own custom selectable table) that filters out what you want. Let me know if you have any questions!
I was a little unclear as to your actual requirements, so I figured a Table Valued Function may be appropriate here. I added a few optional items, and if unwanted, they are easy enough to remove (i.e. TITLE, Nesting, TopInvoice, TopFolio). Also, you may notice the Range Keys (R1/R2). These serve many functions: Presentation Sequence, Selection Criteria, Parent/Leaf Indicators, and perhaps most importantly Non-Recursive Aggregation.
To Return the Entire Hierarchy
Select * from [dbo].[udf_SomeFunction](NULL,NULL)
To Return an Invoice and ALL of its descendants
Select * from [dbo].[udf_SomeFunction]('A1122',NULL)
To Return the PATH of a Folio
Select * from [dbo].[udf_SomeFunction](NULL,'1003')
To Return a Folio Limited to an Invoice
Select * from [dbo].[udf_SomeFunction]('A1122','1003')
The following code requires SQL 2012+
CREATE FUNCTION [dbo].[udf_SomeFunction](#Invoice nvarchar(25),#Folio nvarchar(25))
Returns Table
As
Return (
with cteBld as (
Select Seq = cast(1000+Row_Number() over (Order By Invoice) as nvarchar(500)),I.Invoice,I.ParentInvoice,Lvl=1,Title = I.Invoice,F.Folio
From (
Select Distinct
Invoice=ParentInvoice
,ParentInvoice=cast(NULL as nvarchar(20))
From [Invoice_Relation]
Where #Invoice is NULL and ParentInvoice Not In (Select Invoice from [Invoice_Relation])
Union All
Select Invoice
,ParentInvoice
From [Invoice_Relation]
Where Invoice=#Invoice
) I
Join Invoice F on I.Invoice=F.InvoiceNumber
Union All
Select Seq = cast(concat(A.Seq,'.',1000+Row_Number() over (Order by I.Invoice)) as nvarchar(500))
,I.Invoice
,I.ParentInvoice
,A.Lvl+1
,I.Invoice,F.folio
From [Invoice_Relation] I
Join cteBld A on I.ParentInvoice = A.Invoice
Join Invoice F on I.Invoice=F.InvoiceNumber )
,cteR1 as (Select Seq,Invoice,Folio,R1=Row_Number() over (Order By Seq) From cteBld)
,cteR2 as (Select A.Seq,A.Invoice,R2=Max(B.R1) From cteR1 A Join cteR1 B on (B.Seq like A.Seq+'%') Group By A.Seq,A.Invoice )
Select Top 100 Percent
B.R1
,C.R2
,A.Invoice
,A.ParentInvoice
,A.Lvl
,Title = Replicate('|-----',A.Lvl-1)+A.Title -- Optional: Added for Readability
,A.Folio
,TopInvoice = First_Value(A.Invoice) over (Order By R1)
,TopFolio = First_Value(A.Folio) over (Order By R1)
From cteBld A
Join cteR1 B on A.Invoice=B.Invoice
Join cteR2 C on A.Invoice=C.Invoice
Where (#Folio is NULL)
or (#Folio is Not NULL and (Select R1 from cteR1 Where Folio=#Folio) between R1 and R2)
Order By R1
)
Final Thoughts:
This certainly may be more than what you were looking, and there is good chance that I COMPLETELY misunderstood your requirements. That said, being a TVF you can expand with additional WHERE and/or ORDER clauses or even incorporate into a CROSS APPLY.
This uses an approach of using a hierarchyid, first generating hierarchyid for every row, then selecting the row where folio is 1003, then finding all ancestors who have an invoicenumber of 'A1122'. It's not very efficient but may give you some different ideas:
;WITH
Allfolios
AS
(
Select i.folio, i.InvoiceNumber,
hierarchyid::Parse('/' +
CAST(ROW_NUMBER()
OVER (ORDER BY InvoiceNumber) AS VARCHAR(30)
) + '/') AS hierarchy, 1 as level
from invoice i
WHERE NOT EXISTS
(SELECT * FROM invoice_relation ir WHERE ir.invoice = i. invoicenumber)
UNION ALL
SELECT i.folio, i.invoiceNumber,
hierarchyid::Parse(CAST(a.hierarchy as VARCHAR(30)) +
CAST(ROW_NUMBER()
OVER (ORDER BY a.InvoiceNumber)
AS VARCHAR(30)) + '/') AS hierarchy,
level + 1
FROM Allfolios A
INNER JOIN invoice_relation ir
on a.InvoiceNumber = ir.ParentInvoice
INNER JOIN invoice i
on ir.Invoice = i.invoicenumber
),
Ancestors
AS
(
SELECT folio, invoiceNumber, hierarchy, hierarchy.GetAncestor(1) as AncestorId
from Allfolios
WHERE folio = 1003
UNION ALL
SELECT af.folio, af.invoiceNumber, af.hierarchy, af.hierarchy.GetAncestor(1)
FROM Allfolios AF
INNER JOIN
Ancestors a ON Af.hierarchy= a.AncestorId
)
SELECT *
FROM Ancestors
WHERE InvoiceNumber = 'A1122'
Edited for the case highlighted by #jj32 where you wish to find the root element in the hierarchy which folio 1003 is in, then find any descendant of that root which has an invoice number of 'A1122'. See below:
;WITH
Allfolios -- Convert all rows to a hierarchy
AS
(
Select i.folio, i.InvoiceNumber,
hierarchyid::Parse('/' +
CAST(ROW_NUMBER()
OVER (ORDER BY InvoiceNumber) AS VARCHAR(30)
) + '/') AS hierarchy, 1 as level
from invoice i
WHERE NOT EXISTS
(SELECT * FROM invoice_relation ir WHERE ir.invoice = i. invoicenumber)
UNION ALL
SELECT i.folio, i.invoiceNumber,
hierarchyid::Parse(CAST(a.hierarchy as VARCHAR(30)) +
CAST(ROW_NUMBER()
OVER (ORDER BY a.InvoiceNumber)
AS VARCHAR(30)) + '/') AS hierarchy,
level + 1
FROM Allfolios A
INNER JOIN invoice_relation ir
on a.InvoiceNumber = ir.ParentInvoice
INNER JOIN invoice i
on ir.Invoice = i.invoicenumber
),
Root -- Find Root
AS
(
SELECT *
FROM AllFolios AF
WHERE Level = 1 AND
(SELECT hierarchy.IsDescendantOf(AF.hierarchy) from AllFolios AF2 WHERE folio = 1003) = 1
)
-- Find all descendants of the root element which have an invoicenumber = 'A1122'
SELECT *
FROM ALLFolios
WHERE hierarchy.IsDescendantOf((SELECT TOP 1 hierarchy FROM Root)) = 1 AND
invoicenumber = 'A1122'
this was tricky since you have a separate relation table and the root invoice is not in it.
DECLARE #folio INT = 1003,
#invoice NVARCHAR(20) = 'A1122'
-- find highest level of relationship
;WITH cte AS (
SELECT i.folio,
i.invoicenumber,
ir.parentinvoice,
0 AS [level]
FROM invoice i
LEFT JOIN invoice_relation ir ON ir.invoice = i.invoicenumber
WHERE i.folio = #folio
UNION ALL
SELECT i.folio,
i.invoicenumber,
ir.parentinvoice,
[level] + 1
FROM invoice i
JOIN invoice_relation ir ON ir.invoice = i.invoicenumber
JOIN cte r ON r.parentinvoice = i.invoicenumber
),
-- make sure you get the root folio
rootCte AS (
SELECT COALESCE(oa.folio, c.folio) AS rootFolio
FROM (SELECT *,
ROW_NUMBER() OVER (ORDER BY [level] DESC) Rn
FROM cte ) c
OUTER APPLY (SELECT folio FROM invoice i WHERE i.invoicenumber = c.parentinvoice) oa
WHERE c.Rn = 1
),
-- get all children of root folio
fullTree AS (
SELECT i.folio,
i.invoicenumber
FROM rootCte r
JOIN invoice i ON r.rootFolio = i.folio
UNION ALL
SELECT i.folio,
i.invoicenumber
FROM fullTree ft
JOIN invoice_relation ir ON ir.parentinvoice = ft.invoicenumber
JOIN invoice i ON ir.invoice = i.invoicenumber
)
-- search for invoice
SELECT *
FROM fullTree
WHERE invoicenumber = #invoice
Here's an attempt that flattens out the relations first so you can travel in any direction. Then it does the recursive CTE to work through the levels:
WITH invoicerelation AS
(
select relationid, invoice, parentinvoice AS relatedinvoice
from invoice_relation
union
select relationid, parentinvoice AS invoice, invoice AS relatedinvoice
from invoice_relation
),
cteLevels AS
(
select 0 AS relationid, invoice.folio,
invoicenumber AS invoice, invoicenumber AS relatedinvoice,
0 AS Level
from invoice
UNION ALL
select invoicerelation.relationid, invoice.folio,
invoicerelation.invoice, cteLevels.relatedinvoice,
Level + 1 AS Level
from invoice INNER JOIN
invoicerelation ON invoice.invoicenumber = invoicerelation.invoice INNER JOIN
cteLevels ON invoicerelation.relatedinvoice = cteLevels.invoice
and (ctelevels.relationid <> invoicerelation.relationid)
)
SELECT cteLevels.folio, relatedinvoice, invoice.folio AS invoicefolio, cteLevels.level
from cteLevels INNER JOIN
invoice ON cteLevels.relatedinvoice = invoice.invoicenumber
WHERE cteLevels.folio = 1003 AND cteLevels.relatedinvoice = 'a1122'
I agree with SwampDev's comment that the parentinvoice should really be in the invoice table. This could also be done without a recursive CTE if you know the maximum number of levels of separation between invoices.
I'm trying to join three tables to pull back a list of distinct blog posts with associated assets (images etc) but I keep coming up a cropper. The three tablets are tblBlog, tblAssetLink and tblAssets. The Blog tablet hold the blog, the asset table holds the assets and the Assetlink table links the two together.
tblBlog.BID is the PK in blog, tblAssets.AID is the PK in Assets.
This query works but pulls back multiple posts for the same record. I've tried to use select distinct and group by and even union but as my knowledge is pretty poor with SQL - they all error.
I'd like to also discount any assets that are marked as deleted (tblAssets.Deleted = true) but not hide the associated Blog post (if that's not marked as deleted). If anyone can help - it would be much appreciated! Thanks.
Here's my query so far....
SELECT dbo.tblBlog.BID,
dbo.tblBlog.DateAdded,
dbo.tblBlog.PMonthName,
dbo.tblBlog.PDay,
dbo.tblBlog.Header,
dbo.tblBlog.AddedBy,
dbo.tblBlog.PContent,
dbo.tblBlog.Category,
dbo.tblBlog.Deleted,
dbo.tblBlog.Intro,
dbo.tblBlog.Tags,
dbo.tblAssets.Name,
dbo.tblAssets.Description,
dbo.tblAssets.Location,
dbo.tblAssets.Deleted AS Expr1,
dbo.tblAssetLink.Priority
FROM dbo.tblBlog
LEFT OUTER JOIN dbo.tblAssetLink
ON dbo.tblBlog.BID = dbo.tblAssetLink.BID
LEFT OUTER JOIN dbo.tblAssets
ON dbo.tblAssetLink.AID = dbo.tblAssets.AID
WHERE ( dbo.tblBlog.Deleted = 'False' )
ORDER BY dbo.tblAssetLink.Priority, tblBlog.DateAdded DESC
EDIT
Changed the Where and the order by....
Expected output:
tblBlog.BID = 123
tblBlog.DateAdded = 12/04/2015
tblBlog.Header = This is a header
tblBlog.AddedBy = Persons name
tblBlog.PContent = *text*
tblBlog.Category = Category name
tblBlog.Deleted = False
tblBlog.Intro = *text*
tblBlog.Tags = Tag, Tag, Tag
tblAssets.Name = some.jpg
tblAssets.Description = Asset desc
tblAssets.Location = Location name
tblAssets.Priority = True
Use OUTER APPLY:
DECLARE #b TABLE ( BID INT )
DECLARE #a TABLE ( AID INT )
DECLARE #ba TABLE
(
BID INT ,
AID INT ,
Priority INT
)
INSERT INTO #b
VALUES ( 1 ),
( 2 )
INSERT INTO #a
VALUES ( 1 ),
( 2 ),
( 3 ),
( 4 )
INSERT INTO #ba
VALUES ( 1, 1, 1 ),
( 1, 2, 2 ),
( 2, 1, 1 ),
( 2, 2, 2 )
SELECT *
FROM #b b
OUTER APPLY ( SELECT TOP 1
a.*
FROM #ba ba
JOIN #a a ON a.AID = ba.AID
WHERE ba.BID = b.BID
ORDER BY Priority
) o
Output:
BID AID
1 1
2 1
Something like:
SELECT b.BID ,
b.DateAdded ,
b.PMonthName ,
b.PDay ,
b.Header ,
b.AddedBy ,
b.PContent ,
b.Category ,
b.Deleted ,
b.Intro ,
b.Tags ,
o.Name ,
o.Description ,
o.Location ,
o.Deleted AS Expr1 ,
o.Priority
FROM dbo.tblBlog b
OUTER APPLY ( SELECT TOP 1
a.* ,
al.Priority
FROM dbo.tblAssetLink al
JOIN dbo.tblAssets a ON al.AID = a.AID
WHERE b.BID = al.BID
ORDER BY al.Priority
) o
WHERE b.Deleted = 'False'
You cannot join three tables unless they all have the same attribute. It would work if all tables had BID, but the second join is trying to join AID. Which wont work. They all have to have BID.
Based on your comments
i would like to get is just one asset per blog post (top one ordered
by Priority)
You can change your query as following. I suggest changing the join with dbo.tblAssetLink to filtered one, which contains only one (highest priority) link for every blog.
SELECT dbo.tblBlog.BID,
dbo.tblBlog.DateAdded,
dbo.tblBlog.PMonthName,
dbo.tblBlog.PDay,
dbo.tblBlog.Header,
dbo.tblBlog.AddedBy,
dbo.tblBlog.PContent,
dbo.tblBlog.Category,
dbo.tblBlog.Deleted,
dbo.tblBlog.Intro,
dbo.tblBlog.Tags,
dbo.tblAssets.Name,
dbo.tblAssets.Description,
dbo.tblAssets.Location,
dbo.tblAssets.Deleted AS Expr1,
dbo.tblAssetLink.Priority
FROM dbo.tblBlog
LEFT OUTER JOIN
(SELECT BID, AID,
ROW_NUMBER() OVER (PARTITION BY BID ORDER BY [Priority] DESC) as N
FROM dbo.tblAssetLink) AS filteredAssetLink
ON dbo.tblBlog.BID = filteredAssetLink.BID
LEFT OUTER JOIN dbo.tblAssets
ON filteredAssetLink.AID = dbo.tblAssets.AID
WHERE dbo.tblBlog.Deleted = 'False' AND filteredAssetLink.N = 1
ORDER BY tblBlog.DateAdded DESC
The following script is very slow when its run.
I have no idea how to improve the performance of the script.
Even with a view takes more than quite a lot minutes.
Any idea please share to me.
SELECT DISTINCT
( id )
FROM ( SELECT DISTINCT
ct.id AS id
FROM [Customer].[dbo].[Contact] ct
LEFT JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
WHERE hnci.customer_id IN (
SELECT DISTINCT
( [Customer_ID] )
FROM [Transactions].[dbo].[Transaction_Header]
WHERE actual_transaction_date > '20120218' )
UNION
SELECT DISTINCT
contact_id AS id
FROM [Customer].[dbo].[Restaurant_Attendance]
WHERE ( created > '2012-02-18 00:00:00.000'
OR modified > '2012-02-18 00:00:00.000'
)
AND ( [Fifth_Floor_London] = 1
OR [Fourth_Floor_Leeds] = 1
OR [Second_Floor_Bristol] = 1
)
UNION
SELECT DISTINCT
( ct.id )
FROM [Customer].[dbo].[Contact] ct
INNER JOIN [Customer].[dbo].[Wifinity_Devices] wfd ON ct.wifinity_uniqueID = wfd.[CustomerUniqueID]
AND startconnection > '2012-02-17'
UNION
SELECT DISTINCT
comdt.id AS id
FROM [Customer].[dbo].[Complete_dataset] comdt
LEFT JOIN [Customer].[dbo].[Aggregate_Spend_Counts] agsc ON comdt.id = agsc.contact_id
WHERE agsc.contact_id IS NULL
AND ( opt_out_Mail <> 1
OR opt_out_email <> 1
OR opt_out_SMS <> 1
OR opt_out_Mail IS NULL
OR opt_out_email IS NULL
OR opt_out_SMS IS NULL
)
AND ( address_1 IS NOT NULL
OR email IS NOT NULL
OR mobile IS NOT NULL
)
UNION
SELECT DISTINCT
( contact_id ) AS id
FROM [Customer].[dbo].[VIP_Card_Holders]
WHERE VIP_Card_number IS NOT NULL
) AS tbl
Wow, where to start...
--this distinct does nothing. Union is already distinct
--SELECT DISTINCT
-- ( id )
--FROM (
SELECT DISTINCT [Customer_ID] as ID
FROM [Transactions].[dbo].[Transaction_Header]
where actual_transaction_date > '20120218' )
UNION
SELECT
contact_id AS id
FROM [Customer].[dbo].[Restaurant_Attendance]
-- not sure that you are getting the date range you want. Should these be >=
-- if you want everything that occurred on the 18th or after you want >= '2012-02-18 00:00:00.000'
-- if you want everything that occurred on the 19th or after you want >= '2012-02-19 00:00:00.000'
-- the way you have it now, you will get everything on the 18th unless it happened exactly at midnight
WHERE ( created > '2012-02-18 00:00:00.000'
OR modified > '2012-02-18 00:00:00.000'
)
AND ( [Fifth_Floor_London] = 1
OR [Fourth_Floor_Leeds] = 1
OR [Second_Floor_Bristol] = 1
)
-- all of this does nothing because we already have every id in the contact table from the first query
-- UNION
-- SELECT
-- ( ct.id )
-- FROM [Customer].[dbo].[Contact] ct
-- INNER JOIN [Customer].[dbo].[Wifinity_Devices] wfd ON ct.wifinity_uniqueID = wfd.[CustomerUniqueID]
-- AND startconnection > '2012-02-17'
UNION
-- cleaned this up with isnull function and coalesce
SELECT
comdt.id AS id
FROM [Customer].[dbo].[Complete_dataset] comdt
LEFT JOIN [Customer].[dbo].[Aggregate_Spend_Counts] agsc ON comdt.id = agsc.contact_id
WHERE agsc.contact_id IS NULL
AND ( isnull(opt_out_Mail,0) <> 1
OR isnull(opt_out_email,0) <> 1
OR isnull(opt_out_SMS,0) <> 1
)
AND coalesce(address_1 , email, mobile) IS NOT NULL
UNION
SELECT
( contact_id ) AS id
FROM [Customer].[dbo].[VIP_Card_Holders]
WHERE VIP_Card_number IS NOT NULL
-- ) AS tbl
Where exists is generally faster than in as well.
Or conditions are generally slower as well, use more union statements instead.
And learn to use left joins correctly. If you have a where condition (other than where id is null) on the table on teh right side of a left join, it will convert to an inner join. If this is not what you want, then your code is currently giving you an incorrect result set.
See http://wiki.lessthandot.com/index.php/WHERE_conditions_on_a_LEFT_JOIN for an explanation of how to fix.
As stated in a comment optimize one at a time. See which one takes the longest and focus on that one.
union will remove duplicates so you don't need the distinct on the individual queries
On you first I would try this:
The left join is killed by the WHERE hnci.customer_id IN so you might as well have a join.
The sub-query is not efficient as cannot use an index on the IN.
The query optimizer does not know what in ( select .. ) will return so it cannot optimize use of indexes.
SELECT ct.id AS id
FROM [Customer].[dbo].[Contact] ct
JOIN [Customer].[dbo].[Customer_ids] hnci
ON ct.id = hnci.contact_id
JOIN [Transactions].[dbo].[Transaction_Header] th
on hnci.customer_id = th.[Customer_ID]
and th.actual_transaction_date > '20120218'
On that second join the query optimizer has the opportunity of which condition to apply first. Let say [Customer].[dbo].[Customer_ids].[customer_id] and [Transactions].[dbo].[Transaction_Header] each have indexes. The query optimizer has the option to apply that before [Transactions].[dbo].[Transaction_Header].[actual_transaction_date].
If [actual_transaction_date] is not indexed then for sure it would do the other ID join first.
With your in ( select ... ) the query optimizer has no option but to apply the actual_transaction_date > '20120218' first. OK some times query optimizer is smart enough to use an index inside the in outside the in but why make it hard for the query optimizer. I have found the query optimizer make better decisions if you make the decisions easier.
A join on a sub-query has the same problem. You take options away from the query optimizer. Give the query optimizer room to breathe.
try this, temptable should help you:
IF OBJECT_ID('Tempdb..#Temp1') IS NOT NULL
DROP TABLE #Temp1
--Low perfomance because of using "WHERE hnci.customer_id IN ( .... ) " - loop join must be
--and this "where" condition will apply to two tables after left join,
--so result will be same as with two inner joints but with bad perfomance
--SELECT DISTINCT
-- ct.id AS id
--INTO #temp1
--FROM [Customer].[dbo].[Contact] ct
-- LEFT JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
--WHERE hnci.customer_id IN (
-- SELECT DISTINCT
-- ( [Customer_ID] )
-- FROM [Transactions].[dbo].[Transaction_Header]
-- WHERE actual_transaction_date > '20120218' )
--------------------------------------------------------------------------------
--this will give the same result but with better perfomance then previouse one
--------------------------------------------------------------------------------
SELECT DISTINCT
ct.id AS id
INTO #temp1
FROM [Customer].[dbo].[Contact] ct
JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
JOIN ( SELECT DISTINCT
( [Customer_ID] )
FROM [Transactions].[dbo].[Transaction_Header]
WHERE actual_transaction_date > '20120218'
) T ON hnci.customer_id = T.[Customer_ID]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
INSERT INTO #temp1
( id
)
SELECT DISTINCT
contact_id AS id
FROM [Customer].[dbo].[Restaurant_Attendance]
WHERE ( created > '2012-02-18 00:00:00.000'
OR modified > '2012-02-18 00:00:00.000'
)
AND ( [Fifth_Floor_London] = 1
OR [Fourth_Floor_Leeds] = 1
OR [Second_Floor_Bristol] = 1
)
INSERT INTO #temp1
( id
)
SELECT DISTINCT
( ct.id )
FROM [Customer].[dbo].[Contact] ct
INNER JOIN [Customer].[dbo].[Wifinity_Devices] wfd ON ct.wifinity_uniqueID = wfd.[CustomerUniqueID]
AND startconnection > '2012-02-17'
INSERT INTO #temp1
( id
)
SELECT DISTINCT
comdt.id AS id
FROM [Customer].[dbo].[Complete_dataset] comdt
LEFT JOIN [Customer].[dbo].[Aggregate_Spend_Counts] agsc ON comdt.id = agsc.contact_id
WHERE agsc.contact_id IS NULL
AND ( opt_out_Mail <> 1
OR opt_out_email <> 1
OR opt_out_SMS <> 1
OR opt_out_Mail IS NULL
OR opt_out_email IS NULL
OR opt_out_SMS IS NULL
)
AND ( address_1 IS NOT NULL
OR email IS NOT NULL
OR mobile IS NOT NULL
)
INSERT INTO #temp1
( id
)
SELECT DISTINCT
( contact_id ) AS id
FROM [Customer].[dbo].[VIP_Card_Holders]
WHERE VIP_Card_number IS NOT NULL
SELECT DISTINCT
id
FROM #temp1 AS T
Is it possible to order the results of an SQL query, on a field that is not in the projection itself?
See example below - I am taking the distinct ID of a product table, but I want it ordered by title. I don't want to include the title because I am using NHibernate to generate a query, and page the results. I am then using this distinct ID resultset, to load the actual results.
SELECT
DISTINCT this_.`ID` AS y0
FROM
`Product` this_
LEFT OUTER JOIN
`Brand` brand3_
ON this_.BrandId=brand3_.ID
INNER JOIN
`Product_CultureInfo` productcul2_
ON this_.ID=productcul2_.ProductID
AND (
(
(
productcul2_.`Deleted` = 0
OR productcul2_.`Deleted` IS NULL
)
AND (
productcul2_.`_Temporary_Flag` = 0
OR productcul2_.`_Temporary_Flag` IS NULL
)
)
)
INNER JOIN
`ProductCategory` aliasprodu1_
ON this_.ID=aliasprodu1_.ProductID
AND (
(
(
aliasprodu1_.`Deleted` = 0
OR aliasprodu1_.`Deleted` IS NULL
)
AND (
aliasprodu1_.`_Temporary_Flag` = 0
OR aliasprodu1_.`_Temporary_Flag` IS NULL
)
)
)
WHERE
(
this_._Temporary_Flag =FALSE
OR this_._Temporary_Flag IS NULL
)
AND this_.Published = TRUE
AND (
this_.Deleted = FALSE
OR this_.Deleted IS NULL
)
AND (
this_._ComputedDeletedValue = FALSE
OR this_._ComputedDeletedValue IS NULL
)
AND (
(
this_._TestItemSessionGuid IS NULL
OR this_._TestItemSessionGuid = ''
)
)
AND (
productcul2_._ActualTitle LIKE '%silver%'
OR brand3_.Title LIKE '%silver%'
OR aliasprodu1_.CategoryId IN (
47906817 , 47906818 , 47906819 , 47906816 , 7012353 , 44662785
)
)
AND this_.Published = TRUE
AND this_.Published = TRUE
ORDER BY
this_.Priority ASC,
productcul2_._ActualTitle ASC,
this_.Priority ASC LIMIT 25;
Don't know if there's a better solution but how about a nested select where the external query exlude the field that you're not interested in?
So, something like that on a "random" table
SELECT a,b,c from (SELECT a,b,c,d from myTable order by d)
Obviously if there is a "language-direct" solution will be better because, in that way, you have to do two projection and one of those is useless