I have a large query that I have pasted parts of below.
I am wanting to use the result of the first join in my second join.
What I am trying to do get the last session that has a lead_conversion then I am getting all sessions in between then and the current row
This is the part I am struggling with
left join (
select ss.id, ss.session_start, ss.lead_id
from sessions ss
inner join lead_conversions inner_lc on inner_lc.session_id = ss.id
) prev_lc
on prev_lc.lead_id = lc.lead_id
and prev_lc.session_start::TIMESTAMP < s.session_start::TIMESTAMP
left join cte_sessions reset_prev_sess
on reset_prev_sess.lead_id = lc.lead_id
and reset_prev_sess.session_start::TIMESTAMP <= s.session_start::TIMESTAMP
and (
prev_lc.session_start::TIMESTAMP IS NULL
OR
reset_prev_sess.session_start::TIMESTAMP > prev_lc.session_start::TIMESTAMP
)
my issue is I cant just fetch the last prev_lc and I cant seem to use max(prev_lc.session_start)
I have tried grouping in first select and using max but this does not work as I believe this is ran before the on
left join (
select max(ss.session_start) as session_start, max(ss.lead_id) as lead_id
from sessions ss
inner join lead_conversions inner_lc on inner_lc.session_id = ss.id
group by inner_lc.id
) prev_lc on prev_lc.lead_id = lc.lead_id
I have also tried using max in the second join but this give the error
SQL compilation error: Invalid aggregate function in ON clause [MAX(CAST(PREV_LC.SESSION_START AS TIMESTAMP_NTZ(9)))]
left join cte_sessions reset_prev_sess
on reset_prev_sess.lead_id = lc.lead_id
and reset_prev_sess.session_start::TIMESTAMP <= s.session_start::TIMESTAMP
and (
prev_lc.session_start::TIMESTAMP IS NULL
OR
reset_prev_sess.session_start::TIMESTAMP > max(prev_lc.session_start::TIMESTAMP)
)
any help with this would be very appreciated
Thank you
if I understand correctly you are looking for to join with the last session start,so what you can do is to order by startsession in your subquery and limit to 1 record:
left join (
select ss.id, ss.session_start, ss.lead_id
from sessions ss
inner join lead_conversions inner_lc on inner_lc.session_id = ss.id
order by ss.session_start desc
limit 1
) prev_lc
the rest of query stays untouched.
So I have found a solution for this if any one comes across this. I ended up just rethinking how I go about it.
I ended up adding a row number for each conversion
with cte_sessions as (
select
s.id
,s.lead_id
,s.session_start::TIMESTAMP as session_start
,CASE WHEN MAX(lc.id) IS NOT NULL
then ROW_NUMBER() over (partition by s.lead_id, (CASE WHEN
MAX(lc.id) IS NOT NULL then 1 else 0 end)
order by s.session_start
)
END as conversion_row
from sessions s
left join lead_conversions lc on lc.session_id = s.id
group by s.id, s.session_start, s.lead_id, s.project_id, s.crawler_id
order by s.session_start
)
The I just did this in the join
left join cte_sessions prev_lc on prev_lc.lead_id = lc.lead_id and prev_lc.conversion_row = s.conversion_row - 1
I have the following SQL select statement that I use to get a subset of products, or wines:
SELECT pv.SkProdVariantId AS id,
pa.Colour AS colour,
FROM Dim.ProductVariant AS pv
JOIN ProductAttributes_new AS pa
ON pv.SkProdVariantId = pa.SkProdVariantId
WHERE pv.ProdTypeName = 'Wines'
The length of this table generated is 3,905. I want to get all the transactional data for these products.
At the moment I'm using this select statement
SELECT c.CalDate AS timestamp,
f.SkProductVariantId AS sku_id,
f.Quantity AS quantity
FROM fact.FTransactions AS f
LEFT JOIN Dim.Calendar AS c
ON f.SkDateId = c.SkDateId
LEFT JOIN (
SELECT pv.SkProdVariantId AS id,
pa.Colour AS colour,
FROM Dim.ProductVariant AS pv
JOIN ProductAttributes_new AS pa
ON pv.SkProdVariantId = pa.SkProdVariantId
WHERE pv.ProdTypeName = 'Wines'
) AS s
ON s.id = f.SkProductVariantId
WHERE c.CalDate LIKE '%2019%'
The calendar dates are correct, but the number of unique products returned is 5,648, rather than the expected 3,905 from the select query.
Why does my LEFT JOIN on the first select query not work as I expect it to, please?
Thanks for any help!
If you want all the rows form your query, it needs to be the first reference in the LEFT JOIN. Then, I am guessing that you want transaction in 2019:
select . . .
from (SELECT pv.SkProdVariantId AS id, pa.Colour AS colour,
FROM Dim.ProductVariant pv JOIN
ProductAttributes_new pa
ON pv.SkProdVariantId = pa.SkProdVariantId
WHERE pv.ProdTypeName = 'Wines'
) s LEFT JOIN
(fact.FTransactions f JOIN
Dim.Calendar c
ON f.SkDateId = c.SkDateId AND
c.CalDate >= '2019-01-01' AND
c.CalDate < '2020-01-01'
)
ON s.id = f.SkProductVariantId;
Note that this assumes that CalDate is really a date and not a string. LIKE should only be used on strings.
You misunderstand somehow how outer joins work. See Gordon's answer and my request comment on that.
As to the task: It seems you want to select transactions of 2019, but you want to restrict your results to wine products. We typically restrict query results in the WHERE clause. You can use IN or EXISTS for that.
SELECT
c.CalDate AS timestamp,
f.SkProductVariantId AS sku_id,
f.Quantity AS quantity
FROM fact.FTransactions AS f
INNER JOIN Dim.Calendar AS c ON f.SkDateId = c.SkDateId
WHERE DATEPART(YEAR, c.CalDate) = 2019
AND f.SkProductVariantId IN
(
SELECT pv.SkProdVariantId
FROM Dim.ProductVariant AS pv
WHERE pv.ProdTypeName = 'Wines'
);
(I've removed the join to ProductAttributes_new, because it doesn't seem to play any part in this query.)
I am trying to write a query that gets Income Statements for Investments and calculates how much each has made dependent on the time frame.
SELECT
iis.GrossIncome, dates.[Month], dates.[Year], s.TotalGrossIncome
FROM
dbo.InvestmentIncomeStatements AS iis
JOIN
dbo.IssueDates AS dates ON iis.IssueDateID = dates.ID
JOIN
(SELECT
f.InvestmentID, SUM(f.GrossIncome) AS TotalGrossIncome
FROM
(SELECT
stment.InvestmentID, stment.GrossIncome, inment.[Name] as InvestmentName
FROM
dbo.InvestmentIncomeStatements stment
JOIN
dbo.IssueDates AS dts ON stment.IssueDateID = dts.ID
JOIN
dbo.Investments inment ON stment.InvestmentID = inment.ID
WHERE
dts.[Month] + dts.[Year] < '???') AS f
GROUP BY
f.InvestmentID, f.InvestmentName) AS s ON iis.InvestmentID = s.InvestmentID
In the place of the '???' I would like to write 'dates.[Year] + dates.[Month]'.
However I can't refer it. What should i do?
-- WITH POD was causing the issue, removing this code reduced 2 year pull to 3 mins.
Will post new question to figure out best way to include POD data.
--Edit for clarity, I am a read only user to these tables.
I wrote the below query, but it takes a very long time to execute (20min).
It is currently limited to 1 month, but user wants at least 1 year preferably 2. I assume this would scale time to hours.
Can anyone take a look at let me know if there is a BKM I am not using to improve performance?
Or if there is a better method for a report of this size? At 2 years, it would return ~100K rows from 17 tables.
WITH
POD AS
(
SELECT SHIPMENTS.Delivery
,SHIPMENTS.Shipment_Number
,PROOF_OF_DELIVERY.Shipping_Carrier
,PROOF_OF_DELIVERY.Tracking_Number
,PROOF_OF_DELIVERY.Ship_Method
,PROOF_OF_DELIVERY.POD_Signature
,PROOF_OF_DELIVERY.POD_Date
,PROOF_OF_DELIVERY.POD_Time
FROM
SHIPMENTS
LEFT JOIN PROOF_OF_DELIVERY
ON SHIPMENTS.Shipment_Number = PROOF_OF_DELIVERY.Delivery_Or_Shipment
WHERE Load_Date IN
(
SELECT MAX(Load_Date)
FROM PROOF_OF_DELIVERY
GROUP BY Delivery_Or_Shipment
)
)
SELECT DISTINCT GI.GOODS_ISSUE_DOCUMENT_ID
,GI.SALES_ORDER_ID
,GI.SALES_ORDER_LINE_ID
,GI.SALES_ORDER_TYPE_CODE
,GI.DELIVERY_HEADER_ID
,GI.DELIVERY_ITEM_ID
,FD.FISCAL_MONTH_CODE
,GI.MATERIAL_NUMBER
,GI.SHIPPED_QTY
,SO.ORDERER_NAME
,SO.CREATED_BY
,SO.CONTACT_PERSON
,GI.SOLD_TO_CUSTOMER_ID
,GI.SHIP_TO_CUSTOMER_ID
,GI.ORIGINAL_COMMIT_DATE
,GI.SHIP_FROM_PLANT_ID
,GI.ACTUAL_PGI_DATE
,GI.CUSTOMER_PO_NUMBER
,GI.SHIPPED_PRICE
,(GI.SHIPPED_PRICE * GI.SHIPPED_QTY) AS EXT_SHIPPED_PRICE
,GI.SALES_ORGANIZATION_CODE
,GI.DELIVERY_NOTE_PRIORITY_CODE
,FD.FISCAL_WEEK_CODE
,DV.DIVISION_CODE
,DN.Delivery_Item_Creation_Date
,SOLD.CUSTOMER_SHORT_NAME AS SOLD_TO_CUSTOMER_SHORT_NAME
,SHIP.CUSTOMER_SHORT_NAME AS SHIP_TO_CUSTOMER_SHORT_NAME
,SHIP.Customer_Site_Name
,SHIP.REGION_NAME
,MATD.MATERIAL_DESCRIPTION
,MATD.STANDARD_COST
,(MATD.STANDARD_COST * GI.SHIPPED_QTY) AS EXT_STANDARD_COST
,MATD.GLOBAL_EVENT
,PLT.LEAD_TIME_FOR_ORIGINAL_COMMIT
,OPRM.BASE_PART_CODE
,MATD.PRODUCT_INSP_MEMO
,MATD.MATERIAL_PRICING_GROUP_CODE
,MATD.MATERIAL_STATUS AS MMPP
,PIM.PIM_PBG_GROUPING
,SOL.SHIPPING_CONDITION
,SVO.SERVICE_ORDER_NUM
,SO.CREATION_TIME AS SO_CREATION_TIME
,SOL.CREATED_TIME AS SO_LINE_CREATED_TIME
,SOL.SHIPPING_POINT
,SDT.SALES_DOCUMENT_TYPE_CODE AS SVO_DOCUMENT_TYPE_CODE
,EQU.EQUIPMENT_NUM
,EQU.SERIAL_NUMBER
,EQU.CUSTOMERTOOLID
,POD.Shipment_Number
,POD.Shipping_Carrier
,POD.Tracking_Number
,POD.Ship_Method
,POD.POD_Signature
,POD.POD_Date
,POD.POD_Time
,DATEDIFF(dd,SO.CREATION_TIME,GI.ACTUAL_PGI_DATE) AS Cycle_Time_to_PGI_Days
,DATEDIFF(hh,SO.CREATION_TIME,GI.ACTUAL_PGI_DATE) AS Cycle_Time_to_PGI_Hours
FROM GOODS_ISSUE AS GI
INNER JOIN dbo.Delivery_Notes AS DN
ON GI.DELIVERY_HEADER_ID = DN.DELIVERY_HEADER_CODE AND GI.DELIVERY_ITEM_ID = DN.DELIVERY_ITEM_CODE
INNER JOIN dbo.Customer_View AS SOLD
ON GI.SOLD_TO_CUSTOMER_ID = SOLD.CUSTOMER_CODE
INNER JOIN dbo.Customer_View AS SHIP
ON GI.SOLD_TO_CUSTOMER_ID = SHIP.CUSTOMER_CODE
INNER JOIN dbo.MATERIAL_DETAILS AS MATD
ON GI.MATERIAL_NUMBER = MATD.MATERIAL_NUMBER
INNER JOIN dbo.OPR_MATERIAL_DIM AS OPRM
ON OPRM.MATERIAL_NUMBER = GI.MATERIAL_NUMBER
LEFT JOIN dbo.SM_DATE_DIM AS FD
ON CAST(FD.CALENDAR_DAY AS DATE) = CAST(GI.ACTUAL_PGI_DATE AS DATE)
LEFT JOIN dbo.DIM_PUBLISHED_LEAD_TIME_COMMIT AS PLT
ON PLT.MATERIAL_NUMBER = OPRM.BASE_PART_CODE
LEFT JOIN dbo.PRODUCT_INSP_MEMO_DIM AS PIM
ON PIM.PRODUCT_INSP_MEMO = MATD.PRODUCT_INSP_MEMO
INNER JOIN dbo.SM_SALES_ORDER_LINE_FACT AS SOL
ON SOL.SALES_ORDER_CODE = GI.SALES_ORDER_ID AND SOL.SALES_ORDER_LINE_CODE = GI.SALES_ORDER_LINE_ID
INNER JOIN dbo.SM_SALES_ORDER_FACT AS SO
ON SO.SALES_ORDER_CODE = GI.SALES_ORDER_ID
INNER JOIN dbo.SM_DIVISION_DIM AS DV
ON SO.DIVISION_SID = DV.DIVISION_SID
LEFT JOIN dbo.SERVICE_ORDER_FACT AS SVO
ON SVO.SERVICE_ORDER_NUM = SO.SERVICE_ORDER_NUMBER
LEFT JOIN dbo.SM_SALES_DOCUMENT_TYPE_DIM AS SDT
ON SDT.SALES_DOCUMENT_TYPE_SID = SVO.SALES_DOCUMENT_TYPE_SID
LEFT JOIN dbo.SM_EQUIPMENT_DIM AS EQU
ON EQU.EQUIPMENT_SID = SVO.EQUIPMENT_SID
LEFT JOIN POD
ON POD.Delivery = GI.DELIVERY_HEADER_ID
WHERE GI.ACTUAL_PGI_DATE > GETDATE()-32
AND SOLD_TO_CUSTOMER_ID IN (0010000252,0010000898,0010001121,0010001409,0010001842,0010001852,0010001879,0010001977,0010001978,0010002021,0010002202,0010002227,0010002982,0010003118,0010003176,0010003294,0010005492,0010006904,0010007048,0010007080,0010010381,0010010572,0010010905,0010011999,0010012014,0010012048,0010012571,0010013124,0010013711,0010013713,0010013824,0010014180,0010014188,0010014333,0010015059,0010015313,0010015414,0010015541,0010015544,0010015550)
A CTE is just syntax
I suspect that CTE is evaluated many times
Materialze the CTE to #temp with indexe(s) so it is run once
This cast will hurt it
Make those columns true dates and index them
ON CAST(FD.CALENDAR_DAY AS DATE) = CAST(GI.ACTUAL_PGI_DATE AS DATE)
That where negates the left so you can just do a join
Also that MAX(Load_Date) could match on another shipment
SELECT SHIPMENTS.Delivery
,SHIPMENTS.Shipment_Number
,PROOF_OF_DELIVERY.Shipping_Carrier
,PROOF_OF_DELIVERY.Tracking_Number
,PROOF_OF_DELIVERY.Ship_Method
,PROOF_OF_DELIVERY.POD_Signature
,PROOF_OF_DELIVERY.POD_Date
,PROOF_OF_DELIVERY.POD_Time
FROM SHIPMENTS
JOIN PROOF_OF_DELIVERY
ON SHIPMENTS.Shipment_Number = PROOF_OF_DELIVERY.Delivery_Or_Shipment
WHERE PROOF_OF_DELIVERY.Load_Date IN
(
SELECT MAX(Load_Date)
FROM PROOF_OF_DELIVERY
GROUP BY Delivery_Or_Shipment
)
Pull this up into the join
INNER JOIN dbo.Customer_View AS SOLD
ON GI.SOLD_TO_CUSTOMER_ID = SOLD.CUSTOMER_CODE
AND GI.SOLD_TO_CUSTOMER_ID IN (0010000252,0010000898,0010001121,0010001409,0010001842,0010001852,0010001879,0010001977,0010001978,0010002021,0010002202,0010002227,0010002982,0010003118,0010003176,0010003294,0010005492,0010006904,0010007048,0010007080,0010010381,0010010572,0010010905,0010011999,0010012014,0010012048,0010012571,0010013124,0010013711,0010013713,0010013824,0010014180,0010014188,0010014333,0010015059,0010015313,0010015414,0010015541,0010015544,0010015550)
Let me say first that I'm new to SQL, and learning much every day. With that said, here is my problem. I have a view that is already created (It shows revenue generated on equipment), but I need one more table added to it (Expenses against the equipment). When I try to add an inner join table, it create a bunch of duplicate views. Here is my original view (For the revenue portion of it):
SELECT
<removed, there are about 25 of them>
FROM
dbo.LRCON WITH (nolock)
INNER JOIN dbo.LRCONVIN WITH (nolock) ON dbo.LRCONVIN.ConId = dbo.LRCON.ConId
INNER JOIN dbo.LRBILCON WITH (nolock) ON dbo.LRBILCON.ConId = dbo.LRCONVIN.ConId AND dbo.LRBILCON.UntId = dbo.LRCONVIN.UntId
INNER JOIN dbo.LRBILITM WITH (nolock) ON dbo.LRBILITM.ParentItmId = dbo.LRBILCON.ItmId
INNER JOIN dbo.LRBIL WITH (nolock) ON dbo.LRBIL.BilId = dbo.LRBILCON.BilId
INNER JOIN dbo.LRCONTYP WITH (nolock) ON dbo.LRCONTYP.ConTypId = dbo.LRCON.ConTypId
INNER JOIN dbo.COLOOKUP AS C1 WITH (nolock) ON C1.Id = dbo.LRBILITM.ItmTyp
INNER JOIN dbo.COLOOKUP AS C2 WITH (nolock) ON C2.Id = dbo.LRCONTYP.ConTyp
INNER JOIN dbo.VHVIN WITH (nolock) ON dbo.VHVIN.UntId = dbo.LRCONVIN.UntId
WHERE
(dbo.LRBIL.Status = 647) AND (dbo.LRBILITM.ItmTyp <> 274)
I then try to add another join:
INNER JOIN dbo.SVSLS WITH (nolock) on dbo.SVSLS.UntId = dbo.LRCONVIN.UntId
with the select statement:
ROUND(dbo.SVSLS.AmtSubtotal + dbo.SVSLS.AmtSupplies + dbo.SVSLS.AmtDiagnostic + dbo.SVSLS.AmtTax1 + dbo.SVSLS.AmtTax2, 2) AS SvcAmtSale
... but it produces many, many rows of duplicates because it adds the detail of each expense to each row of my original table.
Original table:
https://dl.dropboxusercontent.com/u/81145403/orginal_table.jpg
After I add my new join/select:
https://dl.dropboxusercontent.com/u/81145403/failed_table.jpg
How do I fix this? At the end of the day, I just want to compare my revenue vs expenses on equipment over a date range. I really don't care to have the individual detail of the expenses, just a grand total is fine with me.
Is the SvcSaleAmt the revenue that you are interested in? And are the multiple detail rows separate entries on the same item? If you do not necessarily care about the individual details, you can GROUP the items together. In order to do this, you will need to get rid of the SlsId from your SELECT list, and add
GROUP BY CusId, CusName, BillId, ConId, Prd, ConTypId, ....., AmtCos, AmtGpm
Using all of the columns you have in your SELECT statement. Replace the ROUND() AS SvcSaleAmt with:
ROUND(SUM(dbo.SVSLS.AmtSubtotal + dbo.SVSLS.AmtSupplies + dbo.SVSLS.AmtDiagnostic + dbo.SVSLS.AmtTax1 + dbo.SVSLS.AmtTax2), 2) AS SvcSaleAmt