SQL: Using WITH and JOIN together to create a view efficiently

I'm trying to use a table named period, which specifies the period each date falls into, and then join that result to another table using the following statement.
WITH rep_prod AS (
SELECT t.tran_num, t.amount, t.provider_id, t.clinic,
t.tran_date, t.type, t.impacts, p.period_id, p.fiscal_year, p.period_weeks
FROM transactions t, period p
WHERE tran_date BETWEEN period_start AND period_end
)
SELECT r.tran_num, r.amount, r.provider_id, d.first_name, d.last_name,
d.clinic, r.tran_date, r.period_id, r.period_weeks, r.type, r.impacts
FROM rep_prod AS r
INNER JOIN provider AS d
ON r.provider_id = d.provider_id AND r.clinic = d.clinic
I want to create this as a view in my database. Is there a more efficient way to accomplish this? The result currently holds around 6.2 million rows and will only keep growing. This query alone took over 7 minutes to complete, although I'm using SQL Server Express with its memory limitations.
Update: Changed the query to reflect the removal of SELECT DISTINCT.
EDIT: @Rabbit So you're suggesting something like this?
SELECT t.tran_num, t.amount, t.provider_id, d.first_name, d.last_name,
d.clinic, t.tran_date, p.period_id, p.period_weeks, p.fiscal_year, p.period_start, p.period_end, t.type, t.impacts
FROM transactions t
INNER JOIN provider d
ON d.provider_id = t.provider_id AND d.clinic = t.clinic
INNER JOIN period p
ON t.tran_date BETWEEN p.period_start AND p.period_end
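As a rough, runnable sketch of the join-only form above wrapped in a view: the snippet below uses Python's built-in sqlite3 module rather than SQL Server, with a cut-down hypothetical schema, made-up rows, and assumed index names (ix_period_range, ix_provider_key). It only shows that each transaction picks up exactly one period when the period ranges do not overlap; it says nothing about how SQL Server Express would perform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE period (period_id INTEGER PRIMARY KEY, period_start TEXT, period_end TEXT);
CREATE TABLE provider (provider_id INTEGER, clinic TEXT, last_name TEXT);
CREATE TABLE transactions (tran_num INTEGER PRIMARY KEY, provider_id INTEGER,
                           clinic TEXT, tran_date TEXT, amount REAL);
-- Hypothetical supporting indexes for the two join predicates.
CREATE INDEX ix_period_range ON period (period_start, period_end);
CREATE INDEX ix_provider_key ON provider (provider_id, clinic);
INSERT INTO period VALUES (1, '2024-01-01', '2024-01-28'), (2, '2024-01-29', '2024-02-25');
INSERT INTO provider VALUES (10, 'North', 'Lee'), (11, 'South', 'Kim');
INSERT INTO transactions VALUES
  (100, 10, 'North', '2024-01-15', 50.0),
  (101, 11, 'South', '2024-01-29', 75.0),
  (102, 10, 'North', '2024-02-25', 20.0);
-- The view joins all three tables directly, with no CTE in between.
CREATE VIEW rep_prod_v AS
SELECT t.tran_num, d.last_name, p.period_id
FROM transactions t
INNER JOIN provider d ON d.provider_id = t.provider_id AND d.clinic = t.clinic
INNER JOIN period p ON t.tran_date BETWEEN p.period_start AND p.period_end;
""")

rows = conn.execute("SELECT * FROM rep_prod_v ORDER BY tran_num").fetchall()
print(rows)
```

BETWEEN is inclusive at both ends, so a transaction dated exactly on a period boundary still matches one period here only because the sample ranges do not share a date.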

Related

LEFT JOIN not keeping only records that occur in a SELECT query

I have the following SQL select statement that I use to get a subset of products, or wines:
SELECT pv.SkProdVariantId AS id,
pa.Colour AS colour
FROM Dim.ProductVariant AS pv
JOIN ProductAttributes_new AS pa
ON pv.SkProdVariantId = pa.SkProdVariantId
WHERE pv.ProdTypeName = 'Wines'
This query returns 3,905 rows. I want to get all the transactional data for these products.
At the moment I'm using this select statement:
SELECT c.CalDate AS timestamp,
f.SkProductVariantId AS sku_id,
f.Quantity AS quantity
FROM fact.FTransactions AS f
LEFT JOIN Dim.Calendar AS c
ON f.SkDateId = c.SkDateId
LEFT JOIN (
SELECT pv.SkProdVariantId AS id,
pa.Colour AS colour
FROM Dim.ProductVariant AS pv
JOIN ProductAttributes_new AS pa
ON pv.SkProdVariantId = pa.SkProdVariantId
WHERE pv.ProdTypeName = 'Wines'
) AS s
ON s.id = f.SkProductVariantId
WHERE c.CalDate LIKE '%2019%'
The calendar dates are correct, but the number of unique products returned is 5,648, rather than the expected 3,905 from the select query.
Why does my LEFT JOIN on the first select query not work as I expect it to, please?
Thanks for any help!
If you want all the rows from your query, it needs to be the first reference in the LEFT JOIN. Then, I am guessing that you want transactions in 2019:
select . . .
from (SELECT pv.SkProdVariantId AS id, pa.Colour AS colour
FROM Dim.ProductVariant pv JOIN
ProductAttributes_new pa
ON pv.SkProdVariantId = pa.SkProdVariantId
WHERE pv.ProdTypeName = 'Wines'
) s LEFT JOIN
(fact.FTransactions f JOIN
Dim.Calendar c
ON f.SkDateId = c.SkDateId AND
c.CalDate >= '2019-01-01' AND
c.CalDate < '2020-01-01'
)
ON s.id = f.SkProductVariantId;
Note that this assumes that CalDate is really a date and not a string. LIKE should only be used on strings.
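To illustrate why the order of the tables matters in a LEFT JOIN, here is a small sketch using Python's sqlite3 module. The tables product and txn and their rows are invented stand-ins for the real schema, and the date column is stored as ISO text, which is an assumption of this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE txn (product_id INTEGER, cal_date TEXT, qty INTEGER);
INSERT INTO product VALUES (1, 'Wines'), (2, 'Wines'), (3, 'Beers');
INSERT INTO txn VALUES (1, '2019-03-01', 5), (3, '2019-04-01', 2), (3, '2019-05-01', 1);
""")

# Transactions on the left: every transaction survives, wine or not,
# so the beer (id 3) is still counted.
txn_first = conn.execute("""
SELECT COUNT(DISTINCT t.product_id)
FROM txn t
LEFT JOIN (SELECT id FROM product WHERE type = 'Wines') s ON s.id = t.product_id
""").fetchone()[0]

# Wines on the left: every wine survives, sold or not; non-wines never appear.
wine_first = conn.execute("""
SELECT s.id, t.qty
FROM (SELECT id FROM product WHERE type = 'Wines') s
LEFT JOIN txn t
  ON t.product_id = s.id
 AND t.cal_date >= '2019-01-01' AND t.cal_date < '2020-01-01'
ORDER BY s.id
""").fetchall()

print(txn_first, wine_first)
```

With transactions first, the wine subquery restricts nothing; with wines first, the unsold wine (id 2) appears with a NULL quantity.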
You seem to misunderstand how outer joins work. See Gordon's answer and my comment on it.
As to the task: It seems you want to select transactions of 2019, but you want to restrict your results to wine products. We typically restrict query results in the WHERE clause. You can use IN or EXISTS for that.
SELECT
c.CalDate AS timestamp,
f.SkProductVariantId AS sku_id,
f.Quantity AS quantity
FROM fact.FTransactions AS f
INNER JOIN Dim.Calendar AS c ON f.SkDateId = c.SkDateId
WHERE DATEPART(YEAR, c.CalDate) = 2019
AND f.SkProductVariantId IN
(
SELECT pv.SkProdVariantId
FROM Dim.ProductVariant AS pv
WHERE pv.ProdTypeName = 'Wines'
);
(I've removed the join to ProductAttributes_new, because it doesn't seem to play any part in this query.)
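The answer mentions EXISTS as an alternative to IN but only shows IN. A minimal sketch of both forms side by side, run through Python's sqlite3 module on an invented two-table schema (product, txn), suggests they select the same rows. The year filter is written as a half-open date range here instead of DATEPART, which is a swapped-in choice for portability and not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE txn (product_id INTEGER, cal_date TEXT, qty INTEGER);
INSERT INTO product VALUES (1, 'Wines'), (3, 'Beers');
INSERT INTO txn VALUES (1, '2019-03-01', 5), (1, '2018-12-31', 9), (3, '2019-04-01', 2);
""")

# Restrict to wine products with IN.
in_rows = conn.execute("""
SELECT t.cal_date, t.product_id, t.qty
FROM txn t
WHERE t.cal_date >= '2019-01-01' AND t.cal_date < '2020-01-01'
  AND t.product_id IN (SELECT id FROM product WHERE type = 'Wines')
""").fetchall()

# The same restriction written as a correlated EXISTS.
exists_rows = conn.execute("""
SELECT t.cal_date, t.product_id, t.qty
FROM txn t
WHERE t.cal_date >= '2019-01-01' AND t.cal_date < '2020-01-01'
  AND EXISTS (SELECT 1 FROM product p WHERE p.id = t.product_id AND p.type = 'Wines')
""").fetchall()

print(in_rows, exists_rows)
```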

Converting a SQL subquery to a join for performance gains

I have a subquery with an inner join. This join is meant to cut the data down to a more manageable size before extracting data via an unpivot, which has a further join to pull out only the relevant matches.
When I looked at the execution plan, it seemed that the outer select is executed first, so the query takes an inordinate amount of time to complete because it processes the data for all gamers instead of the cut-down cohort.
This is the query:
SELECT
t2.Gamer_ID,
C.Feature_Code,
C.Feature_Name,
t2.CODE_DATE
FROM
(
SELECT
A.Gamer_ID
,[Identification_Code_Code_1]
,[Identification_Code_Code_2]
,[Identification_Code_Code_3]
,[Identification_Code_Code_4]
,[Identification_Code_Code_5]
,[Identification_Code_Code_6]
,[Identification_Code_Code_7]
,[Identification_Code_Code_8]
,[Identification_Code_Code_9]
,[Identification_Code_Code_10]
,CAST(Joining_date AS DATE) AS CODE_DATE
FROM Gamer_Characteristics A
INNER JOIN Gamer_Population P ON P.Gamer_ID = A.Gamer_ID --cuts down the number of gamers to the selected cohort
) s
unpivot (CODE for col in (
[Identification_Code_Code_1]
,[Identification_Code_Code_2]
,[Identification_Code_Code_3]
,[Identification_Code_Code_4]
,[Identification_Code_Code_5]
,[Identification_Code_Code_6]
,[Identification_Code_Code_7]
,[Identification_Code_Code_8]
,[Identification_Code_Code_9]
,[Identification_Code_Code_10])) as t2
INNER JOIN Gamer_feature_Code C ON C.CODE = LEFT(t2.CODE,C.CODE_LENGTH) --join to a dimension table to pull through characteristics based on code and code length
WHERE
T2.CODE_DATE <= '2020-03-31'
GROUP BY t2.Gamer_ID,
C.Feature_Code,
C.Feature_Name,
t2.CODE_DATE
I have two questions:
1: Can this be converted to use a join instead of a subquery?
2: Can I force the inner join in the subquery to take precedence over the inner join in the outer select?

SQL - Subtraction within a Common Table Expression

------ SOLVED --------
Instead of trying to perform the subtraction within the scope of the CTE, I just had to place it in the subquery that uses this particular CTE, in the main select list.
The problem was that InnerOQLI.FreePlaceCount was not recognized, since it is not defined within the scope of the CTE and is only used in the EXISTS statement.
------ PROBLEM ----------
This is the first time I've used CTEs. I've joined multiple CTEs together so that I can retrieve an overall total in a single column.
I need to perform a subtraction within one of the CTEs.
I first wrote it like this:
MyCount2
AS
(
SELECT DISTINCT
O.ID AS OrderID,
(
(
(SELECT SUM(InnerOC.[Count])
FROM Order InnerO
INNER JOIN SubOrder InnerSO ON InnerO.ID = InnerSO.OrderID
INNER JOIN OrderComponent InnerOC ON SO.ID = OC.SubOrderID
WHERE OC.OrderComponentTypeID IN (1,2,4,5)
AND EXISTS (SELECT * FROM OrderQuoteLineItem InnerOQLI
WHERE InnerOQLI.OrderQuoteLineItemTypeID = 9 AND Order.ID = InnerO.ID)
AND InnerO.ID = O.ID)
)
- --< Minus Here
OQLI.FreePlaceCount
) AS [SHPCommExpression2]
FROM Order O
INNER JOIN SubOrder SO ON O.ID = SO.OrderID
INNER JOIN OrderComponent OC ON SO.ID = OC.SubOrderID
INNER JOIN OrderQuoteLineItem OQLI ON SO.ID = OQLI.SubOrderID
),
Without going into too much detail, this brings back incorrect data because of repeated rows in the main query. (I believe it's because of the same joins within the main query.)
So I then wrote this:
MyCount2
AS
(SELECT InnerO.ID AS OrderID,
SUM(InnerOC.[Count]
- InnerOQLI.FreePlaceCount) --- Tried to place subtraction here ----
AS [SHPCommExpression12]
FROM Order InnerO
INNER JOIN SubOrder InnerSO ON InnerO.ID = InnerSO.OrderID
INNER JOIN OrderComponent InnerOC ON InnerSO.ID = InnerOC.SubOrderID
WHERE InnerOC.OrderComponentTypeID IN (1,2,4,5)
AND EXISTS (SELECT * FROM OrderQuoteLineItem InnerOQLI
WHERE InnerOQLI.OrderQuoteLineItemTypeID = 9 AND Order.ID = InnerO.ID)
GROUP BY InnerO.ID
),
You can see where I've attempted to perform the subtraction, but it doesn't recognize InnerOQLI there. I realize that InnerOQLI can't be referenced outside the EXISTS statement in which it is defined, but I can't work out how to correct this. Is there a way around it? If anyone could help, I'd appreciate it.
Thanks
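The fix described in this post, aggregating inside the CTE and doing the subtraction in the query that consumes it, could look roughly like the sketch below. It runs on Python's sqlite3 module against a deliberately flattened, hypothetical schema (order_component, quote_line, with no SubOrder level), and it assumes at most one type-9 quote line per order; none of these names come from the original database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_component (order_id INTEGER, type_id INTEGER, cnt INTEGER);
CREATE TABLE quote_line (order_id INTEGER, type_id INTEGER, free_place_count INTEGER);
INSERT INTO order_component VALUES (1, 1, 10), (1, 2, 5), (1, 3, 100), (2, 1, 7);
INSERT INTO quote_line VALUES (1, 9, 2), (2, 8, 1);
""")

rows = conn.execute("""
WITH my_count AS (
  -- The CTE only aggregates; the alias q inside EXISTS is not visible outside it.
  SELECT oc.order_id, SUM(oc.cnt) AS total
  FROM order_component oc
  WHERE oc.type_id IN (1, 2, 4, 5)
    AND EXISTS (SELECT 1 FROM quote_line q
                WHERE q.order_id = oc.order_id AND q.type_id = 9)
  GROUP BY oc.order_id
)
-- The subtraction happens here, where quote_line is joined explicitly.
SELECT m.order_id, m.total - q.free_place_count AS net
FROM my_count m
JOIN quote_line q ON q.order_id = m.order_id AND q.type_id = 9
ORDER BY m.order_id
""").fetchall()

print(rows)
```

A table alias introduced inside an EXISTS subquery is scoped to that subquery, which is why the column has to be brought in through a join before it can appear in the select list.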

oracle sql - missing dates from range

I've created the following script ...
SELECT
gr.RESERVATION_NO,
gr.TITLE,
gr.CATNR,
gl.DUEDATE,
gr.CRE_USR,
gl.QTY,
gl.WORK_CENTER_NO,
gl.TEC_CRITERIA,
gr.RESERVE_QTY,
gl.PLANT,
studate.dt
FROM GPS_RESERVATION gr,
(Select first_date + Level-1 dt
From
(
Select trunc(sysdate) first_date,
trunc(sysdate)+60 last_date
from dual
)
Connect By Level <= (last_date - first_date) +1 ) studate
INNER JOIN GPS_RESERVATION_LOAD gl
ON gl.work_center_no = 'ALIN'
AND gl.duedate = studate.dt
AND gl.plant = 'W'
WHERE gr.RESERVATION_NO = gl.RESERVATION_NO
AND gr.ACTIVE_FLAG = 'Y'
AND gr.reservation_no = '176601'
ORDER BY
gl.DUEDATE
I expected to see ALL DATES from sysdate to sysdate+60 but, I only get dates where duedate exists.
i.e.
I get...
I expected...
What am I doing wrong please ?
Thanks for your help.
You're mixing older Oracle join syntax with newer ANSI join syntax, which is a bit confusing and might trip up the optimiser. The main problem, though, is that you have an inner join between your generated date list and your gl table, and you also have a join condition in the WHERE clause, which keeps it an inner join even if you change the join type.
Without the table structures or any data, I think you want:
...
FROM (
Select first_date + Level-1 dt
From
(
Select trunc(sysdate) first_date,
trunc(sysdate)+60 last_date
from dual
)
Connect By Level <= (last_date - first_date) +1
) studate
CROSS JOIN GPS_RESERVATION gr
LEFT OUTER JOIN GPS_RESERVATION_LOAD gl
ON gl.work_center_no = 'ALIN'
AND gl.duedate = studate.dt
AND gl.plant = 'W'
AND gl.RESERVATION_NO = gr.RESERVATION_NO
WHERE gr.ACTIVE_FLAG = 'Y'
AND gr.reservation_no = '176601'
ORDER BY
gl.DUEDATE
The cross join gives you the Cartesian product of the generated dates and the matching records in gr. The range from sysdate to sysdate+60 is inclusive, so it holds 61 dates; if your gr filter finds 5 rows, you'll have 305 results from that join. The left outer join then looks for any matching rows in gl, with all the filter and join conditions related to gl inside its ON clause.
You should look at the execution plans for your query and this one, firstly to see the difference, but more importantly to check it is joining and filtering as you expect and in a sensible and cost-effective way. And check the results are correct, of course... You might also want to look at a version that uses a left outer join but keeps your original where clause, and see that that makes it go back to an inner join.
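The effect of where the gl condition sits can be reproduced on a small scale. The sketch below uses Python's sqlite3 module, so the date list comes from a recursive CTE instead of Oracle's CONNECT BY, and it starts from a fixed date rather than sysdate; the table names and rows are simplified, hypothetical stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reservation (reservation_no TEXT, active_flag TEXT);
CREATE TABLE reservation_load (reservation_no TEXT, duedate TEXT, qty INTEGER);
INSERT INTO reservation VALUES ('176601', 'Y');
INSERT INTO reservation_load VALUES ('176601', '2024-01-03', 4), ('176601', '2024-01-10', 6);
""")

# 61 consecutive dates: the start date plus the following 60 days.
DATES = """
WITH RECURSIVE studate(dt) AS (
  SELECT '2024-01-01'
  UNION ALL
  SELECT date(dt, '+1 day') FROM studate WHERE dt < date('2024-01-01', '+60 day')
)
"""

# All gl conditions in the ON clause: every generated date is kept.
on_count = conn.execute(DATES + """
SELECT COUNT(*)
FROM studate s
CROSS JOIN reservation gr
LEFT JOIN reservation_load gl
  ON gl.duedate = s.dt AND gl.reservation_no = gr.reservation_no
WHERE gr.active_flag = 'Y' AND gr.reservation_no = '176601'
""").fetchone()[0]

# A gl condition in WHERE: unmatched dates have NULL gl columns and are filtered out.
where_count = conn.execute(DATES + """
SELECT COUNT(*)
FROM studate s
CROSS JOIN reservation gr
LEFT JOIN reservation_load gl ON gl.duedate = s.dt
WHERE gr.reservation_no = gl.reservation_no
  AND gr.active_flag = 'Y' AND gr.reservation_no = '176601'
""").fetchone()[0]

print(on_count, where_count)
```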

SQL query that uses a GROUP BY and IN is too slow

I am struggling to speed this SQL query up. I have tried removing all the fields besides the two SUM() functions and the Id field but it is still incredibly slow. It is currently taking 15 seconds to run. Does anyone have any suggestions to speed this up as it is currently causing a timeout on a page in my web app. I need the fields shown so I can't really remove them but there surely has to be a way to improve this?
SELECT [Customer].[iCustomerID],
[Customer].[sCustomerSageCode],
[Customer].[sCustomerName],
[Customer].[sCustomerTelNo1],
SUM([InvoiceItem].[fQtyOrdered]) AS [Quantity],
SUM([InvoiceItem].[fNetAmount]) AS [Value]
FROM [dbo].[Customer]
LEFT JOIN [dbo].[CustomerAccountStatus] ON ([Customer].[iAccountStatusID] = [CustomerAccountStatus].[iAccountStatusID])
LEFT JOIN [dbo].[SalesOrder] ON ([SalesOrder].[iCustomerID] = [dbo].[Customer].[iCustomerID])
LEFT JOIN [Invoice] ON ([Invoice].[iCustomerID] = [Customer].[iCustomerID])
LEFT JOIN [dbo].[InvoiceItem] ON ([Invoice].[iInvoiceNumber] = [InvoiceItem].[iInvoiceNumber])
WHERE ([InvoiceItem].[sNominalCode] IN ('4000', '4001', '4002', '4004', '4005', '4006', '4007', '4010', '4015', '4016', '700000', '701001', '701002', '701003'))
AND( ([dbo].[SalesOrder].[dOrderDateTime] >= '2013-01-01')
OR ([dbo].[Customer].[dDateCreated] >= '2014-01-01'))
GROUP BY [Customer].[iCustomerID],[Customer].[sCustomerSageCode],[Customer].[sCustomerName], [Customer].[sCustomerTelNo1];
I don't think this query is doing what you want anyway. As written, there are no relationships between the Invoice table and the SalesOrder table. This leads me to believe that it is producing a cartesian product between invoices and orders, so customers with lots of orders would be generating lots of unnecessary intermediate rows.
You can test this by removing the SalesOrder table from the query:
SELECT c.[iCustomerID], c.[sCustomerSageCode], c.[sCustomerName], c.[sCustomerTelNo1],
SUM(it.[fQtyOrdered]) AS [Quantity], SUM(it.[fNetAmount]) AS [Value]
FROM [dbo].[Customer] c LEFT JOIN
[dbo].[CustomerAccountStatus] cas
ON c.[iAccountStatusID] = cas.[iAccountStatusID] LEFT JOIN
[Invoice] i
ON (i.[iCustomerID] = c.[iCustomerID]) LEFT JOIN
[dbo].[InvoiceItem] it
ON (i.[iInvoiceNumber] = it.[iInvoiceNumber])
WHERE it.[sNominalCode] IN ('4000', '4001', '4002', '4004', '4005', '4006', '4007', '4010', '4015', '4016', '700000', '701001', '701002', '701003') AND
c.[dDateCreated] >= '2014-01-01'
GROUP BY c.[iCustomerID], c.[sCustomerSageCode], c.[sCustomerName], c.[sCustomerTelNo1];
If this works and you need the SalesOrder, then you will need to either pre-aggregate by SalesOrder or find better join keys.
The above query could benefit from an index on Customer(dDateCreated, iCustomerID).
You have a lot of LEFT JOINs.
I don't see CustomerAccountStatus being used. You can exclude it.
[InvoiceItem].[sNominalCode] could be NULL because of the LEFT JOIN, so add [InvoiceItem].[sNominalCode] IS NOT NULL or <THE IN CONDITION>.
Also add IS NOT NULL checks to the other conditions.
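The row multiplication suspected in this answer is easy to reproduce on toy data. The sketch below uses Python's sqlite3 module with invented customer, sales_order, and invoice tables; the point is only that two unrelated child tables joined to the same parent inflate a SUM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY);
CREATE TABLE sales_order (customer_id INTEGER);
CREATE TABLE invoice (customer_id INTEGER, amount INTEGER);
INSERT INTO customer VALUES (1);
INSERT INTO sales_order VALUES (1), (1), (1);   -- three orders
INSERT INTO invoice VALUES (1, 10), (1, 20);    -- two invoices, total 30
""")

# Orders and invoices are unrelated to each other, so each invoice row
# is repeated once per order: 3 orders x (10 + 20) = 90.
with_orders = conn.execute("""
SELECT SUM(i.amount)
FROM customer c
LEFT JOIN sales_order o ON o.customer_id = c.id
LEFT JOIN invoice i ON i.customer_id = c.id
""").fetchone()[0]

# Without the order join the total is the true invoice sum.
without_orders = conn.execute("""
SELECT SUM(i.amount)
FROM customer c
LEFT JOIN invoice i ON i.customer_id = c.id
""").fetchone()[0]

print(with_orders, without_orders)
```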
It seems you are looking for customers that are either created this year or for which sales orders exist from last year or this year. So select from customers, but use EXISTS on SalesOrder. Then you want to count invoices. So outer join them and make sure to have the criteria in the ON clause. (sNominalCode will be NULL for any outer joined records. Hence asking for certain sNominalCode in the WHERE clause will turn your outer join into an inner join.)
SELECT
c.iCustomerID,
c.sCustomerSageCode,
c.sCustomerName,
c.sCustomerTelNo1,
SUM(ii.fQtyOrdered) AS Quantity,
SUM(ii.fNetAmount) AS Value
FROM dbo.Customer c
LEFT JOIN dbo.Invoice i ON (i.iCustomerID = c.iCustomerID)
LEFT JOIN dbo.InvoiceItem ii ON (ii.iInvoiceNumber = i.iInvoiceNumber AND ii.sNominalCode IN ('4000', '4001', '4002', '4004', '4005', '4006', '4007', '4010', '4015', '4016', '700000', '701001', '701002', '701003'))
WHERE c.dDateCreated >= '2014-01-01'
OR EXISTS
(
SELECT *
FROM dbo.SalesOrder
WHERE iCustomerID = c.iCustomerID
AND dOrderDateTime >= '2013-01-01'
)
GROUP BY c.iCustomerID, c.sCustomerSageCode, c.sCustomerName, c.sCustomerTelNo1;
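A compact way to check the shape of this last query is to run it on toy data. The sketch below uses Python's sqlite3 module with a simplified, hypothetical schema in which invoice items hang directly off the customer; it contrasts the nominal-code filter in the ON clause with the same filter in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE sales_order (customer_id INTEGER, ordered TEXT);
CREATE TABLE invoice_item (customer_id INTEGER, code TEXT, qty INTEGER);
INSERT INTO customer VALUES (1, '2014-02-01'), (2, '2012-01-01'), (3, '2012-01-01');
INSERT INTO sales_order VALUES (2, '2013-06-01');
INSERT INTO invoice_item VALUES (1, '4000', 3), (2, '9999', 5), (3, '4000', 7);
""")

CUSTOMER_FILTER = """
(c.created >= '2014-01-01'
 OR EXISTS (SELECT 1 FROM sales_order o
            WHERE o.customer_id = c.id AND o.ordered >= '2013-01-01'))
"""

# Code filter in ON: qualifying customers are kept even with no matching items.
on_rows = conn.execute("""
SELECT c.id, SUM(ii.qty)
FROM customer c
LEFT JOIN invoice_item ii ON ii.customer_id = c.id AND ii.code IN ('4000', '4001')
WHERE """ + CUSTOMER_FILTER + """
GROUP BY c.id ORDER BY c.id
""").fetchall()

# Code filter in WHERE: rows with NULL code are discarded, as with an inner join.
where_rows = conn.execute("""
SELECT c.id, SUM(ii.qty)
FROM customer c
LEFT JOIN invoice_item ii ON ii.customer_id = c.id
WHERE ii.code IN ('4000', '4001') AND """ + CUSTOMER_FILTER + """
GROUP BY c.id ORDER BY c.id
""").fetchall()

print(on_rows, where_rows)
```

Customer 2 qualifies through the EXISTS on sales orders and stays in the first result with a NULL total, while customer 3 is excluded in both because it is neither recently created nor has a recent order.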