Improve CASE WHEN Performance - sql

I want to calculate customer retention week over week. My sales_orders table has columns order_date, and customer_name. Basically I want to check if a customer in this week also had an order the previous week. To do this, I have used CASE WHEN and subquery as follows (I have extracted order_week in a cte I've called weekly_customers and gotten distinct customer names within each week):
SELECT wc.order_week,
wc.customer,
CASE
WHEN wc.customer IN (
SELECT sq.customer
FROM weekly_customers sq
WHERE sq.order_week = (wc.order_week - 1))
THEN 'YES'
ELSE 'NO'
END AS present_in_previous_week
from weekly_customers wc
The query returns the correct data. My issue, the table is really huge with about 15000 distinct weekly values. This obviously leads to very long execution time. Is there a way I can improve this loop or even an alternative to the loop altogether?

Something like this:
SELECT
wc.order_week,
wc.customer,
CASE WHEN wcb.customer IS NOT NULL THEN "YES" ELSE "NO" END AS present_in_previous_week
FROM weekly_customers AS wca
LEFT JOIN
weekly_customers AS wcb
ON
wca.customer = wcb.customer
AND wca.order_week - 1 = wcb.order_week
This joins all of the customer data onto the customer data from a week ago. If there is a record for a week ago then wcb.customer will not be null, and we can set the flag to "YES". Otherwise, we set the flag to "NO".

Related

Compare data in the same table

I have a table that stores monthly data and I would like to create a comparison between the quantity movement within a period.
Here is an example of the table
.
The SQL statement below is meant to return any changes that has happened within a period - any fund/policy partial or total loss as well as partial or total gain. I have been battling with it for a while - any help would be well appreciated.
I currently have 5 sets of unions - (where the policies and funds match and there's a difference in quantities held, where the policies exist in the previous and not in the current and vice versa and where the securities exist in the previous and not in the current and vice versa) but the other unions work save for the last couple (where the securities exist in the previous and not in the current and vice versa). It doesn't seem to return every occurrence.
SELECT distinct pc.[Client]
,pc.Policy
,cast(pc.Qty as decimal) AS CurrQ
,0 AS PrevQ
,cast(pc.Qty as decimal) - 0 AS QtyDiff
,CASE WHEN cast(pc.Qty as decimal) - 0 > 0 THEN 'Bought Units'
WHEN cast(pc.Qty as decimal) - 0 < 0 THEN 'Sold Units'
ELSE 'Unknown'
END AS TransactionType
,convert(varchar,cast(pc.[ValDate] as date),103) AS CurrValDate
,'' AS PrevValDate
FROM table pc
WHERE convert(varchar,cast(pc.[ValDate] as date),103) = convert(varchar,getdate(),103)
AND pc.Policy IN (SELECT policy
FROM table
WHERE convert(varchar(10),[ValDate],103) = convert(varchar(10),getdate()-1,103)
AND pc.[Fund] NOT IN (SELECT PM.[Fund]
FROM table pc
LEFT JOIN table pm ON pc.policy = pm.policy
WHERE convert(varchar,cast(pc.[ValDate] as date),103) = convert(varchar,getdate(),103))
AND convert(varchar,cast(pm.[ValDate] as date),103) = convert(varchar,getdate()-1,103))
As #Larnu rightly mentioned in the comment section, the extra conditions in the query changed the run from a LEFT JOIN to an INNER JOIN. I changed the code to have policy, fund and date in the ON clause:
FROM table pc
LEFT JOIN table pm ON (pc.policy = pm.policy
AND pc.fund = pm.fund
AND pc.[ValDate]-1 = pm.[ValDate])
and got rid of the sub queries.
Thanks again Larnu.

Calculated column syntax when using a group by function Teradata

I'm trying to include a column calculated as a % of OTYPE.
IE
Order type | Status | volume of orders at each status | % of all orders at this status
SELECT
T.OTYPE,
STATUS_CD,
COUNT(STATUS_CD) AS STATVOL,
(STATVOL / COUNT(ROW_ID)) * 100
FROM Database.S_ORDER O
LEFT JOIN /* Finding definitions for status codes & attaching */
(
SELECT
ROW_ID AS TYPEJOIN,
"NAME" AS OTYPE
FROM database.S_ORDER_TYPE
) T
ON T.TYPEJOIN = ORDER_TYPE_ID
GROUP BY (T.OTYPE, STATUS_CD)
/*Excludes pending and pending online orders */
WHERE CAST(CREATED AS DATE) = '2018/09/21' AND STATUS_CD <> 'Pending'
AND STATUS_CD <> 'Pending-Online'
ORDER BY T.OTYPE, STATUS_CD DESC
OTYPE STATUS_CD STATVOL TOTALPERC
Add New Service Provisioning 2,740 100
Add New Service In-transit 13 100
Add New Service Error - Provisioning 568 100
Add New Service Error - Integration 1 100
Add New Service Complete 14,387 100
Current output just puts 100 at every line, need it to be a % of total orders
Could anyone help out a Teradata & SQL student?
The complication making this difficult is my understanding of the group by and count syntax is tenuous. It took some fiddling to get it displayed as I have it, I'm not sure how to introduce a calculated column within this combo.
Thanks in advance
There are a couple of places the total could be done, but this is the way I would do it. I also cleaned up your other sub query which was not required, and changed the date to a non-ambiguous format (change it back if it cases an issue in Teradata)
SELECT
T."NAME" as OTYPE,
STATUS_CD,
COUNT(STATUS_CD) AS STATVOL,
COUNT(STATUS_CD)*100/TotalVol as Pct
FROM database.S_ORDER O
LEFT JOIN EDWPRDR_VW40_SBLCPY.S_ORDER_TYPE T on T.ROW_ID = ORDER_TYPE_ID
cross join (select count(*) as TotalVol from database.S_ORDER) Tot
GROUP BY T."NAME", STATUS_CD, TotalVol
WHERE CAST(CREATED AS DATE) = '2018-09-21' AND STATUS_CD <> 'Pending' AND STATUS_CD <> 'Pending-Online'
ORDER BY T."NAME", STATUS_CD DESC
A where clause comes before a group by clause, so the query
shown in the question isn't valid.
Always prefix every column reference with the relevant table alias, below I have assumed that where you did not use the alias that it belongs to the orders table.
You probably do not need a subquery for this left join. While there are times when a subquery is needed or good for performance, this does not appear to be the case here.
Most modern SQL compliant databases provide "window functions", and Teradata does do this. They are extremely useful, and here when you combine count() with an over clause you can get the total of all rows without needing another subquery or join.
Because there is neither sample data nor expected result provided with the question I do not actually know which numbers you really need for your percentage calculation. Instead I have opted to show you different ways to count so that you can choose the right ones. I suspect you are getting 100 for each row because the count(status_cd) is equal to the count(row_id). You need to count status_cd differently to how you count row_id. nb: The count() function increases by 1 for every non-null value
I changed the way your date filter is applied. It is not efficient to change data on every row to suit constants in a where clause. Leave the data untouched and alter the way you apply the filter to suit the data, this is almost always more efficient (search sargable)
SELECT
t.OTYPE
, o.STATUS_CD
, COUNT(o.STATUS_CD) count_status
, COUNT(t.ROW_ID count_row_id
, count(t.row_id) over() count_row_id_over
FROM dbo.S_ORDER o
LEFT JOIN dbo.S_ORDER_TYPE t ON t.TYPEJOIN = o.ORDER_TYPE_ID
/*Excludes pending and pending online orders */
WHERE o.CREATED >= '2018-09-21' AND o.CREATED < '2018-09-22'
AND o.STATUS_CD <> 'Pending'
AND o.STATUS_CD <> 'Pending-Online'
GROUP BY
t.OTYPE
, o.STATUS_CD
ORDER BY
t.OTYPE
, o.STATUS_CD DESC
As #TomC already noted, there's no need for the join to a Derived Table. The simplest way to get the percentage is based on a Group Sum. I also changed the date to an Standard SQL Date Literal and moved the where before group by.
SELECT
t."NAME",
o.STATUS_CD,
Count(o.STATUS_CD) AS STATVOL,
-- rule of thumb: multiply first then divide, otherwise you will get unexpected results
-- (Teradata rounds after each calculation)
100.00 * STATVOL / Sum(STATVOL) Over ()
FROM database.S_ORDER AS O
/* Finding definitions for status codes & attaching */
LEFT JOIN database.S_ORDER_TYPE AS t
ON t.ROW_ID = o.ORDER_TYPE_ID
/*Excludes pending and pending online orders */
-- if o.CREATED is a Timestamp there's no need to apply the CAST
WHERE Cast(o.CREATED AS DATE) = DATE '2018-09-21'
AND o.STATUS_CD NOT IN ('Pending', 'Pending-Online')
GROUP BY (T.OTYPE, o.STATUS_CD)
ORDER BY T.OTYPE, o.STATUS_CD DESC
Btw, you probably don't need an Outer Join, Inner should return the same result.

SQL Query - combine 2 rows into 1 row

I have the following query below (view) in SQL Server. The query produces a result set that is needed to populate a grid. However, a new requirement has come up where the users would like to see data on one row in our app. The tblTasks table can produce 1 or 2 rows. The issue becomes when they're is two rows that have the same job_number but different fldProjectContextId (1 or 31). I need to get the MechApprovalOut and ElecApprovalOut columns on one row instead of two.
I've tried restructuring the query using CTE and over partition and haven't been able to get the necessary results I need.
SELECT TOP (100) PERCENT
CAST(dbo.Job_Control.job_number AS int) AS Job_Number,
dbo.tblTasks.fldSalesOrder, dbo.tblTaskCategories.fldTaskCategoryName,
dbo.Job_Control.Dwg_Sent, dbo.Job_Control.Approval_done,
dbo.Job_Control.fldElecDwgSent, dbo.Job_Control.fldElecApprovalDone,
CASE WHEN DATEDIFF(day, dbo.Job_Control.Dwg_Sent, GETDATE()) > 14
AND dbo.Job_Control.Approval_done IS NULL
AND dbo.tblProjectContext.fldProjectContextID = 1
THEN 1 ELSE 0
END AS MechApprovalOut,
CASE WHEN DATEDIFF(day, dbo.Job_Control.fldElecDwgSent, GETDATE()) > 14
AND dbo.Job_Control.fldElecApprovalDone IS NULL
AND dbo.tblProjectContext.fldProjectContextID = 31
THEN 1 ELSE 0
END AS ElecApprovalOut,
dbo.tblProjectContext.fldProjectContextName,
dbo.tblProjectContext.fldProjectContextId, dbo.Job_Control.Drawing_Info,
dbo.Job_Control.fldElectricalAppDwg
FROM dbo.tblTaskCategories
INNER JOIN dbo.tblTasks
ON dbo.tblTaskCategories.fldTaskCategoryId = dbo.tblTasks.fldTaskCategoryId
INNER JOIN dbo.Job_Control
ON dbo.tblTasks.fldSalesOrder = dbo.Job_Control.job_number
INNER JOIN dbo.tblProjectContext
ON dbo.tblTaskCategories.fldProjectContextId = dbo.tblProjectContext.fldProjectContextId
WHERE (dbo.tblTaskCategories.fldTaskCategoryName = N'Approval'
OR dbo.tblTaskCategories.fldTaskCategoryName = N'Re-Approval')
AND (CASE WHEN DATEDIFF(day, dbo.Job_Control.Dwg_Sent, GETDATE()) > 14
AND dbo.Job_Control.Approval_done IS NULL
AND dbo.tblProjectContext.fldProjectContextID = 1
THEN 1 ELSE 0
END = 1)
OR (dbo.tblTaskCategories.fldTaskCategoryName = N'Approval'
OR dbo.tblTaskCategories.fldTaskCategoryName = N'Re-Approval')
AND (CASE WHEN DATEDIFF(day, dbo.Job_Control.fldElecDwgSent, GETDATE()) > 14
AND dbo.Job_Control.fldElecApprovalDone IS NULL
AND dbo.tblProjectContext.fldProjectContextID = 31
THEN 1 ELSE 0
END = 1)
ORDER BY dbo.Job_Control.job_number, dbo.tblTaskCategories.fldProjectContextId
The above query gives me the following result set:
I've created a work around via code (which I don't like but it works for now) where i've used code to populate a "temp" table the way i need it to display the data, that is, one record if duplicate job numbers to get the MechApprovalOut and ElecApprovalOut columns on one row (see first record in following screen shot).
Example:
With the desired result set and one row per job_number, this is how the form looks with the data and how I am using the result set.
Any help restructuring my query to combine duplicate rows with the same job number where MechApprovalOut and ElecApproval out columns are on one row is greatly appreciated! I'd much prefer to use a view on SQL then code in the app to populate a temp table.
Thanks,
Jimmy
What I would do is LEFT JOIN the main table to itself at the beginning of the query, matching on Job Number and Sales Order, such that the left side of the join is only looking at Approval task categories and the right side of the join is only looking at Re-Approval task categories. Then I would make extensive use of the COALESCE() function to select data from the correct side of the join for use later on and in the select clause. This may also be the piece you were missing to make a CTE work.
There is probably also a solution that uses a ranking/windowing function (maybe not RANK itself, but something that category) along with the PARTITION BY clause. However, as those are fairly new to Sql Server I haven't used them enough personally to be comfortable writing an example solution for you without direct access to the data to play with, and it would still take me a little more time to get right than I can devote to this right now. Maybe this paragraph will motivate someone else to do that work.

SQL - 2 table values to be grouped by third unconnected value

I want to create a graph that pulls data from 2 user questions generated from within an SQL database.
The issue is that the user questions are stored in the same table, as are the answers. The only connection is that the question string includes a year value, which I extract using the LEFT command so that I output a column called 'YEAR' with a list of integer values running from 2013 to 2038 (25 year period).
I then want to pull the corresponding answers ('forecast' and 'actual') from each 'YEAR' so that I can plot a graph with a couple of values from each year (sorry if this isn't making any sense). The graph should show a forecast line covering the 25 year period with a second line (or column) showing the actual value as it gets populated over the years. I'll then be able to visualise if our actual value is close to our original forecast figures (long term goal!)
CODE BELOW
SELECT CAST((LEFT(F_TASK_ANS.TA_ANS_QUESTION,4)) AS INTEGER) AS YEAR,
-- first select takes left 4 characters of question and outputs value as string then coverts value to whole number.
CAST((CASE WHEN F_TASK_ANS.TA_ANS_QUESTION LIKE '%forecast' THEN F_TASK_ANS.TA_ANS_ANSWER END) AS NUMERIC(9,2)) AS 'FORECAST',
CAST((CASE WHEN F_TASK_ANS.TA_ANS_QUESTION LIKE '%actual' THEN ISNULL(F_TASK_ANS.TA_ANS_ANSWER,0) END) AS NUMERIC(9,2)) AS 'ACTUAL'
-- actual value will be null until filled in each year therefore ISNULL added to replace null with 0.00.
FROM F_TASK_ANS INNER JOIN F_TASKS ON F_TASK_ANS.TA_ANS_FKEY_TA_SEQ = F_TASKS.TA_SEQ
WHERE TA_ANS_ANSWER <> ''
AND (TA_TASK_ID LIKE '%6051' OR TA_TASK_ID LIKE '%6052')
-- The two numbers above refer to separate PPM questions that the user enters a value into
I tried GROUP BY 'YEAR' but I get an
Error: Each GROUP BY expression must contain at least one column that
is not an outer reference - which I assume is because I haven't linked
the 2 tables in any way...
Should I be adding a UNION so the tables are joined?
What I want to see is something like the following output (which I'll graph up later)
YEAR FORECAST ACTUAL
2013 135000 127331
2014 143000 145102
2015 149000 0
2016 158000 0
2017 161000 0
2018... etc
Any help or guidance would be hugely appreciated.
Thanks
Although the syntax is pretty hairy, this seems like a fairly simple query. You are in fact linking your two tables (with the JOIN statement) and you don't need a UNION.
Try something like this (using a common table expression, or CTE, to make the grouping clearer, and changing the syntax for slightly greater clarity):
WITH data
AS (
SELECT YEAR = CAST((LEFT(A.TA_ANS_QUESTION,4)) AS INTEGER)
, FORECAST = CASE WHEN A.TA_ANS_QUESTION LIKE '%forecast'
THEN CONVERT(NUMERIC(9,2), A.TA_ANS_ANSWER)
ELSE CONVERT(NUMERIC(9,2), 0)
END
, ACTUAL = CASE WHEN A.TA_ANS_QUESTION LIKE '%actual'
THEN CONVERT(NUMERIC(9,2), ISNULL(A.TA_ANS_ANSWER,0) )
ELSE CONVERT(NUMERIC(9,2), 0)
END
FROM F_TASK_ANS A
INNER JOIN F_TASKS T
ON A.TA_ANS_FKEY_TA_SEQ = T.TA_SEQ
-- It sounded like you wanted to include the ones where the answer was null. If
-- that's wrong, get rid of the test for NULL.
WHERE (A.TA_ANS_ANSWER <> '' OR A.TA_ANS_ANSWER IS NULL)
AND (TA_TASK_ID LIKE '%6051' OR TA_TASK_ID LIKE '%6052')
)
SELECT YEAR
, FORECAST = SUM(data.Forecast)
, ACTUAL = SUM(data.Actual)
FROM data
GROUP BY YEAR
ORDER BY YEAR
Try something like this ...
SELECT CAST((LEFT(F_TASK_ANS.TA_ANS_QUESTION,4)) AS INT) AS [YEAR]
,SUM(CAST((CASE WHEN F_TASK_ANS.TA_ANS_QUESTION LIKE '%forecast'
THEN F_TASK_ANS.TA_ANS_ANSWER ELSE 0 END) AS NUMERIC(9,2))) AS [FORECAST]
,SUM(CAST((CASE WHEN F_TASK_ANS.TA_ANS_QUESTION LIKE '%actual'
THEN F_TASK_ANS.TA_ANS_ANSWER ELSE 0 END) AS NUMERIC(9,2))) AS [ACTUAL]
FROM F_TASK_ANS INNER JOIN F_TASKS
ON F_TASK_ANS.TA_ANS_FKEY_TA_SEQ = F_TASKS.TA_SEQ
WHERE TA_ANS_ANSWER <> ''
AND (TA_TASK_ID LIKE '%6051' OR TA_TASK_ID LIKE '%6052')
GROUP BY CAST((LEFT(F_TASK_ANS.TA_ANS_QUESTION,4)) AS INT)

SQL SUM function doubling the amount it should using multiple tables

My query below is doubling the amount on the last record it returns. I have 3 tables - activities, bookings and tempbookings. The query needs to list the activities and attached information and pull the total number (using the SUM) of places booked (as BookingTotal) from the booking table by each activity and then it needs to calculate the same for tempbookings (as tempPlacesReserved) providing the reservedate field inside that table is in the future.
However the first issue is that if there are no records for an activity in the tempbookings table it does not return any records for that activity at all, to get around this i created dummy records in the past so that it still returns the record, but if I can make it so I don't have to do this I would prefer it!
The main issue I have is that on the final record of the returned results it doubles the booking total and the places reserved which of course makes the whole query useless.
I know that I am doing something wrong I just haven't been able to sort it, I have searched similar issues online but am unable to apply them to my situation correctly.
Any help would be appreciated.
P.S. I'm aware that normally you wouldn't need to fully label all the paths to the databases, tables and fields as I have but for the program I am planning to use it in I have to do it this way.
Code:
SELECT [LeisureActivities].[dbo].[activities].[activityID],
[LeisureActivities].[dbo].[activities].[activityName],
[LeisureActivities].[dbo].[activities].[activityDate],
[LeisureActivities].[dbo].[activities].[activityPlaces],
[LeisureActivities].[dbo].[activities].[activityPrice],
SUM([LeisureActivities].[dbo].[bookings].[bookingPlaces]) AS 'bookingTotal',
SUM (CASE WHEN[LeisureActivities].[dbo].[tempbookings].[tempReserveDate] > GetDate() THEN [LeisureActivities].[dbo].[tempbookings].[tempPlaces] ELSE 0 end) AS 'tempPlacesReserved'
FROM [LeisureActivities].[dbo].[activities],
[LeisureActivities].[dbo].[bookings],
[LeisureActivities].[dbo].[tempbookings]
WHERE ([LeisureActivities].[dbo].[activities].[activityID]=[LeisureActivities].[dbo].[bookings].[activityID]
AND [LeisureActivities].[dbo].[activities].[activityID]=[LeisureActivities].[dbo].[tempbookings].[tempActivityID])
AND [LeisureActivities].[dbo].[activities].[activityDate] > GetDate ()
GROUP BY [LeisureActivities].[dbo].[activities].[activityID],
[LeisureActivities].[dbo].[activities].[activityName],
[LeisureActivities].[dbo].[activities].[activityDate],
[LeisureActivities].[dbo].[activities].[activityPlaces],
[LeisureActivities].[dbo].[activities].[activityPrice];
Your current query is using an INNER JOIN between each of the tables so if the tempBookings table has no records, you will not return anything.
I would advise that you start to use JOIN syntax. You might also need to use subqueries to get the totals.
SELECT a.[activityID],
a.[activityName],
a.[activityDate],
a.[activityPlaces],
a.[activityPrice],
coalesce(b.bookingTotal, 0) bookingTotal,
coalesce(t.tempPlacesReserved, 0) tempPlacesReserved
FROM [LeisureActivities].[dbo].[activities] a
LEFT JOIN
(
select activityID,
SUM([bookingPlaces]) AS bookingTotal
from [LeisureActivities].[dbo].[bookings]
group by activityID
) b
ON a.[activityID]=b.[activityID]
LEFT JOIN
(
select tempActivityID,
SUM(CASE WHEN [tempReserveDate] > GetDate() THEN [tempPlaces] ELSE 0 end) AS tempPlacesReserved
from [LeisureActivities].[dbo].[tempbookings]
group by tempActivityID
) t
ON a.[activityID]=t.[tempActivityID]
WHERE a.[activityDate] > GetDate();
Note: I am using aliases because it is easier to read
Use new SQL-92 Join syntax, and make join to tempBookings an outer join. Also clean up your sql with table aliases. Makes it easier to read. As to why last row has doubled values, I don't know, but on off chance that it is caused by extra dummy records you entered. get rid of them. That problem is fixed by using outer join to tempBookings. The other possibility is that the join conditions you had to the tempBookings table(t.tempActivityID = a.activityID) is insufficient to guarantee that it will match to only one record in activities table... If, for example, it matches to two records in activities, then the rows from Tempbookings would be repeated twice in the output, (causing the sum to be doubled)
SELECT a.activityID, a.activityName, a.activityDate,
a.activityPlaces, a.activityPrice,
SUM(b.bookingPlaces) bookingTotal,
SUM (CASE WHEN t.tempReserveDate > GetDate()
THEN t.tempPlaces ELSE 0 end) tempPlacesReserved
FROM LeisureActivities.dbo.activities a
Join LeisureActivities.dbo.bookings b
On b.activityID = a.activityID
Left Join LeisureActivities.dbo.tempbookings t
On t.tempActivityID = a.activityID
WHERE a.activityDate > GetDate ()
GROUP BY a.activityID, a.activityName,
a.activityDate, a.activityPlaces,
a.activityPrice;