Compare data in the same table - sql

I have a table that stores monthly data and I would like to create a comparison between the quantity movement within a period.
Here is an example of the table
.
The SQL statement below is meant to return any changes that has happened within a period - any fund/policy partial or total loss as well as partial or total gain. I have been battling with it for a while - any help would be well appreciated.
I currently have 5 sets of unions - (where the policies and funds match and there's a difference in quantities held, where the policies exist in the previous and not in the current and vice versa and where the securities exist in the previous and not in the current and vice versa) but the other unions work save for the last couple (where the securities exist in the previous and not in the current and vice versa). It doesn't seem to return every occurrence.
SELECT distinct pc.[Client]
,pc.Policy
,cast(pc.Qty as decimal) AS CurrQ
,0 AS PrevQ
,cast(pc.Qty as decimal) - 0 AS QtyDiff
,CASE WHEN cast(pc.Qty as decimal) - 0 > 0 THEN 'Bought Units'
WHEN cast(pc.Qty as decimal) - 0 < 0 THEN 'Sold Units'
ELSE 'Unknown'
END AS TransactionType
,convert(varchar,cast(pc.[ValDate] as date),103) AS CurrValDate
,'' AS PrevValDate
FROM table pc
WHERE convert(varchar,cast(pc.[ValDate] as date),103) = convert(varchar,getdate(),103)
AND pc.Policy IN (SELECT policy
FROM table
WHERE convert(varchar(10),[ValDate],103) = convert(varchar(10),getdate()-1,103)
AND pc.[Fund] NOT IN (SELECT PM.[Fund]
FROM table pc
LEFT JOIN table pm ON pc.policy = pm.policy
WHERE convert(varchar,cast(pc.[ValDate] as date),103) = convert(varchar,getdate(),103))
AND convert(varchar,cast(pm.[ValDate] as date),103) = convert(varchar,getdate()-1,103))

As #Larnu rightly mentioned in the comment section, the extra conditions in the query changed the run from a LEFT JOIN to an INNER JOIN. I changed the code to have policy, fund and date in the ON clause:
FROM table pc
LEFT JOIN table pm ON (pc.policy = pm.policy
AND pc.fund = pm.fund
AND pc.[ValDate]-1 = pm.[ValDate])
and got rid of the sub queries.
Thanks again Larnu.

Related

Improve CASE WHEN Performance

I want to calculate customer retention week over week. My sales_orders table has columns order_date, and customer_name. Basically I want to check if a customer in this week also had an order the previous week. To do this, I have used CASE WHEN and subquery as follows (I have extracted order_week in a cte I've called weekly_customers and gotten distinct customer names within each week):
SELECT wc.order_week,
wc.customer,
CASE
WHEN wc.customer IN (
SELECT sq.customer
FROM weekly_customers sq
WHERE sq.order_week = (wc.order_week - 1))
THEN 'YES'
ELSE 'NO'
END AS present_in_previous_week
from weekly_customers wc
The query returns the correct data. My issue, the table is really huge with about 15000 distinct weekly values. This obviously leads to very long execution time. Is there a way I can improve this loop or even an alternative to the loop altogether?
Something like this:
SELECT
wc.order_week,
wc.customer,
CASE WHEN wcb.customer IS NOT NULL THEN "YES" ELSE "NO" END AS present_in_previous_week
FROM weekly_customers AS wca
LEFT JOIN
weekly_customers AS wcb
ON
wca.customer = wcb.customer
AND wca.order_week - 1 = wcb.order_week
This joins all of the customer data onto the customer data from a week ago. If there is a record for a week ago then wcb.customer will not be null, and we can set the flag to "YES". Otherwise, we set the flag to "NO".

Divide by zero error when using >, < or <> in Where statement. No division operator involved

I have two CTEs using the same table which holds receipts. Receipt type "a" says how much is billed and may or may not have an amount received. Receipt type "b"s have how much was received if it was received outside of the original receipt. These are matched up by mnth, cusnbr and job. Also on the receipt is how much is allocated for different expenses on the receipt.
I am trying to total up peoples hours if the receipt has been paid at least 99%. These records are based on cusnbr, jobnbr and mnth also. The code below works fine.
with billed as(Select cusnbr
,job
,mnth
,sum(bill_item_1) as 'Billed Item'
,sum(billed) as 'Billed'
From accounting
Where mytype in ('a','b')
Group by cusnbr
,job
,mnth)
paid as(Select cusnbr
,job
,mnth
,sum(rcpt_item_1) as 'Rcpt Item'
,sum(billed) as 'Paid'
From accounting
Where mytype in ('a','b')
Group by cusnbr
,job
,mnth)
Select b.cusnbr
,b.job
,b.mnth
,sum(g.hours) as 'Total Hours'
,b.[Billed Item]
,p.[Rcpt Item]
From billed b inner join paid p
on b.cusnbr = p.cusnbr
and b.job = p.job
and b.mnth = p.mnth
inner join guys g
on b.cusnbr = g.cusnbr
and b.job = g.job
and b.mnth = g.mnth
Where p.[Paid]/b.Billed > .99
The issue I'm having is if I try to add
and b.[Billed Item] <> 0
To the where clause.
I get "Divide by zero error encountered"
I have tried making the last query a CTE with
case when b.[Billed Item] = 0 then 1 else 0 end as flag
and then making another query which checks that flag <> 0
I have tried using isnull(b.[Billed Item],0) in the last query as well as isnull(bill_item_1,0) in the first CTE.
I can get around this issue by dumping the whole thing into a temp table and querying that, but just want to know why this is happening. Using ">","<" or "<>" against b.[Billed Item] results in a divide by zero error.
Use nullif():
Where p.[Paid] / nullif(b.Billed, 0) > 0.99
This will return null -- which does not meet the condition. You can also phrase this more simply without division as:
where p.paid > b.billed * 0.99
I can't answer you question specifically, but I can tell you that SQL does not process all commands if it does not need to. For example,
SELECT COUNT(1/0)
Happily returns 1. So it is quite possible the order of the conditions cause the optimizer to filter out an uncessary division by zero condition.

Calculated column syntax when using a group by function Teradata

I'm trying to include a column calculated as a % of OTYPE.
IE
Order type | Status | volume of orders at each status | % of all orders at this status
SELECT
T.OTYPE,
STATUS_CD,
COUNT(STATUS_CD) AS STATVOL,
(STATVOL / COUNT(ROW_ID)) * 100
FROM Database.S_ORDER O
LEFT JOIN /* Finding definitions for status codes & attaching */
(
SELECT
ROW_ID AS TYPEJOIN,
"NAME" AS OTYPE
FROM database.S_ORDER_TYPE
) T
ON T.TYPEJOIN = ORDER_TYPE_ID
GROUP BY (T.OTYPE, STATUS_CD)
/*Excludes pending and pending online orders */
WHERE CAST(CREATED AS DATE) = '2018/09/21' AND STATUS_CD <> 'Pending'
AND STATUS_CD <> 'Pending-Online'
ORDER BY T.OTYPE, STATUS_CD DESC
OTYPE STATUS_CD STATVOL TOTALPERC
Add New Service Provisioning 2,740 100
Add New Service In-transit 13 100
Add New Service Error - Provisioning 568 100
Add New Service Error - Integration 1 100
Add New Service Complete 14,387 100
Current output just puts 100 at every line, need it to be a % of total orders
Could anyone help out a Teradata & SQL student?
The complication making this difficult is my understanding of the group by and count syntax is tenuous. It took some fiddling to get it displayed as I have it, I'm not sure how to introduce a calculated column within this combo.
Thanks in advance
There are a couple of places the total could be done, but this is the way I would do it. I also cleaned up your other sub query which was not required, and changed the date to a non-ambiguous format (change it back if it cases an issue in Teradata)
SELECT
T."NAME" as OTYPE,
STATUS_CD,
COUNT(STATUS_CD) AS STATVOL,
COUNT(STATUS_CD)*100/TotalVol as Pct
FROM database.S_ORDER O
LEFT JOIN EDWPRDR_VW40_SBLCPY.S_ORDER_TYPE T on T.ROW_ID = ORDER_TYPE_ID
cross join (select count(*) as TotalVol from database.S_ORDER) Tot
GROUP BY T."NAME", STATUS_CD, TotalVol
WHERE CAST(CREATED AS DATE) = '2018-09-21' AND STATUS_CD <> 'Pending' AND STATUS_CD <> 'Pending-Online'
ORDER BY T."NAME", STATUS_CD DESC
A where clause comes before a group by clause, so the query
shown in the question isn't valid.
Always prefix every column reference with the relevant table alias, below I have assumed that where you did not use the alias that it belongs to the orders table.
You probably do not need a subquery for this left join. While there are times when a subquery is needed or good for performance, this does not appear to be the case here.
Most modern SQL compliant databases provide "window functions", and Teradata does do this. They are extremely useful, and here when you combine count() with an over clause you can get the total of all rows without needing another subquery or join.
Because there is neither sample data nor expected result provided with the question I do not actually know which numbers you really need for your percentage calculation. Instead I have opted to show you different ways to count so that you can choose the right ones. I suspect you are getting 100 for each row because the count(status_cd) is equal to the count(row_id). You need to count status_cd differently to how you count row_id. nb: The count() function increases by 1 for every non-null value
I changed the way your date filter is applied. It is not efficient to change data on every row to suit constants in a where clause. Leave the data untouched and alter the way you apply the filter to suit the data, this is almost always more efficient (search sargable)
SELECT
t.OTYPE
, o.STATUS_CD
, COUNT(o.STATUS_CD) count_status
, COUNT(t.ROW_ID count_row_id
, count(t.row_id) over() count_row_id_over
FROM dbo.S_ORDER o
LEFT JOIN dbo.S_ORDER_TYPE t ON t.TYPEJOIN = o.ORDER_TYPE_ID
/*Excludes pending and pending online orders */
WHERE o.CREATED >= '2018-09-21' AND o.CREATED < '2018-09-22'
AND o.STATUS_CD <> 'Pending'
AND o.STATUS_CD <> 'Pending-Online'
GROUP BY
t.OTYPE
, o.STATUS_CD
ORDER BY
t.OTYPE
, o.STATUS_CD DESC
As #TomC already noted, there's no need for the join to a Derived Table. The simplest way to get the percentage is based on a Group Sum. I also changed the date to an Standard SQL Date Literal and moved the where before group by.
SELECT
t."NAME",
o.STATUS_CD,
Count(o.STATUS_CD) AS STATVOL,
-- rule of thumb: multiply first then divide, otherwise you will get unexpected results
-- (Teradata rounds after each calculation)
100.00 * STATVOL / Sum(STATVOL) Over ()
FROM database.S_ORDER AS O
/* Finding definitions for status codes & attaching */
LEFT JOIN database.S_ORDER_TYPE AS t
ON t.ROW_ID = o.ORDER_TYPE_ID
/*Excludes pending and pending online orders */
-- if o.CREATED is a Timestamp there's no need to apply the CAST
WHERE Cast(o.CREATED AS DATE) = DATE '2018-09-21'
AND o.STATUS_CD NOT IN ('Pending', 'Pending-Online')
GROUP BY (T.OTYPE, o.STATUS_CD)
ORDER BY T.OTYPE, o.STATUS_CD DESC
Btw, you probably don't need an Outer Join, Inner should return the same result.

SQL - 2 table values to be grouped by third unconnected value

I want to create a graph that pulls data from 2 user questions generated from within an SQL database.
The issue is that the user questions are stored in the same table, as are the answers. The only connection is that the question string includes a year value, which I extract using the LEFT command so that I output a column called 'YEAR' with a list of integer values running from 2013 to 2038 (25 year period).
I then want to pull the corresponding answers ('forecast' and 'actual') from each 'YEAR' so that I can plot a graph with a couple of values from each year (sorry if this isn't making any sense). The graph should show a forecast line covering the 25 year period with a second line (or column) showing the actual value as it gets populated over the years. I'll then be able to visualise if our actual value is close to our original forecast figures (long term goal!)
CODE BELOW
SELECT CAST((LEFT(F_TASK_ANS.TA_ANS_QUESTION,4)) AS INTEGER) AS YEAR,
-- first select takes left 4 characters of question and outputs value as string then coverts value to whole number.
CAST((CASE WHEN F_TASK_ANS.TA_ANS_QUESTION LIKE '%forecast' THEN F_TASK_ANS.TA_ANS_ANSWER END) AS NUMERIC(9,2)) AS 'FORECAST',
CAST((CASE WHEN F_TASK_ANS.TA_ANS_QUESTION LIKE '%actual' THEN ISNULL(F_TASK_ANS.TA_ANS_ANSWER,0) END) AS NUMERIC(9,2)) AS 'ACTUAL'
-- actual value will be null until filled in each year therefore ISNULL added to replace null with 0.00.
FROM F_TASK_ANS INNER JOIN F_TASKS ON F_TASK_ANS.TA_ANS_FKEY_TA_SEQ = F_TASKS.TA_SEQ
WHERE TA_ANS_ANSWER <> ''
AND (TA_TASK_ID LIKE '%6051' OR TA_TASK_ID LIKE '%6052')
-- The two numbers above refer to separate PPM questions that the user enters a value into
I tried GROUP BY 'YEAR' but I get an
Error: Each GROUP BY expression must contain at least one column that
is not an outer reference - which I assume is because I haven't linked
the 2 tables in any way...
Should I be adding a UNION so the tables are joined?
What I want to see is something like the following output (which I'll graph up later)
YEAR FORECAST ACTUAL
2013 135000 127331
2014 143000 145102
2015 149000 0
2016 158000 0
2017 161000 0
2018... etc
Any help or guidance would be hugely appreciated.
Thanks
Although the syntax is pretty hairy, this seems like a fairly simple query. You are in fact linking your two tables (with the JOIN statement) and you don't need a UNION.
Try something like this (using a common table expression, or CTE, to make the grouping clearer, and changing the syntax for slightly greater clarity):
WITH data
AS (
SELECT YEAR = CAST((LEFT(A.TA_ANS_QUESTION,4)) AS INTEGER)
, FORECAST = CASE WHEN A.TA_ANS_QUESTION LIKE '%forecast'
THEN CONVERT(NUMERIC(9,2), A.TA_ANS_ANSWER)
ELSE CONVERT(NUMERIC(9,2), 0)
END
, ACTUAL = CASE WHEN A.TA_ANS_QUESTION LIKE '%actual'
THEN CONVERT(NUMERIC(9,2), ISNULL(A.TA_ANS_ANSWER,0) )
ELSE CONVERT(NUMERIC(9,2), 0)
END
FROM F_TASK_ANS A
INNER JOIN F_TASKS T
ON A.TA_ANS_FKEY_TA_SEQ = T.TA_SEQ
-- It sounded like you wanted to include the ones where the answer was null. If
-- that's wrong, get rid of the test for NULL.
WHERE (A.TA_ANS_ANSWER <> '' OR A.TA_ANS_ANSWER IS NULL)
AND (TA_TASK_ID LIKE '%6051' OR TA_TASK_ID LIKE '%6052')
)
SELECT YEAR
, FORECAST = SUM(data.Forecast)
, ACTUAL = SUM(data.Actual)
FROM data
GROUP BY YEAR
ORDER BY YEAR
Try something like this ...
SELECT CAST((LEFT(F_TASK_ANS.TA_ANS_QUESTION,4)) AS INT) AS [YEAR]
,SUM(CAST((CASE WHEN F_TASK_ANS.TA_ANS_QUESTION LIKE '%forecast'
THEN F_TASK_ANS.TA_ANS_ANSWER ELSE 0 END) AS NUMERIC(9,2))) AS [FORECAST]
,SUM(CAST((CASE WHEN F_TASK_ANS.TA_ANS_QUESTION LIKE '%actual'
THEN F_TASK_ANS.TA_ANS_ANSWER ELSE 0 END) AS NUMERIC(9,2))) AS [ACTUAL]
FROM F_TASK_ANS INNER JOIN F_TASKS
ON F_TASK_ANS.TA_ANS_FKEY_TA_SEQ = F_TASKS.TA_SEQ
WHERE TA_ANS_ANSWER <> ''
AND (TA_TASK_ID LIKE '%6051' OR TA_TASK_ID LIKE '%6052')
GROUP BY CAST((LEFT(F_TASK_ANS.TA_ANS_QUESTION,4)) AS INT)

SQL SUM function doubling the amount it should using multiple tables

My query below is doubling the amount on the last record it returns. I have 3 tables - activities, bookings and tempbookings. The query needs to list the activities and attached information and pull the total number (using the SUM) of places booked (as BookingTotal) from the booking table by each activity and then it needs to calculate the same for tempbookings (as tempPlacesReserved) providing the reservedate field inside that table is in the future.
However the first issue is that if there are no records for an activity in the tempbookings table it does not return any records for that activity at all, to get around this i created dummy records in the past so that it still returns the record, but if I can make it so I don't have to do this I would prefer it!
The main issue I have is that on the final record of the returned results it doubles the booking total and the places reserved which of course makes the whole query useless.
I know that I am doing something wrong I just haven't been able to sort it, I have searched similar issues online but am unable to apply them to my situation correctly.
Any help would be appreciated.
P.S. I'm aware that normally you wouldn't need to fully label all the paths to the databases, tables and fields as I have but for the program I am planning to use it in I have to do it this way.
Code:
SELECT [LeisureActivities].[dbo].[activities].[activityID],
[LeisureActivities].[dbo].[activities].[activityName],
[LeisureActivities].[dbo].[activities].[activityDate],
[LeisureActivities].[dbo].[activities].[activityPlaces],
[LeisureActivities].[dbo].[activities].[activityPrice],
SUM([LeisureActivities].[dbo].[bookings].[bookingPlaces]) AS 'bookingTotal',
SUM (CASE WHEN[LeisureActivities].[dbo].[tempbookings].[tempReserveDate] > GetDate() THEN [LeisureActivities].[dbo].[tempbookings].[tempPlaces] ELSE 0 end) AS 'tempPlacesReserved'
FROM [LeisureActivities].[dbo].[activities],
[LeisureActivities].[dbo].[bookings],
[LeisureActivities].[dbo].[tempbookings]
WHERE ([LeisureActivities].[dbo].[activities].[activityID]=[LeisureActivities].[dbo].[bookings].[activityID]
AND [LeisureActivities].[dbo].[activities].[activityID]=[LeisureActivities].[dbo].[tempbookings].[tempActivityID])
AND [LeisureActivities].[dbo].[activities].[activityDate] > GetDate ()
GROUP BY [LeisureActivities].[dbo].[activities].[activityID],
[LeisureActivities].[dbo].[activities].[activityName],
[LeisureActivities].[dbo].[activities].[activityDate],
[LeisureActivities].[dbo].[activities].[activityPlaces],
[LeisureActivities].[dbo].[activities].[activityPrice];
Your current query is using an INNER JOIN between each of the tables so if the tempBookings table has no records, you will not return anything.
I would advise that you start to use JOIN syntax. You might also need to use subqueries to get the totals.
SELECT a.[activityID],
a.[activityName],
a.[activityDate],
a.[activityPlaces],
a.[activityPrice],
coalesce(b.bookingTotal, 0) bookingTotal,
coalesce(t.tempPlacesReserved, 0) tempPlacesReserved
FROM [LeisureActivities].[dbo].[activities] a
LEFT JOIN
(
select activityID,
SUM([bookingPlaces]) AS bookingTotal
from [LeisureActivities].[dbo].[bookings]
group by activityID
) b
ON a.[activityID]=b.[activityID]
LEFT JOIN
(
select tempActivityID,
SUM(CASE WHEN [tempReserveDate] > GetDate() THEN [tempPlaces] ELSE 0 end) AS tempPlacesReserved
from [LeisureActivities].[dbo].[tempbookings]
group by tempActivityID
) t
ON a.[activityID]=t.[tempActivityID]
WHERE a.[activityDate] > GetDate();
Note: I am using aliases because it is easier to read
Use new SQL-92 Join syntax, and make join to tempBookings an outer join. Also clean up your sql with table aliases. Makes it easier to read. As to why last row has doubled values, I don't know, but on off chance that it is caused by extra dummy records you entered. get rid of them. That problem is fixed by using outer join to tempBookings. The other possibility is that the join conditions you had to the tempBookings table(t.tempActivityID = a.activityID) is insufficient to guarantee that it will match to only one record in activities table... If, for example, it matches to two records in activities, then the rows from Tempbookings would be repeated twice in the output, (causing the sum to be doubled)
SELECT a.activityID, a.activityName, a.activityDate,
a.activityPlaces, a.activityPrice,
SUM(b.bookingPlaces) bookingTotal,
SUM (CASE WHEN t.tempReserveDate > GetDate()
THEN t.tempPlaces ELSE 0 end) tempPlacesReserved
FROM LeisureActivities.dbo.activities a
Join LeisureActivities.dbo.bookings b
On b.activityID = a.activityID
Left Join LeisureActivities.dbo.tempbookings t
On t.tempActivityID = a.activityID
WHERE a.activityDate > GetDate ()
GROUP BY a.activityID, a.activityName,
a.activityDate, a.activityPlaces,
a.activityPrice;