Subtracting results from two queries / sub queries - sql

I'm trying to create a query that returns the customer orders created in a month (e.g. January) minus the customer orders cancelled in that same month (e.g. cancelled in January), and displays the results grouped by location (rows) and by year and month (columns).
Currently I have a table containing all the customer order information, both created and cancelled. Each customer order has a unique order number, the location where it was sold, a creation date and a cancellation date. If the customer order is still valid, the cancellation date is null or "//"; if the customer order is cancelled, it has a cancellation date. Note that a customer order can be created in January 2019 and cancelled in July, August, December, etc. What I would like to obtain is the net customer orders for every month and location: gross customer orders for that month minus customer orders cancelled in that month, for that location.
To achieve this, I created two separate queries from the table: the first containing all the valid customer orders and the second containing all the cancellations. Then I tried creating a cross-tab between the two queries that counts what I described above, grouping by location and pivoting on the year and month.
First query with valid customer orders named cust_valid (simplified):
SELECT cust_ords.[SaleLoc], cust_ords.[OrderNum], cust_ords.[CreationDate], cust_ords.[CancelDate]
FROM cust_ords
WHERE cust_ords.[CancelDate] = "" OR cust_ords.[CancelDate] = "//";
Second query with cancelled customer orders named cust_cancelled (simplified):
SELECT cust_ords.[SaleLoc], cust_ords.[OrderNum], cust_ords.[CreationDate], cust_ords.[CancelDate]
FROM cust_ords
WHERE cust_ords.[CancelDate] <> "" OR cust_ords.[CancelDate] <> "//";
Last, a cross-tab between them:
TRANSFORM Count(cust_valid.[OrderNum]) AS [NetOrderCount]
SELECT cust_valid.[SaleLoc]
FROM cust_valid LEFT JOIN cust_cancelled ON cust_valid.[CreationDate] = cust_cancelled.[CancelDate]
WHERE cust_valid.[CreationDate] = cust_cancelled.[CancelDate]
GROUP BY cust_ords.[SaleLoc]
PIVOT cust_valid.[CreationDate];
In other words, I am trying to count the net customer orders (total created in a month minus those cancelled in that month) for every location and display the results per month, with the column names being the year and month. For example, if I have 10 customer orders in January, 5 in February and 15 in March, and 3 of the January orders are cancelled in March, then for March I would like to count 15 - 3, ending up with January 10, February 5, March 12.
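To make the target arithmetic concrete, here is a minimal sketch that runs in Python against an in-memory SQLite database (not Access). The table layout, the sample rows and the use of NULL or "//" for uncancelled orders are assumptions based on the description above. It counts every order +1 in its creation month and -1 in its cancellation month with UNION ALL, which is one way to reproduce exactly 10, 5 and 12; it is not the approach used in the answers, and pivoting the months into columns is left to the cross-tab.

```python
import sqlite3

# Hypothetical sample data mirroring the example (location "A"):
# 10 orders created in January 2019 (3 of them cancelled in March),
# 5 created in February, 15 created in March.
orders = (
    [("A", f"J{i}", "2019-01-10", "2019-03-05" if i < 3 else None) for i in range(10)]
    + [("A", f"F{i}", "2019-02-10", "//") for i in range(5)]
    + [("A", f"M{i}", "2019-03-10", None) for i in range(15)]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cust_ords (SaleLoc TEXT, OrderNum TEXT, CreationDate TEXT, CancelDate TEXT)"
)
conn.executemany("INSERT INTO cust_ords VALUES (?, ?, ?, ?)", orders)

# Every order adds 1 to its creation month; every cancelled order subtracts 1
# from its cancellation month. NULL, "" and "//" all mean "not cancelled".
NET_SQL = """
SELECT SaleLoc, ym, SUM(delta) AS NetOrders
FROM (
    SELECT SaleLoc, strftime('%Y-%m', CreationDate) AS ym, 1 AS delta
    FROM cust_ords
    UNION ALL
    SELECT SaleLoc, strftime('%Y-%m', CancelDate), -1
    FROM cust_ords
    WHERE CancelDate IS NOT NULL AND CancelDate NOT IN ('', '//')
)
GROUP BY SaleLoc, ym
ORDER BY SaleLoc, ym
"""
rows = conn.execute(NET_SQL).fetchall()
print(rows)
```

Because each order contributes two independent events, an order created in one month and cancelled in another is counted as gross in the first month and subtracted in the second.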

First of all, you say an order is valid if the cancellation date is null or //, however you test for:
WHERE cust_ords.[CancelDate] = "" OR cust_ords.[CancelDate] = "//";
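To test for null, use [CancelDate] IS NULL, or in SQL Server shorten the test to ISNULL([CancelDate],'//')='//' (in Access, IsNull takes a single argument; the equivalent there is Nz([CancelDate],'//')='//').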
Second, in your second query you test for cancelled orders, with
WHERE cust_ords.[CancelDate] <> "" OR cust_ords.[CancelDate] <> "//";
That is not the negation of your test for valid orders! By De Morgan's law:
NOT (A OR B) = (NOT A) AND (NOT B)
So you should use
WHERE cust_ords.[CancelDate] <> "" and cust_ords.[CancelDate] <> "//";
Or rather ISNULL(cust_ords.[CancelDate],'//')!='//'
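To see why the OR version also lets valid orders through, here is a small check using Python's built-in sqlite3 module. The four-row table and its OrderNum values are hypothetical; the point is that every non-NULL value differs from at least one of the two literals, so the OR condition is true for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_ords (OrderNum TEXT, CancelDate TEXT)")
conn.executemany(
    "INSERT INTO cust_ords VALUES (?, ?)",
    [("o1", "//"), ("o2", ""), ("o3", "2019-03-05"), ("o4", None)],
)


def order_nums(where):
    """Return the OrderNum values matching a WHERE condition (trusted input only)."""
    sql = f"SELECT OrderNum FROM cust_ords WHERE {where} ORDER BY OrderNum"
    return [r[0] for r in conn.execute(sql)]


# Original filter: valid rows o1 and o2 slip through.
with_or = order_nums("CancelDate <> '' OR CancelDate <> '//'")
# De Morgan's law applied: only the genuinely cancelled row remains.
with_and = order_nums("CancelDate <> '' AND CancelDate <> '//'")
print(with_or, with_and)
```

The NULL row (o4) is excluded by both filters, because any comparison with NULL is unknown; that is why NULL needs its own test.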
Now to your query itself: you are joining on dates, that is, you join orders on a given date with cancellations on the same date. However, you want to see orders and cancellations per month. Since you join cancellations on the exact date, and your WHERE clause repeats that condition (which turns the LEFT JOIN into an inner join), you will only ever count cancellations that happen on the same date as orders!
SELECT
cust_ords.[SaleLoc]
, Format(
iif(isnull(cust_ords.[CancelDate],'//')='//'
,cust_ords.[CreationDate]
,cust_ords.[CancelDate])
,'MMMM yy') Mnth
, sum(iif(isnull(cust_ords.[CancelDate],'//')='//',1,0)) ValidOrders
, sum(iif(isnull(cust_ords.[CancelDate],'//')='//',0,1)) CancelledOrders
, sum(iif(isnull(cust_ords.[CancelDate],'//')='//',1,0))
- sum(iif(isnull(cust_ords.[CancelDate],'//')='//',0,1)) NetOrderCount
FROM
cust_ords
group by
cust_ords.[SaleLoc]
, Format(
iif(isnull(cust_ords.[CancelDate],'//')='//'
,cust_ords.[CreationDate]
,cust_ords.[CancelDate])
,'MMMM yy')
This should give you the basic data useful for pivoting, at least in SQL Server
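For reference, here is the same conditional-aggregation idea run against SQLite from Python, with hypothetical sample rows. SQLite has no Format or two-argument isnull, so this sketch substitutes IFNULL, CASE WHEN and strftime('%Y-%m', ...); the bucketing logic is otherwise the same as the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cust_ords (SaleLoc TEXT, OrderNum TEXT, CreationDate TEXT, CancelDate TEXT)"
)
conn.executemany(
    "INSERT INTO cust_ords VALUES (?, ?, ?, ?)",
    [
        ("A", "o1", "2019-01-10", None),
        ("A", "o2", "2019-01-12", "//"),
        ("A", "o3", "2019-01-15", "2019-03-05"),  # created Jan, cancelled Mar
        ("A", "o4", "2019-03-02", None),
        ("B", "o5", "2019-03-20", None),
    ],
)

# Each row is bucketed by its cancellation month if cancelled,
# otherwise by its creation month, then counted as valid or cancelled.
SQL = """
SELECT SaleLoc,
       strftime('%Y-%m', CASE WHEN IFNULL(CancelDate, '//') = '//'
                              THEN CreationDate ELSE CancelDate END) AS Mnth,
       SUM(CASE WHEN IFNULL(CancelDate, '//') = '//' THEN 1 ELSE 0 END) AS ValidOrders,
       SUM(CASE WHEN IFNULL(CancelDate, '//') = '//' THEN 0 ELSE 1 END) AS CancelledOrders,
       SUM(CASE WHEN IFNULL(CancelDate, '//') = '//' THEN 1 ELSE -1 END) AS NetOrderCount
FROM cust_ords
GROUP BY SaleLoc, Mnth
ORDER BY SaleLoc, Mnth
"""
rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
```

Note that with this bucketing a cancelled order appears only in its cancellation month: o3 is counted in March's CancelledOrders and never in January.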

Thank you @Søren Kongstad for your useful explanations. I have modified and corrected the code accordingly, and it now gives me the results I need:
SELECT CustOrders.[Grupp namn], Format(IIf(CustOrders.[DatAnnulCde]="/ /",CustOrders.[DatCreatCde],CustOrders.[DatAnnulCde]),"yyyy-mm") AS year_month,
Sum(IIf(IsNull(CustOrders.DatCreatCde),0,1)) AS GrossOrders, Sum(IIf(CustOrders.DatAnnulCde<>"/ /",1,0)) AS CancelledOrders,
Sum(IIf(IsNull(CustOrders.DatCreatCde),0,1)) - Sum(IIf(CustOrders.DatAnnulCde<>"/ /",1,0)) AS NetOrders
FROM CustOrders
WHERE CustOrders.[CarType] = "Renault PC" And CustOrders.[DatCreatCde] >= Format(Year(Now())&"-01-01","yyyy-mm-dd")
GROUP BY CustOrders.[Grupp namn], Format(IIf(CustOrders.[DatAnnulCde]="/ /",CustOrders.[DatCreatCde],CustOrders.[DatAnnulCde]),"yyyy-mm");
The fields have different names than in the initial examples:
DatAnnulCde = CancelDate
DatCreatCde = CreationDate


How to find if there is a match in a certain time interval between two different date fields SQL?

I have a column in my fact table that defines whether a Supplier is old or new based on the following case-statement:
CASE
WHEN SUBSTRING([Reg date], 1, 6) = SUBSTRING([Invoice date], 1, 6)
THEN ('New supplier')
ELSE('Old supplier')
END as [Old/New supplier]
So, for example, if a supplier was registered in 201910 and the invoice date was 201910, the supplier would be considered a 'New supplier' that month. Now I want to calculate the number of old/new suppliers for each month with a distinct count on Supplier no, which is not a problem. The last step is where it gets tricky: I want to count the number of new/old suppliers over a 12-month period (i.e., whether there has been a match between invoice date and reg date in any of the lagging 12 months). So I created the following MDX expression:
aggregate(parallelperiod([D Time].[Year-Month-Day].[Year],1,[D Time].[Year-Month-Day].currentmember).lead(1) : [D Time].[Year-Month-Day].currentmember ,[Measures].[Supplier No Distinct Count])
The issue I am facing is that it counts supplier no "1234" twice, since it has been both new and old during that time period. What I want is that, if it finds one match, the supplier is considered "New" for that whole 12-month period.
In my current result the supplier is counted under both "Old" and "New", but I want the "Old" count to be zero: since Reg date and Invoice date matched once during that 12-month period, the supplier should be considered new for the whole rolling 12 months ending 201910.
Any help, possible approaches or ideas are highly appreciated.
Best regards,
Rubrix
Aggregate first at the supplier level and then at the type level:
select type, count(*)
from (select supplierid,
(case when min(substring(regdate, 1, 6)) = min(substring(invoicedate, 1, 6))
then 'new' else 'old'
end) as type
from t
group by supplierid
) s
group by type;
Note: I assume your date columns are in some obscure string format for your code to work. Otherwise, you should be using appropriate date functions.
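Here is the same two-level aggregation run from Python against an in-memory SQLite table, as a sketch with made-up rows. SQLite spells the function substr rather than substring, and the dates are stored as yyyymmdd strings to match the answer's assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (supplierid TEXT, regdate TEXT, invoicedate TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("1234", "20191005", "20191010"),  # registered and invoiced in 201910
        ("1234", "20191005", "20191120"),  # later invoice: "old" at row level
        ("5678", "20180101", "20191003"),
        ("5678", "20180101", "20191104"),
    ],
)

# Inner query: exactly one row (and one label) per supplier.
# Outer query: count suppliers per label.
SQL = """
SELECT type, COUNT(*)
FROM (SELECT supplierid,
             CASE WHEN MIN(substr(regdate, 1, 6)) = MIN(substr(invoicedate, 1, 6))
                  THEN 'new' ELSE 'old'
             END AS type
      FROM t
      GROUP BY supplierid) s
GROUP BY type
ORDER BY type
"""
rows = conn.execute(SQL).fetchall()
print(rows)
```

Supplier 1234 is counted once, as new, even though one of its invoice rows would be labelled old on its own.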
SELECT COUNT(*) OVER () AS TotalCount
FROM Facts
WHERE Regdate BETWEEN olddate AND newdate OR InvoiceDate BETWEEN olddate AND newdate
GROUP BY
Supplier
The above query returns all the suppliers within that time period and then groups them. Because the window function COUNT(*) OVER () is evaluated after GROUP BY, it counts the groups rather than the fact rows, so each supplier is counted only once (the same total appears on every output row).
You might want to change the WHERE clause, because I didn't quite understand how you are getting the 12-month period. Generally, if your WHERE clause returns the suppliers within that time period (they don't have to be unique), the GROUP BY and the count will handle the rest.
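A small sketch of this behaviour in SQLite from Python, with invented rows and an arbitrary date window standing in for olddate and newdate. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Facts (Supplier TEXT, Regdate TEXT, InvoiceDate TEXT)")
conn.executemany(
    "INSERT INTO Facts VALUES (?, ?, ?)",
    [
        ("1234", "2019-10-05", "2019-10-10"),
        ("1234", "2019-10-05", "2019-11-20"),
        ("5678", "2018-01-01", "2019-10-03"),
        ("9999", "2015-01-01", "2016-01-01"),  # outside the window
    ],
)

# COUNT(*) OVER () is computed after GROUP BY, so it counts supplier groups,
# not the underlying fact rows.
SQL = """
SELECT Supplier, COUNT(*) OVER () AS TotalCount
FROM Facts
WHERE Regdate BETWEEN :olddate AND :newdate
   OR InvoiceDate BETWEEN :olddate AND :newdate
GROUP BY Supplier
ORDER BY Supplier
"""
rows = conn.execute(SQL, {"olddate": "2018-11-01", "newdate": "2019-10-31"}).fetchall()
print(rows)
```

Supplier 1234 has two matching fact rows but contributes only one group, so the total is 2.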

How to add custom YoY field to output?

I'm attempting to determine the YoY growth by month, 2017 to 2018, for the number of Company bookings per property.
I've tried casting and windowed functions but am not obtaining the correct result.
Example Table 1: Bookings
BookID Amnt BookType InDate OutDate PropertyID Name Status
-----------------------------------------------------------------
789555 $1000 Company 1/1/2018 3/1/2018 22111 Wendy Active
478141 $1250 Owner 1/1/2017 2/1/2017 35825 John Cancelled
There are only two book types (i.e., Company and Owner) and two book statuses (i.e., Active and Cancelled).
Example Table 2: Properties
Property ID State Property Start Date Property End Date
---------------------------------------------------------------------
33111 New York 2/3/2017
35825 Michigan 7/21/2016
The Property End Date is blank when the company still owns it.
Example Table 3: Months
Start of Month End of Month
-------------------------------------------
1/1/2018 1/31/2018
The previous developer created this table, which includes a row for each month from 2015 to 2020.
I've tried many iterations of my current code and can't even come close.
Desired Outcome
I need to find the YoY growth by month, 2017 to 2018, for the number of Company bookings per property. The stakeholder has requested the output to have the columns below:
Month Name Bookings_Per_Property_2017 Bookings_Per_Property_2018 YoY
-----------------------------------------------------------------------
The number of Company bookings per property in a month should be calculated by counting the total number of active Company bookings made in a month divided by the total number of properties active in the month.
Here is a solution that should be close to what you need. It works by:
LEFT JOINing the three tables; the important part is to check properly for overlaps between the date ranges of months (StartOfMonth, EndOfMonth), bookings (InDate, OutDate) and properties (PropertyStartDate, PropertyEndDate). Two ranges overlap when each one starts on or before the other one ends. You can have a look at this reference post for a general discussion on how to do this efficiently
aggregating by month, and using conditional COUNT(DISTINCT ...) to count the number of properties and bookings in each month and year. The logic implicitly relies on the fact that this aggregate function ignores NULL values. Since we are using LEFT JOINs, we also need to handle the possibility that a denominator could have a 0 value.
Notes:
you did not provide expected results so this cannot be tested
also, you did not explain how to compute the YoY column, so I left it alone; I assume that you can easily compute it from the other columns
Query:
SELECT
MONTH(m.StartOfMonth) AS [Month],
COUNT(DISTINCT CASE WHEN YEAR(StartOfMonth) = 2017 THEN b.BookID END) * 1.0
/ NULLIF(COUNT(DISTINCT CASE WHEN YEAR(StartOfMonth) = 2017 THEN p.PropertyID END), 0)
AS Bookings_Per_Property_2017,
COUNT(DISTINCT CASE WHEN YEAR(StartOfMonth) = 2018 THEN b.BookID END) * 1.0
/ NULLIF(COUNT(DISTINCT CASE WHEN YEAR(StartOfMonth) = 2018 THEN p.PropertyID END), 0)
AS Bookings_Per_Property_2018
FROM months m
LEFT JOIN bookings b
ON m.StartOfMonth <= b.OutDate
AND m.EndOfMonth >= b.InDate
AND b.status = 'Active'
AND b.BookType = 'Company'
LEFT JOIN properties p
ON m.StartOfMonth <= COALESCE(p.PropertyEndDate, m.StartOfMonth)
AND m.EndOfMonth >= p.PropertyStartDate
GROUP BY MONTH(m.StartOfMonth)
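Since the query could not be tested against the real tables, here is a hedged sketch running the same logic against SQLite from Python, with invented rows for January 2017 and January 2018. SQLite has no YEAR() or MONTH(), so strftime stands in for them, and each count is multiplied by 1.0 before dividing so that integer division does not truncate the ratio (SQL Server would also truncate int / int).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE months (StartOfMonth TEXT, EndOfMonth TEXT);
CREATE TABLE bookings (BookID INTEGER, BookType TEXT, InDate TEXT,
                       OutDate TEXT, PropertyID INTEGER, Status TEXT);
CREATE TABLE properties (PropertyID INTEGER, PropertyStartDate TEXT,
                         PropertyEndDate TEXT);
INSERT INTO months VALUES ('2017-01-01', '2017-01-31'), ('2018-01-01', '2018-01-31');
INSERT INTO bookings VALUES
    (1, 'Company', '2017-01-05', '2017-01-06', 1, 'Active'),
    (2, 'Company', '2018-01-01', '2018-03-01', 1, 'Active'),
    (3, 'Company', '2018-01-10', '2018-01-12', 2, 'Active'),
    (4, 'Owner',   '2018-01-15', '2018-01-20', 2, 'Active'),
    (5, 'Company', '2018-01-20', '2018-01-22', 1, 'Cancelled');
INSERT INTO properties VALUES
    (1, '2016-07-21', NULL),
    (2, '2017-02-03', NULL),
    (3, '2016-01-01', '2017-12-31');
""")

SQL = """
SELECT CAST(strftime('%m', m.StartOfMonth) AS INTEGER) AS MonthNum,
       COUNT(DISTINCT CASE WHEN strftime('%Y', m.StartOfMonth) = '2017' THEN b.BookID END) * 1.0
         / NULLIF(COUNT(DISTINCT CASE WHEN strftime('%Y', m.StartOfMonth) = '2017' THEN p.PropertyID END), 0)
         AS Bookings_Per_Property_2017,
       COUNT(DISTINCT CASE WHEN strftime('%Y', m.StartOfMonth) = '2018' THEN b.BookID END) * 1.0
         / NULLIF(COUNT(DISTINCT CASE WHEN strftime('%Y', m.StartOfMonth) = '2018' THEN p.PropertyID END), 0)
         AS Bookings_Per_Property_2018
FROM months m
LEFT JOIN bookings b
       ON m.StartOfMonth <= b.OutDate   -- booking overlaps the month
      AND m.EndOfMonth >= b.InDate
      AND b.Status = 'Active'
      AND b.BookType = 'Company'
LEFT JOIN properties p
       ON m.StartOfMonth <= COALESCE(p.PropertyEndDate, m.StartOfMonth)  -- NULL end = still owned
      AND m.EndOfMonth >= p.PropertyStartDate
GROUP BY MonthNum
"""
rows = conn.execute(SQL).fetchall()
print(rows)
```

In January 2017 one Company booking is spread over two active properties (1 and 3), and in January 2018 two bookings over two properties (1 and 2), which gives 0.5 and 1.0. The DISTINCT counts absorb the duplicate rows that the two LEFT JOINs produce.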

Showing Purchase Counts for ALL months - Including 0 purchase months

I want to join a complete calendar table, with per user purchasing data to show, for each user, any purchase counts for every month from 2014 to 2017. Some users may not have a purchase until 2016, but I would still want to have the results show 0's for each month up to the first purchase date, as well as any 0's in between months as well.
I can't get the 0 months to be included! I think it's because I'm doing this across many unique users, but it felt like the below code should work.
select
c.fscl_yr_num
,c.fscl_month_num
,t.user_id
,sum(nvl(t.trans_counter, 0))
from
appca.d_cal c
left join
transaction_data t
on c.cal_dt = t.trans_dt
and t.trans_type = 'Purchase'
and c.fscl_yr_num in (2014, 2015, 2016, 2017)
group by
c.fscl_yr_num
,c.fscl_month_num
,t.user_id
order by
t.user_id
,c.fscl_yr_num
,c.fscl_month_num
;
Try
SUM(NVL(t.trans_counter,0))
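Note that NVL inside SUM only affects rows that exist: months in which a user bought nothing produce no row for that user at all, only calendar rows with a NULL user_id. A common remedy (not part of the answer above) is to cross join the calendar with the list of users first and only then LEFT JOIN the transactions. The year filter also has to move to WHERE, because conditions in the ON clause of a LEFT JOIN do not remove rows from the left-hand table. The sketch below runs this in SQLite from Python with a hypothetical three-month daily calendar, assumes fiscal months equal calendar months, and takes the user list from the transactions themselves (a real query might use a users table); COALESCE plays the role of Oracle's NVL.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d_cal (cal_dt TEXT, fscl_yr_num INTEGER, fscl_month_num INTEGER)")
conn.execute(
    "CREATE TABLE transaction_data (user_id TEXT, trans_dt TEXT, trans_type TEXT, trans_counter INTEGER)"
)

# Hypothetical daily calendar for Jan-Mar 2014.
d = date(2014, 1, 1)
while d < date(2014, 4, 1):
    conn.execute("INSERT INTO d_cal VALUES (?, ?, ?)", (d.isoformat(), d.year, d.month))
    d += timedelta(days=1)

conn.executemany(
    "INSERT INTO transaction_data VALUES (?, ?, ?, ?)",
    [
        ("u1", "2014-02-03", "Purchase", 1),
        ("u1", "2014-02-17", "Purchase", 1),
        ("u2", "2014-01-09", "Purchase", 1),
        ("u2", "2014-03-01", "Refund", 1),  # not a purchase, ignored
    ],
)

# One row per (calendar day, user) first, then attach matching purchases.
SQL = """
SELECT c.fscl_yr_num, c.fscl_month_num, u.user_id,
       COALESCE(SUM(t.trans_counter), 0) AS purchases
FROM d_cal c
CROSS JOIN (SELECT DISTINCT user_id FROM transaction_data) u
LEFT JOIN transaction_data t
       ON t.trans_dt = c.cal_dt
      AND t.user_id = u.user_id
      AND t.trans_type = 'Purchase'
WHERE c.fscl_yr_num IN (2014)
GROUP BY c.fscl_yr_num, c.fscl_month_num, u.user_id
ORDER BY u.user_id, c.fscl_yr_num, c.fscl_month_num
"""
rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
```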

How to create a dynamic where clause in sql?

So I have created a table that has the following columns from a transaction table with all customer purchase records:
1. Month-Year, 2. Customer ID, 3. Number of Transactions in that month.
I'm trying to create a table that has the output of
1. Month-Year, 2. Number of active customers defined by having at least 1 purchase in the previous year.
My current code is below, but the CASE WHEN obviously captures only one date, and the WHERE clause isn't dynamic. I would really appreciate your help.
select month_start_date, cust_ID,
(case when month_start_Date between date and add_months(date, -12) then count(cust_ID) else 0 end) as active
from myserver.mytable
where
month_start_Date>add_months(month_start_date,-12)
group by 1,2
EDIT: I'm just trying to put a flag next to a customer if they are active in a given month, where active means having at least one transaction in the last year. Thanks!
You might use Teradata's proprietary EXPAND ON syntax for creating time series:
SELECT month_start_date, COUNT(*)
FROM
( -- create one row for every month within the next year
-- after a customer's transaction
SELECT DISTINCT
BEGIN(pd) AS month_start_date,
cust_ID
FROM myserver.mytable
EXPAND ON PERIOD(month_start_date, ADD_MONTHS(month_start_date,12)) AS pd
BY ANCHOR MONTH_BEGIN -- every 1st of month
FOR PERIOD (DATE - 500, DATE) -- use this to restrict to a specific date range
) AS dt
GROUP BY month_start_date
ORDER BY month_start_date
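The idea behind the EXPAND ON query can be sketched in plain Python for readers without Teradata. This is an emulation under one assumption: a transaction in month m makes the customer active in m and the following 11 months, which is how I read the 12-month PERIOD above. The function names are hypothetical, and the set plays the role of SELECT DISTINCT.

```python
from collections import defaultdict
from datetime import date


def add_months(month_start: date, n: int) -> date:
    """Return the first day of the month n months after month_start."""
    years, month_index = divmod(month_start.month - 1 + n, 12)
    return date(month_start.year + years, month_index + 1, 1)


def active_customers_per_month(rows, window=12):
    """Count distinct customers per month, where a transaction in month m
    marks the customer active in m and the following window - 1 months."""
    active = defaultdict(set)
    for month_start, cust_id in rows:
        for k in range(window):  # one expanded row per month, like EXPAND ON
            active[add_months(month_start, k)].add(cust_id)  # set = DISTINCT
    return {month: len(custs) for month, custs in sorted(active.items())}


rows = [
    (date(2019, 1, 1), "A"),
    (date(2019, 3, 1), "A"),
    (date(2019, 2, 1), "B"),
]
counts = active_customers_per_month(rows)
print(counts[date(2019, 1, 1)], counts[date(2019, 2, 1)], counts[date(2020, 2, 1)])
```

Customer A transacted twice, yet is counted once per month, and drops out after February 2020, twelve months after its last transaction month.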

Optimizing NOT IN query in Access SQL

I am new to Access and am using Access 2007.
I am doing a simple query on a database that has a list of customers who visit a workshop.
I want to send out servicing reminders to customers 3 months after their last visit. I have created a query that returns the list of customers who visited 3 months before the current month. For example, if it is May now, 3 months ago would be March (counting May as one of the 3 months).
However, customers who visited 3 months ago may have visited again 2 months ago. For example, customer A came in March and April. His last visit was in April, and hence, should not appear in the result if I were to run the query in May, as his reminder should only be sent out in June.
My query takes care of this; however, it is rather slow and takes some time to load in Access. Any help in optimizing it would be appreciated.
The only important field here is Invoice.DebCode, which is the customer ID in the database. There is another table, Debtor, which holds the customers together with their particulars.
I used an INNER JOIN because I need to display the customer (Debtor) address and particulars in the result.
SELECT Invoice.InvNo, Invoice.InvDate, Invoice.DebCode, Debtor.DebName, Debtor.AddL1, Debtor.AddL2, Debtor.AddL3, Invoice.CarNo, Invoice.ChaNo, Invoice.ExcReason
FROM Debtor
INNER JOIN Invoice ON Debtor.DebCode = Invoice.DebCode
WHERE Year(InvDate) = Year(Now())
AND Month(InvDate) = Month(Now()) - 2
AND Invoice.DebCode NOT IN (SELECT Invoice.DebCode
FROM Invoice
WHERE Year(InvDate) = Year(Now())
AND ( (Month(InvDate) = Month(Now()) -1)
OR (Month(InvDate) = Month(Now())) ) )
You can dramatically speed up your query by adjusting your WHERE clauses so that the comparisons are done directly against the date field (i.e., without passing it through the Month() and Year() functions). Doing it this way allows the Jet engine to use the index you have on the Invoice.InvDate field (you do have that field indexed, right?).
SELECT I.InvNo, I.InvDate, I.DebCode, D.DebName, D.AddL1, D.AddL2, D.AddL3,
I.CarNo, I.ChaNo, I.ExcReason
FROM Debtor AS D
INNER JOIN Invoice AS I
ON D.DebCode = I.DebCode
WHERE I.InvDate Between DateSerial(Year(Now()), Month(Now()) - 2, 1)
And DateSerial(Year(Now()), Month(Now()) - 1, 0)
AND I.DebCode NOT IN
(SELECT Invoice.DebCode FROM Invoice
WHERE Invoice.InvDate > DateSerial(Year(Now()), Month(Now()) - 1, 0))
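Access's DateSerial does the heavy lifting here: month arguments of 0 or less roll back into the previous year, and day 0 means the last day of the previous month. That also fixes a January/February problem in the original query, where Month(Now()) - 2 would be 0 or -1 and match no invoice month. The Python below is a hypothetical emulation of that behaviour, written only to show which date range the outer WHERE clause selects; it is not part of Access.

```python
from datetime import date, timedelta


def date_serial(year: int, month: int, day: int) -> date:
    """Mimic Access DateSerial: out-of-range months roll into adjacent
    years, and day 0 (or negative days) count back from the 1st."""
    carry, month_index = divmod(month - 1, 12)
    first = date(year + carry, month_index + 1, 1)
    return first + timedelta(days=day - 1)


def reminder_window(today: date):
    """Start and end dates used by the outer WHERE clause above."""
    start = date_serial(today.year, today.month - 2, 1)
    end = date_serial(today.year, today.month - 1, 0)
    return start, end


may = reminder_window(date(2024, 5, 15))       # the whole of March 2024
february = reminder_window(date(2024, 2, 15))  # the whole of December 2023
print(may, february)
```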
What about something like:
SELECT a.debcode, a.debname, a.debstuff, b.most_recent AS last_over_three_months
FROM debtor AS a INNER JOIN
(
SELECT debcode, Max(invdate) AS most_recent
FROM invoice
GROUP BY debcode
)
as b
ON a.debcode= b.debcode
WHERE (month(now()) - Month(most_recent) >2);
You will have to tweak it for your tables, but the idea is a subquery that selects each customer's most recent visit, and an outer query that keeps only the records that meet your month criteria.
I managed to speed up the query thanks to mwolfe02's suggestion.
For the sake of archiving and completeness, I explain my SQL statements below.
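One caveat with this approach: month(now()) - Month(most_recent) compares month numbers only, so it breaks across a year boundary (January minus November is negative). A sketch of the same "latest visit per customer" idea in Python, using year * 12 + month so the difference works across years. The function names are hypothetical, and the gap of exactly 2 matches the question's example (run in May, last visit in March).

```python
from datetime import date


def month_index(d: date) -> int:
    """Months since year 0, so differences work across year boundaries."""
    return d.year * 12 + d.month


def due_for_reminder(visits, today: date, gap: int = 2):
    """Customers whose most recent visit was exactly `gap` months before today."""
    latest = {}
    for cust, visit_date in visits:  # like SELECT debcode, Max(invdate) ... GROUP BY
        if cust not in latest or visit_date > latest[cust]:
            latest[cust] = visit_date
    return sorted(
        cust for cust, last in latest.items()
        if month_index(today) - month_index(last) == gap
    )


visits = [
    ("A", date(2024, 3, 5)), ("A", date(2024, 4, 2)),  # came back in April
    ("B", date(2024, 3, 20)),
    ("C", date(2023, 12, 1)),
]
in_may = due_for_reminder(visits, date(2024, 5, 10))
in_january = due_for_reminder([("D", date(2023, 11, 3))], date(2024, 1, 15))
print(in_may, in_january)
```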
SELECT I.InvNo, I.InvDate, I.DebCode, D.DebName, D.AddL1, D.AddL2, D.AddL3,
I.CarNo, I.ChaNo, I.ExcReason
FROM Debtor AS D
INNER JOIN Invoice AS I
ON D.DebCode = I.DebCode
WHERE I.InvDate Between DateSerial(Year(Now()), Month(Now()) - 2, 1)
And DateSerial(Year(Now()), Month(Now())- 1, 0)
AND I.DebCode NOT IN
(SELECT Invoice.DebCode FROM Invoice
WHERE Invoice.InvDate Between DateSerial(Year(Now()), Month(Now()) - 1, 1)
And DateSerial(Year(Now()), Month(Now()) + 1, 0))
I edited the bottom subquery, as mwolfe's version checked only for customers in the current month. Customers are eligible for a reminder only if they came 3 months ago; that is to say, they cannot have visited in the current month or the month before.
For example, customer A visited in April and May. The current month is June, thus he is not eligible for the reminder, as his last visit was in May.
Customer B visited in April and June, thus he is not eligible for the reminder, as his last visit was in June.
Hence, in plain English, the query is:
Get customers who came 3 months ago (counting the current month) and did not come in the current month or the month before.
I hope this helps anyone who has the same problem.
As darkjh and mikey suggested, another option is to select each customer's most recent visit and then keep only the records that meet the month criteria.
Thanks all!