SQL query to add a column that counts the number of encounters in the past year for each encounter

I am trying to identify High-Usage status for customers: at the time of each order, how many orders did the customer place in the previous year? Each customer has a unique ID, and each order has a unique ID and a date/time stamp taken at the time of the order. This is not just adding a count column but a conditional count. I can recreate this in Excel using SUMPRODUCT, but I wanted to see whether I can automate the process in SSMS before my pull.
I tried adding a subquery column and then joining on the subquery result:
SELECT data.*
,HU_CUSTOMER_YOY
FROM data
LEFT JOIN (SELECT MAX(ORDER_ID) AS ORDER_ID
,COUNT(CUSTOMER_ID) AS HU_CUSTOMER_YOY
FROM data AS CUS_HU
WHERE ORDER_DTTM > DATEADD(YEAR, -1, ORDER_DTTM)
GROUP BY CUS_HU.CUSTOMER_ID)
CUSHU on CUSHU.ORDER_ID = data.ORDER_ID
This returns a value ONLY on the most recent order, and that value counts ALL previous orders. To reiterate: I need a value on EACH order that counts every order the customer placed in the year before that order. My issue is using the DTTM column. If I use a static date like GETDATE(), the count works, but I need the count relative to DTTM minus one year on EACH order so I can view historical data, e.g., when a customer entered and left High-Usage status and what contributed to the change.
This is for a rather large dataset that is refreshed daily. If possible, I would prefer not to aggregate the main query, which is why I thought creating and joining a reference table would be better.
Is this possible?
Expected query results:

customer_id  order_id  HU_count  order_dttm
c1           c1-1      0         1/1/2020
c1           c1-2      1         7/1/2020
c1           c1-3      0         1/1/2022
c1           c1-4      1         1/10/2022
c2           c2-1      0         1/11/2022
c1           c1-5      2         1/14/2022
c2           c2-2      1         1/15/2022

I assume you are using SQL Server, given your use of the DATEADD function.
Based on my understanding of the requirements, this shows, for each order, how many orders the same customer placed in the year before it.
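SELECT d.customer_id
,d.order_id
,d.order_dttm
-- count this customer's earlier orders placed within one year before this order
,(SELECT COUNT(*)
FROM data AS prev
WHERE prev.customer_id = d.customer_id
AND prev.order_dttm < d.order_dttm
AND prev.order_dttm >= DATEADD(YEAR, -1, d.order_dttm)) AS HU_count
FROM data AS d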
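To check the expected results in the question end to end, here is a minimal sketch that counts, for each order, the same customer's earlier orders within the preceding year, using a correlated subquery run against SQLite through Python's standard sqlite3 module. Assumptions: SQLite instead of SQL Server (date(x, '-1 year') stands in for DATEADD(YEAR, -1, x)), dates rewritten as ISO-8601 text, and "in the last year" taken to mean on or after the same calendar date one year earlier and strictly before the current order. The table name data comes from the question; the helper function is hypothetical.

```python
import sqlite3

# Rows mirroring the expected-results table in the question, with dates
# rewritten as ISO-8601 text so they compare correctly as strings.
ORDERS = [
    ("c1", "c1-1", "2020-01-01"),
    ("c1", "c1-2", "2020-07-01"),
    ("c1", "c1-3", "2022-01-01"),
    ("c1", "c1-4", "2022-01-10"),
    ("c2", "c2-1", "2022-01-11"),
    ("c1", "c1-5", "2022-01-14"),
    ("c2", "c2-2", "2022-01-15"),
]

QUERY = """
SELECT d.customer_id,
       d.order_id,
       d.order_dttm,
       -- earlier orders by the same customer, no more than one year before this one
       (SELECT COUNT(*)
          FROM data AS prev
         WHERE prev.customer_id = d.customer_id
           AND prev.order_dttm < d.order_dttm
           AND prev.order_dttm >= date(d.order_dttm, '-1 year')) AS hu_count
  FROM data AS d
 ORDER BY d.order_dttm
"""


def high_usage_counts(rows):
    """Return (customer_id, order_id, order_dttm, hu_count) for every order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (customer_id TEXT, order_id TEXT, order_dttm TEXT)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in high_usage_counts(ORDERS):
        print(row)
```

A correlated subquery is used here because neither SQL Server nor this sketch has a simple window frame of "one calendar year before the current row"; on a large table, an index on (customer_id, order_dttm) would likely matter.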

Related

Multiple sum subqueries for percentage

I need help with the following problem: I want to write a query that computes multiple sums and then uses those sums to get a percentage: percentage = s1/(s1+s2).
I have the following input data:
Order shipping date, number of orders that arrived late, number of orders that arrived on time
What I want as output: the percentage of orders that arrived late and the percentage of orders that arrived on time.
I want an extra column in the table that holds this percentage, computed in SQL.
Concrete example:
* On 2022/01/04 at 10:00 AM I have 3 orders late and 4 orders on time => 7 orders in total. Percentage = 3/7 late, 4/7 on time.
* On 2022/01/04 at 11:00 AM I have 5 orders late and 6 orders on time => 11 orders in total, but this entry is added to the previous one: 3+5 = 8 orders late, 4+6 = 10 orders on time, 18 orders in total => percentage = 8/18 late, 10/18 on time.
To compute the running total of orders with status "LATE" up to each event date (this running sum is s1), I wrote the following SQL:
SELECT s1.EventDate, (
SELECT SUM(s2.NbOfOrders)
FROM OrderShipmentStats s2
WHERE s2.EventDate <= s1.EventDate AND s2.Status='LATE'
) AS cnt
FROM OrderShipmentStats s1
GROUP BY s1.EventDate, s1.Status
I wrote the same kind of SQL for "On Time" (s2), and it works. What I need now is to combine the values from the two queries and, depending on whether the status is late or on time, compute s1/(s1+s2) or s2/(s1+s2).
My problem is that I do not know how to compute this formula in a single query using those two subqueries; any help would be great.
Picture with Table
The link above points to a picture of the table (I am new, so I am not allowed to embed images).
The percentage column is the one I want to add; lines in the picture show how it is calculated.
I created the table based on your image and added a few rows to it.
The query shows the total order count per hour, the running total per status, and the grand total, as described in your image.
The query looks like:
create table OrderShipmentsStats
(
EventDate datetime not null,
Status varchar(10) not null,
OrdersCount int not null
)
insert into OrderShipmentsStats
values
('2022-01-04T10:00:00','Late',3),
('2022-01-04T10:00:00','On Time',4),
('2022-01-04T11:00:00','Late',5),
('2022-01-04T11:00:00','On Time',6),
('2022-01-04T12:00:00','Late',1),
('2022-01-04T12:00:00','On Time',2)
SELECT
EventDate,
Status,
OrdersCount,
TotalPerHour,
StatusTotal,
GrandStatusTotal,
-- multiplying by 1.0 forces decimal division (int / int would truncate to 0), giving a fraction such as 0.45 or 0.123
-- multiplying by 100 turns that fraction into a percentage such as 45 or 12.3
cast(1.0 * o.StatusTotal / o.GrandStatusTotal as decimal(5,3)) * 100 as Percentage
from
(
select
EventDate,
Status,
OrdersCount,
TotalPerHour,
StatusTotal,
SUM(TotalPerHour) over (partition by Status order by EventDate asc) as GrandStatusTotal
from
(
select
EventDate,
Status,
OrdersCount,
Sum(OrdersCount) over (partition by EventDate order by EventDate asc) as TotalPerHour,
SUM(OrdersCount) over (partition by Status order by EventDate asc) as StatusTotal
from OrderShipmentsStats
) as t
) as o
order by EventDate, Status
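For comparison, here is a shorter sketch of the same cumulative percentage: a running total per status divided by a running total over both statuses, using two window functions in one query. It runs against SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), reuses the answer's sample rows, and assumes one row per (EventDate, Status) as in that sample; rounding to one decimal place is illustrative.

```python
import sqlite3

# Sample rows taken from the answer's INSERT statement.
ROWS = [
    ("2022-01-04T10:00:00", "Late", 3),
    ("2022-01-04T10:00:00", "On Time", 4),
    ("2022-01-04T11:00:00", "Late", 5),
    ("2022-01-04T11:00:00", "On Time", 6),
    ("2022-01-04T12:00:00", "Late", 1),
    ("2022-01-04T12:00:00", "On Time", 2),
]

QUERY = """
SELECT EventDate,
       Status,
       StatusTotal,
       GrandTotal,
       ROUND(100.0 * StatusTotal / GrandTotal, 1) AS Percentage
  FROM (SELECT EventDate,
               Status,
               -- running total for this status up to and including EventDate
               SUM(OrdersCount) OVER (PARTITION BY Status ORDER BY EventDate) AS StatusTotal,
               -- running total over both statuses; with ORDER BY, the default frame
               -- includes every row that shares the current EventDate
               SUM(OrdersCount) OVER (ORDER BY EventDate) AS GrandTotal
          FROM OrderShipmentsStats) AS t
 ORDER BY EventDate, Status
"""


def cumulative_percentages(rows):
    """Return (EventDate, Status, StatusTotal, GrandTotal, Percentage) rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE OrderShipmentsStats (EventDate TEXT, Status TEXT, OrdersCount INTEGER)"
    )
    con.executemany("INSERT INTO OrderShipmentsStats VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in cumulative_percentages(ROWS):
        print(row)
```

At 11:00 this gives 8 of 18 for Late and 10 of 18 for On Time, matching the question's worked example.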

SQL - Count new entries based on last date

I have a table with the following structure:
ID ReportDate Object_id
What I need to know is the count of new and the count of old object IDs.
For example, if I have the data below:
I want the following output, grouped by ReportDate:
I thought of doing it with a WHERE clause based on date; however, I need the data for all the dates in the table, to see, for each report, how many objects already existed in a previous report and how many are new. Any ideas?
Edit: definition of new and old. New records are those that never appeared before that report run date and appear in this one; old records are those that had at least one match on a previous date.
I managed to do it using a LEFT JOIN. My solution is below, in case it helps anyone in the future :)
SELECT table.ReportRunDate,
-1*sum(table.ReportRunDate = new_table.init_date) as count_new,
-1*sum(table.ReportRunDate <> new_table.init_date) as count_old,
count(*) as count_total
FROM table LEFT JOIN
((SELECT Object_ID, min(ReportRunDate) as init_date
FROM table
GROUP By OBJECT_ID) as new_table)
ON table.Object_ID = new_table.Object_ID
GROUP BY ReportRunDate
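In MS Access a true comparison evaluates to -1, which is why the solution above multiplies SUM(...) by -1. A more portable form of the same first-appearance (MIN date) idea replaces the boolean sums with CASE expressions. The sketch below runs it in SQLite through Python's sqlite3 module; the sample rows are hypothetical, and report_table stands in for the placeholder name table, which is a reserved word in most SQL dialects.

```python
import sqlite3

# Hypothetical rows: (ID, ReportRunDate, Object_ID).
ROWS = [
    (1, "2020-01-01", "A"),
    (2, "2020-01-01", "B"),
    (3, "2020-01-02", "A"),
    (4, "2020-01-02", "C"),
    (5, "2020-01-03", "A"),
    (6, "2020-01-03", "B"),
    (7, "2020-01-03", "D"),
]

QUERY = """
SELECT t.ReportRunDate,
       -- new: this run is the object's first appearance
       SUM(CASE WHEN t.ReportRunDate = f.init_date THEN 1 ELSE 0 END) AS count_new,
       -- old: the object already appeared in an earlier run
       SUM(CASE WHEN t.ReportRunDate <> f.init_date THEN 1 ELSE 0 END) AS count_old,
       COUNT(*) AS count_total
  FROM report_table AS t
  LEFT JOIN (SELECT Object_ID, MIN(ReportRunDate) AS init_date
               FROM report_table
              GROUP BY Object_ID) AS f
    ON t.Object_ID = f.Object_ID
 GROUP BY t.ReportRunDate
 ORDER BY t.ReportRunDate
"""


def new_old_counts(rows):
    """Return (ReportRunDate, count_new, count_old, count_total) per run date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE report_table (ID INTEGER, ReportRunDate TEXT, Object_ID TEXT)")
    con.executemany("INSERT INTO report_table VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in new_old_counts(ROWS):
        print(row)
```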
This would work in Oracle; I'm not sure about MS Access:
SELECT ReportDate
,COUNT(CASE WHEN rnk = 1 THEN 1 ELSE NULL END) count_of_new
,COUNT(CASE WHEN rnk <> 1 THEN 1 ELSE NULL END) count_of_old
FROM (SELECT ID
,ReportDate
,Object_id
,RANK() OVER (PARTITION BY Object_id ORDER BY ReportDate) rnk
FROM table_name)
GROUP BY ReportDate
The inner query ranks each occurrence of an object_id by ReportDate, so the first occurrence of a given object_id gets rank 1, the next one rank 2, and so on.
The outer query then counts, within each ReportDate group, how many records have rank equal to 1 (new) and how many do not (old).
I assumed that an object_id can appear only once per ReportDate.
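The ranking approach can be tried outside Oracle as well. This sketch runs the same RANK() query in SQLite (3.25 or later, for window functions) through Python's sqlite3 module, keeping the answer's placeholder table name table_name; the sample rows are hypothetical and respect the assumption that an object_id appears at most once per ReportDate.

```python
import sqlite3

# Hypothetical rows: (ID, ReportDate, Object_id), at most one row per object per date.
ROWS = [
    (1, "2021-03-01", "X"),
    (2, "2021-03-02", "X"),
    (3, "2021-03-02", "Y"),
    (4, "2021-03-03", "Y"),
    (5, "2021-03-03", "Z"),
    (6, "2021-03-03", "X"),
]

QUERY = """
SELECT ReportDate,
       COUNT(CASE WHEN rnk = 1 THEN 1 END) AS count_of_new,
       COUNT(CASE WHEN rnk <> 1 THEN 1 END) AS count_of_old
  FROM (SELECT ReportDate,
               -- 1 for an object's first ReportDate, 2 for its second, and so on
               RANK() OVER (PARTITION BY Object_id ORDER BY ReportDate) AS rnk
          FROM table_name) AS ranked
 GROUP BY ReportDate
 ORDER BY ReportDate
"""


def new_old_by_rank(rows):
    """Return (ReportDate, count_of_new, count_of_old) per report date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_name (ID INTEGER, ReportDate TEXT, Object_id TEXT)")
    con.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in new_old_by_rank(ROWS):
        print(row)
```

If an object could appear twice on its first date, both rows would get rank 1 and be counted as new, which is why the assumption matters.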

Conditional left join on max date and where clause in second table

I am trying to join a customer table with a sales table so that I list all customers in the database along with any paid sale each customer might have in the sales table. A customer can have multiple rows in the sales table.
This is an example of one customer with multiple rows in the sales table:
When extracting this record, I would like to get only the row with MAX(q_saledatetime) where q_paidamount > 0,
that is, the last time this customer made a payment to us. In this case that is row 2, where they paid 8.90. If a customer has no record in the sales table, their name/details should still appear in the list.
What I can't work out is how to combine the paid-amount condition with the max-date condition.
ATTEMPT A
select DISTINCT ON (q_customer.q_code)
q_customer.q_code, q_customer.q_name, -- customer info
MAX(q_saleheader.q_saledatetime) AS latestDate, q_saleheader.q_paidamount -- saleheader info
FROM q_customer
LEFT JOIN q_saleheader ON (q_customer.q_code = q_saleheader.q_customercode)
group by q_customer.q_code, q_customer.q_name , q_saleheader.q_saledatetime, q_saleheader.q_paidamount
order by q_customer.q_code ASC
which results in
So for Fred Blogg, it picks up the details from row 4 instead of row 2 (first image), because there is no condition on q_paidamount at this point.
ATTEMPT B
SELECT
customer.q_code, customer.q_name, -- customer info
sale.q_saledatetime, sale.q_paidamount -- sale info
FROM q_customer customer
LEFT JOIN (SELECT * FROM q_saleheader WHERE q_saledatetime =
(SELECT MAX(q_saledatetime) FROM q_saleheader b1 where q_paidamount > 0 ))
sale ON sale.q_customercode = customer.q_code
which results in
This doesn't seem to get any information from the sales table at all.
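Attempt B most likely misses because its inner SELECT MAX(q_saledatetime) is not correlated with the customer: it finds the single latest paid sale in the whole table, so at most the customer who owns that one sale gets a match. A sketch of the correlated version follows, run in SQLite through Python's sqlite3 module with hypothetical sample data (customer codes, names, and timestamps are made up).

```python
import sqlite3

# Hypothetical data: Fred Blogg's last paid sale is his second row (8.90);
# his two later rows are unpaid, and Jane Doe has no sales at all.
CUSTOMERS = [("C1", "Fred Blogg"), ("C2", "Jane Doe")]
SALES = [
    ("C1", "2021-01-01 09:00:00", 5.00),
    ("C1", "2021-02-01 09:00:00", 8.90),
    ("C1", "2021-03-01 09:00:00", 0.00),
    ("C1", "2021-04-01 09:00:00", 0.00),
]

QUERY = """
SELECT customer.q_code, customer.q_name,
       sale.q_saledatetime, sale.q_paidamount
  FROM q_customer AS customer
  LEFT JOIN q_saleheader AS sale
    ON sale.q_customercode = customer.q_code
   AND sale.q_paidamount > 0
   -- correlated subquery: the MAX is taken over this customer's paid sales only
   AND sale.q_saledatetime = (SELECT MAX(b1.q_saledatetime)
                                FROM q_saleheader AS b1
                               WHERE b1.q_customercode = customer.q_code
                                 AND b1.q_paidamount > 0)
 ORDER BY customer.q_code
"""


def latest_paid_sale(customers, sales):
    """Return one row per customer with their latest paid sale, or NULLs."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE q_customer (q_code TEXT, q_name TEXT)")
    con.execute(
        "CREATE TABLE q_saleheader (q_customercode TEXT, q_saledatetime TEXT, q_paidamount REAL)"
    )
    con.executemany("INSERT INTO q_customer VALUES (?, ?)", customers)
    con.executemany("INSERT INTO q_saleheader VALUES (?, ?, ?)", sales)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in latest_paid_sale(CUSTOMERS, SALES):
        print(row)
```

Keeping every condition in the ON clause (rather than in WHERE) is what preserves customers with no paid sale as rows of NULLs.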
Update:
After taking a closer look at my first attempt, I amended the statement and came up with this solution, which achieves the same results as Michal's answer. I'm just curious whether there are any pitfalls or performance disadvantages with this approach:
select DISTINCT ON (q_customer.q_code)
q_customer.q_code, q_customer.q_name, -- customer info
q_saleheader.q_saledatetime, q_saleheader.q_paidamount -- saleheader info
FROM q_customer
LEFT JOIN q_saleheader ON (q_customer.q_code = q_saleheader.q_customercode AND
q_saleheader.q_paidamount > 0 )
group by q_customer.q_code, q_customer.q_name , q_saleheader.q_saledatetime,
q_saleheader.q_paidamount
order by q_customer.q_code ASC, q_saleheader.q_saledatetime DESC
The main change was adding AND q_saleheader.q_paidamount > 0 to the join condition, and q_saleheader.q_saledatetime DESC to the ORDER BY, so that DISTINCT ON keeps the latest paid row for each customer. As mentioned, both Michal's answer and this solution achieve the same results; I'm just curious about pitfalls in either approach.
Try this query:
SELECT c.q_code,
c.q_name,
CASE WHEN q_saledatetime <> '1900-01-01 00:00:00.000' THEN q_saledatetime END q_saledatetime,
q_paidamount
FROM (
SELECT c.q_code,
c.q_name,
coalesce(s.q_saledatetime, '1900-01-01 00:00:00.000') q_saledatetime, --it will indicate customer with no data
s.q_paidamount,
ROW_NUMBER() OVER (PARTITION BY c.q_code ORDER BY COALESCE(s.q_saledatetime, '1900-01-01') DESC) rn
FROM q_customer c
LEFT JOIN (SELECT q_customercode,
q_saledatetime,
q_paidamount
FROM q_saleheader
WHERE q_paidamount > 0) s
ON c.q_code = s.q_customercode
) c WHERE rn = 1
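The same ROW_NUMBER idea can be sketched without the 1900-01-01 placeholder, run here in SQLite (3.25 or later) through Python's sqlite3 module with hypothetical sample data. Because NULL dates only occur for customers with no paid sale, and such a customer has exactly one joined row, the NULL sort order (first for DESC in PostgreSQL, last in SQLite) does not change the result; this assumes q_saledatetime itself is never NULL.

```python
import sqlite3

# Hypothetical data: C1 has paid and unpaid sales, C2 has none, C3 only unpaid ones.
CUSTOMERS = [("C1", "Fred Blogg"), ("C2", "Jane Doe"), ("C3", "Ann Smith")]
SALES = [
    ("C1", "2021-01-01 09:00:00", 5.00),
    ("C1", "2021-02-01 09:00:00", 8.90),
    ("C1", "2021-03-01 09:00:00", 0.00),
    ("C3", "2021-05-01 10:00:00", 0.00),
]

QUERY = """
SELECT q_code, q_name, q_saledatetime, q_paidamount
  FROM (SELECT c.q_code,
               c.q_name,
               s.q_saledatetime,
               s.q_paidamount,
               -- newest paid sale first; a customer without a paid sale gets a
               -- single all-NULL row from the LEFT JOIN, numbered 1
               ROW_NUMBER() OVER (PARTITION BY c.q_code
                                  ORDER BY s.q_saledatetime DESC) AS rn
          FROM q_customer AS c
          LEFT JOIN q_saleheader AS s
            ON s.q_customercode = c.q_code
           AND s.q_paidamount > 0) AS ranked
 WHERE rn = 1
 ORDER BY q_code
"""


def latest_paid_sale_rn(customers, sales):
    """Return one row per customer with their latest paid sale, or NULLs."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE q_customer (q_code TEXT, q_name TEXT)")
    con.execute(
        "CREATE TABLE q_saleheader (q_customercode TEXT, q_saledatetime TEXT, q_paidamount REAL)"
    )
    con.executemany("INSERT INTO q_customer VALUES (?, ?)", customers)
    con.executemany("INSERT INTO q_saleheader VALUES (?, ?, ?)", sales)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in latest_paid_sale_rn(CUSTOMERS, SALES):
        print(row)
```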

Using a stored procedure in Teradata to build a summary history table

I am using Teradata SQL Assistant connected to an enterprise DW. I have written the query below to show an inventory of outstanding items as of a specific point in time. The referenced table loads and stores new records by load date as their state changes (it does not delete historical records). The output of my query is one row for the specified date. Can I create a stored procedure or a recursive query of some sort to build a history of these summary rows (one new row per day)? I have not used such features before; links to relevant previously answered questions, or suggestions on how to research other possible solutions, are totally fine if applicable. I'm just trying to bridge this gap in my knowledge.
SELECT
'2017-10-02' as Dt
,COUNT(DISTINCT A.RECORD_NBR) as Pending_Records
,SUM(A.PAY_AMT) AS Total_Pending_Payments
FROM DB.RECORD_HISTORY A
INNER JOIN
(SELECT MAX(LOAD_DT) AS LOAD_DT
,RECORD_NBR
FROM DB.RECORD_HISTORY
WHERE LOAD_DT <= '2017-10-02'
GROUP BY RECORD_NBR
) B
ON A.RECORD_NBR = B.RECORD_NBR
AND A.LOAD_DT = B.LOAD_DT
WHERE
A.RECORD_ORDER =1 AND Final_DT Is Null
GROUP BY Dt
ORDER BY 1 desc
Here is my interpretation of your query:
For the most recent load_dt (up to 2017-10-02) of each record, restricted to record_order 1,
return
1) the number of distinct pending records
2) the total amount of pending payments
Is this correct? If you're looking for this info, but with one row for each load_dt, you just need to remove the INNER JOIN:
SELECT
load_Dt,
COUNT(DISTINCT record_nbr) AS Pending_Records,
SUM(pay_amt) AS Total_Pending_Payments
FROM DB.record_history
WHERE record_order = 1
AND final_Dt IS NULL
GROUP BY load_Dt
ORDER BY 1 DESC
If you want to get the summary info per record_order, just add record_order as a grouping column:
SELECT
load_Dt,
record_order,
COUNT(DISTINCT record_nbr) AS Pending_Records,
SUM(pay_amt) AS Total_Pending_Payments
FROM DB.record_history
WHERE final_Dt IS NULL
GROUP BY load_Dt, record_order
ORDER BY 1,2 DESC
If you want one row per calendar day (including days with no corresponding load_dt), you can SELECT from the sys_calendar.calendar view and LEFT JOIN the query above on the load_dt field:
SELECT cal.calendar_date, src.Pending_Records, src.Total_Pending_Payments
FROM sys_calendar.calendar cal
LEFT JOIN (
SELECT
load_Dt,
COUNT(DISTINCT record_nbr) AS Pending_Records,
SUM(pay_amt) AS Total_Pending_Payments
FROM DB.record_history
WHERE record_order = 1
AND final_Dt IS NULL
GROUP BY load_Dt
) src ON cal.calendar_date = src.load_Dt
WHERE cal.calendar_date BETWEEN <start_date> AND <end_date>
ORDER BY 1 DESC
I don't have access to a Teradata system, so you may get syntax errors. Let me know whether that works or if you're looking for something else.
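The queries above summarize only the rows loaded on each load_dt. If the goal is the question's snapshot (each record's latest state as of each day), one set-based alternative to a stored procedure is to join every calendar day to the latest load per record up to that day. The sketch below runs in SQLite through Python's sqlite3 module; it assumes a recursive CTE in place of sys_calendar.calendar, lowercase column names, and hypothetical sample rows, and keeps the question's rule of taking MAX(load_dt) per record before filtering record_order = 1 and final_dt IS NULL. It has not been tried on Teradata.

```python
import sqlite3

# Hypothetical history rows: (record_nbr, load_dt, record_order, final_dt, pay_amt).
# A new row is loaded whenever a record changes; old rows are never deleted.
HISTORY = [
    (1, "2017-10-01", 1, None, 100.0),
    (2, "2017-10-01", 1, None, 50.0),
    (1, "2017-10-02", 1, "2017-10-02", 100.0),  # record 1 is finalized on 10-02
    (3, "2017-10-03", 1, None, 25.0),
]

QUERY = """
WITH RECURSIVE days(dt) AS (             -- one row per calendar day
    SELECT :start_dt
    UNION ALL
    SELECT date(dt, '+1 day') FROM days WHERE dt < :end_dt
),
latest AS (                              -- latest load of each record as of each day
    SELECT d.dt, h.record_nbr, MAX(h.load_dt) AS load_dt
      FROM days AS d
      JOIN record_history AS h ON h.load_dt <= d.dt
     GROUP BY d.dt, h.record_nbr
)
SELECT l.dt,
       COUNT(DISTINCT CASE WHEN h.record_order = 1 AND h.final_dt IS NULL
                           THEN h.record_nbr END) AS pending_records,
       COALESCE(SUM(CASE WHEN h.record_order = 1 AND h.final_dt IS NULL
                         THEN h.pay_amt END), 0) AS total_pending_payments
  FROM latest AS l
  JOIN record_history AS h
    ON h.record_nbr = l.record_nbr AND h.load_dt = l.load_dt
 GROUP BY l.dt
 ORDER BY l.dt
"""


def daily_pending(rows, start_dt, end_dt):
    """Return (day, pending_records, total_pending_payments) for each day."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE record_history (record_nbr INTEGER, load_dt TEXT, "
        "record_order INTEGER, final_dt TEXT, pay_amt REAL)"
    )
    con.executemany("INSERT INTO record_history VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY, {"start_dt": start_dt, "end_dt": end_dt}).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in daily_pending(HISTORY, "2017-10-01", "2017-10-03"):
        print(row)
```

Each output row corresponds to running the question's single-date query with that day substituted for 2017-10-02.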

SQL - Different sum levels in one select with where clause

I have 2 tables. One has the original amount, which remains static. The second table has a list of partial amounts applied over time against the original amount in the first table.
DB Tables:
memotable
ID [primary, unique]
Amount (original amount)
transtable
ID [many IDs in transtable to a single ID in memotable]
AmountUsed (amount applied)
ApplyDate (date applied)
I would like to find, in a single SELECT, the ID, the amount used since last week (ApplyDate > 2011-04-21), and the amount used to date.
Only IDs with an amount used since last week (ApplyDate > 2011-04-21) should appear in the result.
I'm stuck on getting the sum for the amount used to date, since it needs to include AmountUsed values where ApplyDate is on or before 2011-04-21.
It is possible to avoid subselects in this case:
SELECT
ID,
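AmountUsedSinceLastWeek = SUM(CASE WHEN ApplyDate > '4/21/2011' THEN AmountUsed END),
AmountUsedToDate = SUM(AmountUsed)
FROM TransTable
GROUP BY ID
-- keep only IDs that have at least one amount applied since last week
HAVING MAX(ApplyDate) > '4/21/2011'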
Since you want to limit the result to IDs with usage since last week but also want the total to date, another option is to compute each sum in its own sub-select and join the two on ID:
SELECT
lastWeek.ID,
lastWeek.AmountUsedSinceLastWeek,
toDate.AmountUsedToDate
FROM
(
SELECT
ID,
SUM(AmountUsed) AS AmountUsedSinceLastWeek
FROM TransTable
WHERE ApplyDate > '4/21/2011'
GROUP BY ID
) lastWeek JOIN
(
SELECT
ID,
SUM(AmountUsed) AS AmountUsedToDate
FROM TransTable
GROUP BY ID
) toDate ON lastWeek.ID = toDate.ID
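To compare the two answers, here is a sketch of the single-pass conditional-aggregation version, with a HAVING clause that drops IDs with no usage after the cutoff. It runs in SQLite through Python's sqlite3 module; the rows are hypothetical, dates are ISO text so that string comparison orders them correctly, and the :cutoff parameter is an illustrative assumption.

```python
import sqlite3

# Hypothetical rows: (ID, AmountUsed, ApplyDate).
TRANS = [
    (1, 10.0, "2011-04-01"),
    (1, 5.0, "2011-04-25"),
    (2, 7.0, "2011-04-10"),  # nothing applied after the cutoff: excluded
    (3, 2.5, "2011-04-22"),
]

QUERY = """
SELECT ID,
       -- only amounts applied after the cutoff
       SUM(CASE WHEN ApplyDate > :cutoff THEN AmountUsed END) AS AmountUsedSinceLastWeek,
       -- every amount, regardless of date
       SUM(AmountUsed) AS AmountUsedToDate
  FROM TransTable
 GROUP BY ID
HAVING MAX(ApplyDate) > :cutoff
 ORDER BY ID
"""


def usage_summary(rows, cutoff):
    """Return (ID, AmountUsedSinceLastWeek, AmountUsedToDate) for active IDs."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TransTable (ID INTEGER, AmountUsed REAL, ApplyDate TEXT)")
    con.executemany("INSERT INTO TransTable VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY, {"cutoff": cutoff}).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in usage_summary(TRANS, "2011-04-21"):
        print(row)
```

This reads TransTable once, whereas the sub-select version scans it twice and joins the results; which is faster in practice depends on indexes and the optimizer.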