Calculating Datediff of two days based on when the sum of a column hits a number cap - sql

I checked whether this had been asked elsewhere, but it doesn't seem so. I'm trying to write a SQL query that gives me the date difference in days between '2022-10-01' and the date when our impression sum hits our cap of 5.
For context, we may see duplicate dates because someone revisits our website that day, so we get a different session number to pair with that count. Here's an example table for one individual showing how many impressions were logged.
My goal is to get the number of days it takes to hit an impression cap of 5. So for this individual, they would hit the cap on '2022-10-07', and the number of days between '2022-10-01' and '2022-10-07' is 6. I am also calculating the difference before/after '2023-01-01' since I need this count for Q4 of '22 and Q1 of '23, but I have not included that in the example table. I have other individuals to include, but for the purpose of asking here I kept it to one.
Current Query:
select
click_date,
case
when date(click_date) < date('2023-01-01') and sum(impression_cnt = 5) then datediff('day', '2022-10-01', click_date)
when date(click_date) >= date('2023-01-01') and sum(impression_cnt = 5) then datediff('day', '2023-01-01', click_date)
else 0
end days_to_capped
from table
group by customer, click_date, impression_cnt
customer click_date impression_cnt
123456 2022-10-05 2
123456 2022-10-05 1
123456 2022-10-06 1
123456 2022-10-07 1
123456 2022-10-11 1
123456 2022-10-11 3
Result Table
customer days_to_cap
123456 6
I'm currently only getting 0 days and then 81 days once it hits 2022-12-21 (the last date) for this individual, so I know I need to fix my query. Any help would be appreciated!
Edited: This is in Snowflake!

So, the issue with your query is that the sum is being calculated at the level that you are grouping by, which is every field, so it will always just be the value of the impressions field every time.
What you need to do is a running sum, which is a SUM() OVER (PARTITION BY...) statement. And then qualify the results of that:
First, just to get the data that you have:
with x as (
select *
from values
(123456,'2022-10-05'::date,2),
(123456,'2022-10-05'::date,1),
(123456,'2022-10-06'::date,1),
(123456,'2022-10-07'::date,1),
(123456,'2022-10-11'::date,1),
(123456,'2022-10-11'::date,3) x (customer,click_date,impression_cnt)
)
Then, I query the CTE to do the running sum with a QUALIFY statement to choose the record that actually has the value I'm looking for
select
customer,
case
when click_date < '2023-01-01'::date and sum(impression_cnt) OVER (partition by customer order by click_date) = 5 then datediff('day', '2022-10-01', click_date)
when click_date >= '2023-01-01'::date and sum(impression_cnt) OVER (partition by customer order by click_date) = 5 then datediff('day', '2023-01-01', click_date)
else 0
end days_to_capped
from x
qualify days_to_capped > 0;
The qualify filters your results to just the record that you cared about.
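The equality test above works for the sample data, but it would miss a customer whose running total jumps past the cap (say from 4 to 7), and the days_to_capped > 0 filter would drop someone who hits the cap on the start date itself. A minimal sketch of a more tolerant variant, run here in SQLite through Python purely for illustration (SQLite has no QUALIFY or DATEDIFF, so a subquery and julianday() stand in for them; the table name impressions is an assumption):

```python
import sqlite3

CAP = 5
START = "2022-10-01"

rows = [
    (123456, "2022-10-05", 2),
    (123456, "2022-10-05", 1),
    (123456, "2022-10-06", 1),
    (123456, "2022-10-07", 1),
    (123456, "2022-10-11", 1),
    (123456, "2022-10-11", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table impressions (customer int, click_date text, impression_cnt int)"
)
conn.executemany("insert into impressions values (?, ?, ?)", rows)

query = """
with running as (
    -- running total per customer; rows sharing a date get the same total
    select customer,
           click_date,
           sum(impression_cnt) over (
               partition by customer
               order by click_date
           ) as running_cnt
    from impressions
)
-- earliest date on which the running total reaches or passes the cap
select customer,
       cast(julianday(min(click_date)) - julianday(:start) as integer) as days_to_cap
from running
where running_cnt >= :cap
group by customer
"""
result = conn.execute(query, {"cap": CAP, "start": START}).fetchall()
print(result)
```

In Snowflake the same idea would presumably be written with datediff and a qualify on the first row at or above the cap; that translation is not tested here.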

Related

Group By with Case statement?

I need to find the sum of orders over a 3-day range. So imagine a table like this:
Order Date
300 1/5/2015
200 1/6/2015
150 1/7/2015
250 1/5/2015
400 1/4/2015
350 1/3/2015
50 1/2/2015
100 1/8/2015
So I want to create a Group by Clause that Groups anything with a date that has the same Month, Year and a Day from 1-3 or 4-6, 7-9 and so on until I reach 30 days.
It seems like what I would want to do is create a case for the grouping that includes a loop of some type but I am not sure if this is the best way or if it is even possible to combine them.
An alternative might be to create a case statement that adds a new column assigning a group number, and then to group by that number, month, and year.
Unfortunately I've never used a case statement so I am not sure which method is best or how to execute them especially with a loop.
EDIT: I am using Access so it looks like I will be using IIF instead of Case
Consider the Partition Function and a crosstab, so, for example:
TRANSFORM Sum(Calendar.Order) AS SumOfOrder
SELECT Month([CalDate]) AS TheMonth, Partition(Day([Caldate]),1,31,3) AS DayGroup
FROM Calendar
GROUP BY Month([CalDate]), Partition(Day([Caldate]),1,31,3)
PIVOT Year([CalDate]);
As an aside, I hope you have not named a field / column as Date.
How about the following:
COUNT OF ORDERS
select year([Date]) as yr,
month([Date]) as monthofyr,
sum(iif((day([Date])>=1) and (day([Date])<=3),1,0)) as days1to3,
sum(iif((day([Date])>=4) and (day([Date])<=6),1,0)) as days4to6,
sum(iif((day([Date])>=7) and (day([Date])<=9),1,0)) as days7to9,
sum(iif((day([Date])>=10) and (day([Date])<=12),1,0)) as days10to12,
sum(iif((day([Date])>=13) and (day([Date])<=15),1,0)) as days13to15,
sum(iif((day([Date])>=16) and (day([Date])<=18),1,0)) as days16to18,
sum(iif((day([Date])>=19) and (day([Date])<=21),1,0)) as days19to21,
sum(iif((day([Date])>=22) and (day([Date])<=24),1,0)) as days22to24,
sum(iif((day([Date])>=25) and (day([Date])<=27),1,0)) as days25to27,
sum(iif((day([Date])>=28) and (day([Date])<=31),1,0)) as days28to31
from tbl
where [Date] between x and y
group by year([Date]),
month([Date])
Replace x and y with your date range.
The last group is days 28 to 31 of the month, so it may contain 4 days' worth of orders, for months that have 31 days.
THE ABOVE IS A COUNT OF ORDERS.
If you want the SUM of the order amounts:
SUM OF ORDER AMOUNTS
select year([Date]) as yr,
month([Date]) as monthofyr,
sum(iif((day([Date])>=1) and (day([Date])<=3),[Order],0)) as days1to3,
sum(iif((day([Date])>=4) and (day([Date])<=6),[Order],0)) as days4to6,
sum(iif((day([Date])>=7) and (day([Date])<=9),[Order],0)) as days7to9,
sum(iif((day([Date])>=10) and (day([Date])<=12),[Order],0)) as days10to12,
sum(iif((day([Date])>=13) and (day([Date])<=15),[Order],0)) as days13to15,
sum(iif((day([Date])>=16) and (day([Date])<=18),[Order],0)) as days16to18,
sum(iif((day([Date])>=19) and (day([Date])<=21),[Order],0)) as days19to21,
sum(iif((day([Date])>=22) and (day([Date])<=24),[Order],0)) as days22to24,
sum(iif((day([Date])>=25) and (day([Date])<=27),[Order],0)) as days25to27,
sum(iif((day([Date])>=28) and (day([Date])<=31),[Order],0)) as days28to31
from tbl
where [Date] between x and y
group by year([Date]),
month([Date])
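The ten hand-written buckets can also be derived arithmetically: integer-dividing (day - 1) by 3 gives group 0 for days 1-3, group 1 for days 4-6, and so on. A minimal sketch of that idea, run in SQLite through Python rather than Access (the table name orders, its column names, and the ISO date strings are assumptions made for the illustration):

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO strings
orders = [
    (300, "2015-01-05"), (200, "2015-01-06"), (150, "2015-01-07"),
    (250, "2015-01-05"), (400, "2015-01-04"), (350, "2015-01-03"),
    (50, "2015-01-02"), (100, "2015-01-08"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (amount int, order_date text)")
conn.executemany("insert into orders values (?, ?)", orders)

query = """
select strftime('%Y', order_date) as yr,
       strftime('%m', order_date) as mon,
       -- integer division: days 1-3 -> 0, 4-6 -> 1, 7-9 -> 2, ...
       (cast(strftime('%d', order_date) as integer) - 1) / 3 as day_group,
       sum(amount) as total
from orders
group by yr, mon, day_group
order by yr, mon, day_group
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

Note that with this formula day 31 lands in a group of its own (group 10), unlike the days28to31 column above, which folds it into the last bucket.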

oracle sql: efficient way to calculate business days in a month

I have a pretty huge table with columns dates, account, amount, etc. eg.
date account amount
4/1/2014 XXXXX1 80
4/1/2014 XXXXX1 20
4/2/2014 XXXXX1 840
4/3/2014 XXXXX1 120
4/1/2014 XXXXX2 130
4/3/2014 XXXXX2 300
...........
(I have 40 months' worth of daily data and multiple accounts.)
The final output I want is the average amount of each account each month. Since there may or may not be a record for any account on a single day, and I have a separate table of holidays from 2011~2014, I am summing up the amount of each account within a month and dividing it by the number of business days of that month. Notice that there are very likely to be records on weekends/holidays, so I need to exclude them from the calculation. Also, I want to have a record for each of the dates available in the original table, e.g.
date account amount
4/1/2014 XXXXX1 48 ((80+20+840+120)/22)
4/2/2014 XXXXX1 48
4/3/2014 XXXXX1 48
4/1/2014 XXXXX2 19 ((130+300)/22)
4/3/2014 XXXXX2 19
...........
(Suppose the above is the only data I have for Apr-2014.)
I am able to do this in a hacky and slow way, but as I need to join this process with other subqueries, I really need to optimize this query. My current code looks like:
select
date,
account,
sum(amount/days_mon) over (partition by last_day(date))
from(
select
date,
-- there are more calculation to get the account numbers,
-- so this subquery is necessary
account,
amount,
-- this is a list of month-end dates that the number of
-- business days in that month is 19. similar below.
case when last_day(date) in ('','',...,'') then 19
when last_day(date) in ('','',...,'') then 20
when last_day(date) in ('','',...,'') then 21
when last_day(date) in ('','',...,'') then 22
when last_day(date) in ('','',...,'') then 23
end as days_mon
from mytable tb
inner join lookup_businessday_list busi
on tb.date = busi.date)
So how can I perform the above purpose efficiently? Thank you!
This approach uses sub-query factoring, which other RDBMS flavours call common table expressions. The attraction here is that we can pass the output from one CTE as input to another.
The first CTE generates a list of dates in a given month (you can extend this over any range you like).
The second CTE uses an anti-join on the first to filter out dates which are holidays and also dates which aren't weekdays. Note that the day number depends on the NLS_TERRITORY setting; in my realm the weekend is days 6 and 7, but SQL Fiddle is American, so there it is 1 and 7. The query below uses the American numbering, hence the filter on days 2 to 6.
with dates as ( select date '2014-04-01' + ( level - 1) as d
from dual
connect by level <= 30 )
, bdays as ( select d
, count(d) over () tot_d
from dates
left join holidays
on dates.d = holidays.hol_date
where holidays.hol_date is null
and to_number(to_char(dates.d, 'D')) between 2 and 6
)
select yt.account
, yt.txn_date
, sum(yt.amount) over (partition by yt.account, trunc(yt.txn_date,'MM'))
/tot_d as avg_amt
from your_table yt
join bdays
on bdays.d = yt.txn_date
order by yt.account
, yt.txn_date
/
I haven't rounded the average amount.
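To sanity-check the business-day count and the resulting averages against the figures in the question, here is a small sketch of the same pipeline in SQLite through Python. It is not the Oracle query above: a recursive CTE replaces CONNECT BY, strftime('%w') (0 = Sunday) replaces the territory-dependent 'D' format, and the table names txns and holidays are assumptions, with holidays left empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table txns (txn_date text, account text, amount real);
insert into txns values
  ('2014-04-01', 'XXXXX1', 80),  ('2014-04-01', 'XXXXX1', 20),
  ('2014-04-02', 'XXXXX1', 840), ('2014-04-03', 'XXXXX1', 120),
  ('2014-04-01', 'XXXXX2', 130), ('2014-04-03', 'XXXXX2', 300);
-- assumption: no holidays in April 2014, so the table stays empty
create table holidays (hol_date text);
""")

query = """
with recursive dates(d) as (
    -- every calendar day of April 2014
    select '2014-04-01'
    union all
    select date(d, '+1 day') from dates where d < '2014-04-30'
),
bdays as (
    -- keep Monday..Friday (1..5) that are not listed as holidays
    select d from dates
    where cast(strftime('%w', d) as integer) between 1 and 5
      and d not in (select hol_date from holidays)
),
monthly as (
    -- only amounts booked on business days count towards the total
    select account, sum(amount) as total
    from txns
    where txn_date in (select d from bdays)
    group by account
)
select account,
       total,
       (select count(*) from bdays) as n_bdays,
       total / (select count(*) from bdays) as avg_amt
from monthly
order by account
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data this gives 22 business days and averages of roughly 48.18 and 19.55, in line with the 48 and 19 shown in the question.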
You have 40 months of data, and this data should be very stable.
I will assume that you have a cold body (a big, stable, easily definable range of data) and a hot tail (a small and active part).
Next, I would like to define a minimal period. It is the smallest interval of data that is interesting for the business.
It might be a year, month, day, hour, etc. Do you expect to get questions like "what was the average for that account between 1900 and 12am yesterday?"
I will assume that the answer is DAY.
Then,
I will calculate sum(amount) and count() for every account for every DAY of cold body.
I will not create a dummy records, if particular account had no activity on some day.
and I will save day, account, total amount, count in a TABLE.
if there are modifications later to the cold body, you delete and reload affected day from that table.
For the hot tail there might be multiple strategies:
1. Do the same as above (same process, easy to support).
2. Always calculate on the fly.
3. Use a materialized view as a middle ground between 1 and 2.
Cold body table totalc could also be implemented as materialized view, but if data never change - no need to rebuild it.
With this you go from (number of accounts) x (number of transactions per day) x (number of days) records to (number of accounts) x (number of active days) records.
That should speed up all following calculations.

SQL sum 2 different columns by different conditions, then subtract and add

what I am trying is kind of complex, I will try my best to explain.
I achieved the first part which is to sum the column by hours.
example
ID TIMESTAMP CUSTAFFECTED
1 10-01-2013 01:00:23 23
2 10-01-2013 03:00:23 55
3 10-01-2013 05:00:23 2369
4 10-01-2013 04:00:23 12
5 10-01-2013 01:00:23 1
6 10-01-2013 12:00:23 99
7 10-01-2013 01:00:23 22
8 10-01-2013 02:00:23 3
output would be
Hour TotalCALLS CUSTAFFECTED
10/1/2013 01:00 3 46
10/1/2013 02:00 1 3
10/1/2013 03:00 1 55
10/1/2013 04:00 1 12
10/1/2013 05:00 1 2369
10/1/2013 12:00 1 99
Query
SELECT TRUNC(STARTDATETIME, 'HH24') AS hour,
COUNT(*) AS TotalCalls,
sum(CUSTAFFECTED) AS CUSTAFFECTED
FROM some_table
where STARTDATETIME >= To_Date('09-12-2013 00:00:00','MM-DD-YYYY HH24:MI:SS') and
STARTDATETIME <= To_Date('09-13-2013 00:00:00','MM-DD-YYYY HH24:MI:SS')
GROUP BY TRUNC(STARTDATETIME, 'HH24')
what I need
I need to combine the sums from 2 queries and group by timestamp/hour. The 2nd query is exactly the same as the first, except that it filters and groups on a different column.
2nd query
SELECT TRUNC(RESTOREDDATETIME , 'HH24') AS hour,
COUNT(*) AS TotalCalls,
SUM(CUSTAFFECTED) AS CUSTRESTORED
FROM some_table
where RESTOREDDATETIME >= To_Date('09-12-2013 00:00:00','MM-DD-YYYY HH24:MI:SS') and
RESTOREDDATETIME <= To_Date('09-13-2013 00:00:00','MM-DD-YYYY HH24:MI:SS')
GROUP BY TRUNC(RESTOREDDATETIME , 'HH24')
So I need to subtract custaffected - custrestored and display that total.
I added link to excel file. http://goo.gl/ioo9hg
Thanks
Ok, now that correct sql is in question text, try this:
SELECT TRUNC(STARTDATETIME, 'HH24') AS hour,
COUNT(*) AS TotalCalls,
Sum(case when RESTOREDDATETIME is null Then 0 else 1 end) RestoredCount,
Sum(CUSTAFFECTED) as CUSTAFFECTED,
Sum(case when RESTOREDDATETIME is null Then 0 else CUSTAFFECTED end) CustRestored,
SUM(CUSTAFFECTED) -
Sum(case when RESTOREDDATETIME is null Then 0 else CUSTAFFECTED end) AS CUSTNotRestored
FROM some_table
where STARTDATETIME >= To_Date('09-12-2013 00:00:00','MM-DD-YYYY HH24:MI:SS')
and STARTDATETIME <= To_Date('09-13-2013 00:00:00','MM-DD-YYYY HH24:MI:SS')
GROUP BY TRUNC(STARTDATETIME, 'HH24')
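The query above attributes restored customers to the hour in which the outage started. If each figure should instead be counted in its own hour (affected by start hour, restored by restore hour), one option is to stack the two event streams with UNION ALL and aggregate once. A minimal sketch in SQLite through Python; the table name outages and all three sample rows are hypothetical, since the question's sample has no RESTOREDDATETIME values:

```python
import sqlite3

# Hypothetical rows: (startdatetime, restoreddatetime, custaffected)
outages = [
    ("2013-09-12 01:00:23", "2013-09-12 02:10:00", 23),
    ("2013-09-12 01:30:00", "2013-09-12 03:05:00", 1),
    ("2013-09-12 02:15:00", None, 3),  # not yet restored
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table outages (startdatetime text, restoreddatetime text, custaffected int)"
)
conn.executemany("insert into outages values (?, ?, ?)", outages)

query = """
select hour,
       sum(affected) as custaffected,
       sum(restored) as custrestored,
       sum(affected) - sum(restored) as net
from (
    -- one row per outage start, bucketed by the start hour
    select strftime('%Y-%m-%d %H:00', startdatetime) as hour,
           custaffected as affected,
           0 as restored
    from outages
    union all
    -- one row per restoration, bucketed by the restore hour
    select strftime('%Y-%m-%d %H:00', restoreddatetime),
           0,
           custaffected
    from outages
    where restoreddatetime is not null
)
group by hour
order by hour
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In Oracle the strftime calls would presumably become TRUNC(..., 'HH24') as in the queries above.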
I recently needed to do this and had to play with it some to get it to work.
The challenge is to get the results of one query to link over to another query all inside the same query and then manipulate the returned value of a field so that the value in a given field in one query's resultset, call it FieldA, is subtracted from the value in a field in a different resultset, call it FieldB. It doesn't matter if the subject values are the result of an aggregation function like COUNT(...); they could be any numeric field in a resultset needing grouping or not. Looking at values from aggregation functions just means you need to adjust your query logic to use GROUP BY for the proper fields. The approach requires creating in-line views in the query and using those as the source of data for doing the subtraction.
A red herring when dealing with this kind of thing is the MINUS operator (assuming you are using an Oracle database). It will not work here, because MINUS does not subtract the values of one resultset's field from another's; it removes from the first query's result set any rows that also appear in the second's. In addition, MINUS is Oracle's name for what the SQL standard calls EXCEPT, so your database may not support that keyword if it isn't Oracle you are using. Still, it's awfully nice to have around when you need it.
OK, enough prelude. Here's the query form you will want to use, taking for example a date range we want grouped by YYYY-MM:
select inlineview1.year_mon, (inlineview1.CNT - inlineview2.CNT) as finalcnt from
(SELECT TO_CHAR(*date_field*, 'YYYY-MM') AS year_mon, count(*any_field_name*) as CNT
FROM *schemaname.tablename*
WHERE *date_field* > TO_DATE('*{a year}-{a month}-{a day}*', 'YYYY-MM-DD') and
*date_field* < TO_DATE('*{a year}-{a month}-{a day}*', 'YYYY-MM-DD') and
*another_field* = *{value_of_some_kind}* -- ... etc. ...
GROUP BY TO_CHAR(*date_field*, 'YYYY-MM')) inlineview1,
(SELECT TO_CHAR(*date_field*, 'YYYY-MM') AS year_mon, count(*any_field_name*) as CNT
FROM *schemaname.tablename*
WHERE *date_field* > TO_DATE('*{a year}-{a month}-{a day}*', 'YYYY-MM-DD') and
*date_field* < TO_DATE('*{a year}-{a month}-{a day}*', 'YYYY-MM-DD') and
*another_field* = *{value_of_some_kind}* -- ... etc. ...
GROUP BY TO_CHAR(*date_field*, 'YYYY-MM')) inlineview2
WHERE
inlineview1.year_mon = inlineview2.year_mon
order by *either or any of the final resultset's fields* -- optional
A bit less abstractly, here is an example wherein a bookseller wants to see the net number of books that were sold in any given month in 2013. To do this, the seller must subtract the number of books returned for refund from the number sold. He does not care when the book was sold, as he feels a returned book represents a loss of a sale and income statistically no matter when it occurs vs. when the book was sold. Example:
select bookssold.year_mon, (bookssold.CNT - booksreturned.CNT) as netsalescount from
(SELECT TO_CHAR(SALE_DATE, 'YYYY-MM') AS year_mon, count(TITLE) as CNT
FROM RETAILOPS.ACTIVITY
WHERE SALE_DATE > TO_DATE('2012-12-31', 'YYYY-MM-DD') and
SALE_DATE < TO_DATE('2014-01-01', 'YYYY-MM-DD') and
OPERATION = 'sale'
GROUP BY TO_CHAR(SALE_DATE, 'YYYY-MM')) bookssold,
(SELECT TO_CHAR(SALE_DATE, 'YYYY-MM') AS year_mon, count(TITLE) as CNT
FROM RETAILOPS.ACTIVITY
WHERE SALE_DATE > TO_DATE('2012-12-31', 'YYYY-MM-DD') and
SALE_DATE < TO_DATE('2014-01-01', 'YYYY-MM-DD') and
OPERATION = 'return'
GROUP BY TO_CHAR(SALE_DATE, 'YYYY-MM')) booksreturned
WHERE
bookssold.year_mon = booksreturned.year_mon
order by bookssold.year_mon desc
Note that to be sure the query returns as expected, the two in-line views must be equijoined based as shown above on some criteria, as in:
bookssold.year_mon = booksreturned.year_mon
or the subtraction of the counted records can't be done on a 1:1 basis, as the query parser will not know which of the records returned with a grouped count value is to be subtracted from which. Failing to specify an equijoin condition will yield a Cartesian join result, probably not what you want (though you may indeed want that). For example, adding 'booksreturned.year_mon' right after 'bookssold.year_mon' to the returned fields list in the top-level select statement in the above example and eliminating the
bookssold.year_mon = booksreturned.year_mon
criteria in its WHERE clause will produce a working query that does the subtraction calculation on the CNT values for the YYYY-MM values in the first two columns of the resultset and shows them in the third column. Handy to know this if you need it, as it has solid application in business trends analysis if you can compare sales and returns not just within a given atomic timeframe but as compared across such timeframes in a 1:N fashion.
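One caveat with the equijoin of two in-line views is that a month appearing in only one of them (for example, sales but no returns) drops out of the result. A possible alternative is a single pass with conditional aggregation, sketched here in SQLite through Python. The table name activity and its rows are hypothetical, and the column names are borrowed from the bookseller example:

```python
import sqlite3

# Hypothetical activity rows: (sale_date, operation, title)
activity = [
    ("2013-01-10", "sale", "A"),
    ("2013-01-12", "sale", "B"),
    ("2013-01-20", "return", "A"),
    ("2013-02-03", "sale", "C"),  # February has no returns at all
]

conn = sqlite3.connect(":memory:")
conn.execute("create table activity (sale_date text, operation text, title text)")
conn.executemany("insert into activity values (?, ?, ?)", activity)

query = """
select strftime('%Y-%m', sale_date) as year_mon,
       -- count sales and returns in the same scan, then subtract
       sum(case when operation = 'sale' then 1 else 0 end)
     - sum(case when operation = 'return' then 1 else 0 end) as netsalescount
from activity
where sale_date >= '2013-01-01' and sale_date < '2014-01-01'
group by year_mon
order by year_mon
"""
net = conn.execute(query).fetchall()
for row in net:
    print(row)
```

February still shows up here with a net count of 1, whereas the joined in-line views would omit it because booksreturned has no row for that month.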

Query assistance please

Given the following table (much simplified for the purposes of this question):
id perPeriod actuals createdDate
---------------------------------------------------------
1 14 22 2011-10-04 00:00:00.000
2 14 9 2011-10-04 00:00:00.000
3 14 3 2011-10-03 00:00:00.000
4 14 5 2011-10-03 00:00:00.000
I need a query that gives me the average daily "actuals" figure. Note, however, that there are TWO RECORDS PER DAY (often more), so I can't just do AVG(actuals).
Also, if the daily "actuals" average exceeds the daily "perPeriod" average, I want to take the perPeriod value instead of the "average" value. Thus, in the case of the first two records: The actuals average for 4th October is (22+9) / 2 = 15.5. And the perPeriod average for the same day is (14 + 14) / 2 = 14. Now, 15.5 is greater than 14, so the daily "actuals" average for that day should be the "perPeriod" average.
Hope that makes sense. Any pointers greatly appreciated.
EDIT
I need an overall daily average, not an average per date. As I said, I would love to just do AVG(actuals) on the entire table, but the complicating factor is that a particular day can occupy more than one row, which would skew the results.
Is this what you want?
First, if the perPeriod average needed to be computed across a different grouping (it doesn't in this case), you would need a subquery like this:
Select t.CreatedDate,
Case When Avg(actuals) < p.PayPeriodAvg
Then Avg(actuals) Else p.PayPeriodAvg End Average
From table1 t Join
(Select CreatedDate, Avg(perPeriod) PayPeriodAvg
From table1
Group By CreatedDate) as p
On p.CreatedDate = t.CreatedDate
Group By t.CreatedDate, p.PayPeriodAvg
or, in this case, since the perPeriod average is grouped on the same column (CreatedDate) as the actuals average, you don't need a subquery, which is even simpler:
Select t.CreatedDate,
Case When Avg(actuals) < Avg(perPeriod)
Then Avg(actuals) Else Avg(perPeriod) End Average
From table1 t
Group By t.CreatedDate
with your sample data, both of these return
CreatedDate Average
----------------------- -----------
2011-10-03 00:00:00.000 4
2011-10-04 00:00:00.000 14
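-- Note: a two-argument MIN is not valid in most engines; substitute LEAST(...) (e.g. MySQL) or a CASE expression.
SELECT DAY(createdDate), MONTH(createdDate), YEAR(createdDate), MIN(AVG(actuals), MAX(perPeriod))
FROM MyTable
GROUP BY DAY(createdDate), MONTH(createdDate), YEAR(createdDate)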
Try this out:
select createdDate,
case
when AVG(actuals) > max(perPeriod) then max(perPeriod)
else AVG(actuals)
end
from SomeTestTable
group by createdDate
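The queries above return one capped average per date, while the question's EDIT asks for a single overall daily average. One way to get there is to wrap the per-date query in an outer AVG. A minimal sketch in SQLite through Python (the table name readings is an assumption; note that SQLite's AVG returns a float, so 4 October averages 15.5 before capping, whereas an engine that averages integers as integers would give 15):

```python
import sqlite3

# Sample rows from the question: (id, perPeriod, actuals, createdDate)
readings = [
    (1, 14, 22, "2011-10-04"),
    (2, 14, 9, "2011-10-04"),
    (3, 14, 3, "2011-10-03"),
    (4, 14, 5, "2011-10-03"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table readings (id int, perPeriod int, actuals int, createdDate text)"
)
conn.executemany("insert into readings values (?, ?, ?, ?)", readings)

query = """
select avg(daily_avg)
from (
    -- one row per day: the actuals average, capped at the perPeriod average
    select createdDate,
           case when avg(actuals) > avg(perPeriod) then avg(perPeriod)
                else avg(actuals)
           end as daily_avg
    from readings
    group by createdDate
)
"""
overall = conn.execute(query).fetchone()[0]
print(overall)
```

With the sample data the two daily values are 4 and 14, so the overall figure is 9.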

sql to calculate daily totals minus the previous day's totals

I have a table that has a date, item, and quantity.
I need a sql query to return the totals per day, but the total is the quantity minus the previous day totals. The quantity accumulates as the month goes on. So the 1st could have 5 the 2nd have 12 and the 3rd has 20.
So the 1st adds 5
2nd adds 7 to make 12
3rd adds 8 to make 20.
I've done something like this in the past, but cannot find it or remember how. I know I'll need a correlated sub-query.
TIA
--
Edit 1
I'm using Microsoft Access.
Date is a datetime field,
item is a text, and
quantity is number
--
Edit 2
Ok this is what i have
SELECT oos.report_date, oos.tech, oos.total_cpe, oos_2.total_cpe
FROM oos INNER JOIN (
SELECT oos_2.tech, Sum(oos_2.total_cpe) AS total_cpe
FROM oos_2
WHERE (((oos_2.report_date)<#10/10/2010#))
GROUP BY oos_2.tech
) oos_2 ON oos.tech = oos_2.tech;
How do I get oos.report_date into the place where it says #10/10/2010#? I thought I could just stick it in there like in MySQL, but no luck. I'm going to continue researching.
Sum them by adding one to the date and making the value negative, thus taking yesterday's total from today's:
SELECT report_date, tech, Sum(total_cpe) AS total_cpe
FROM (
SELECT oos.report_date, oos.tech, oos.total_cpe
FROM oos
UNION ALL
SELECT oos.report_date+1, oos.tech, 0-oos.total_cpe
FROM oos
)
WHERE (report_date < #10/10/2010#)
GROUP BY report_date, tech
ORDER BY report_date, tech
Ok, I figured it out.
SELECT o.report_date, o.tech, o.total_cpe,
o.total_cpe - (
SELECT IIf(Sum(oos.total_cpe) is null, 0,Sum(oos.total_cpe)) AS total_cpe
FROM oos
WHERE (((oos.tech)=o.tech) AND ((oos.report_date)<o.report_date))
) AS total
FROM oos o;
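If total_cpe really is a running total, as the 5, 12, 20 example in the question describes, then subtracting only the most recent earlier value (rather than the sum of all earlier rows) gives the amount added each day. A minimal sketch of that correlated subquery in SQLite through Python; the sample rows and the tech value 'A' are hypothetical, and the ORDER BY ... LIMIT 1 form would need adapting for Access (for example to TOP 1):

```python
import sqlite3

# Hypothetical cumulative readings matching the 5, 12, 20 example
oos = [
    ("2010-10-01", "A", 5),
    ("2010-10-02", "A", 12),
    ("2010-10-03", "A", 20),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table oos (report_date text, tech text, total_cpe int)")
conn.executemany("insert into oos values (?, ?, ?)", oos)

query = """
select o.report_date,
       o.tech,
       -- today's running total minus the latest earlier running total (0 if none)
       o.total_cpe - coalesce((
           select p.total_cpe
           from oos p
           where p.tech = o.tech
             and p.report_date < o.report_date
           order by p.report_date desc
           limit 1
       ), 0) as added
from oos o
order by o.tech, o.report_date
"""
deltas = conn.execute(query).fetchall()
for row in deltas:
    print(row)
```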