counting events over flexible ranges - sql

I am trying to count events (which are rows in the event_table) in the year before and the year after a particular target date for each person. For example, say I have a person 100 and target date is 10/01/2012. I would like to count events in 9/30/2011-9/30/2012 and in 10/02/2012-9/30/2013.
My query looks like:
select *
from (
select id, target_date
from subsample_table
) as i
left join (
select id, event_date, count(*) as N
, case when event_date between target_date-365 and target_date-1 then 0
when event_date between target_date+1 and target_date+365 then 1
else 2 end as after
from event_table
group by id, target_date, period
) as h
on i.id = h.id
and i.target_date = h.event_date
The output should look something like:
id target_date after N
100 10/01/2012 0 1000
100 10/01/2012 1 0
It's possible that some people do not have any events in the before or after periods (or both), and it would be nice to have zeros in that case. I don't care about the events outside the 730 days.
Any suggestions would be greatly appreciated.

I think the following may approach what you are trying to accomplish.
select id
, target_date
, event_date
, count(*) as N
, SUM(case when event_date between target_date-365 and target_date-1
then 1
else 0
end) AS Prior_
, SUM(case when event_date between target_date+1 and target_date+365
then 1
else 0
end) as After_
from subsample_table i
left join
event_table h
on i.id = h.id
and i.target_date = h.event_date
group by id, target_date, period

This is a generic answer. I don't know what date functions teradata has, so I will use sql server syntax.
select id, target_date, sum(before) before, sum(after) after, sum(righton) righton
from yourtable t
join (
select id, target_date td
, case when yourdate >= dateadd(year, -1, target_date)
and yourdate < target_date then 1 else 0 end before
, case when yourdate <= dateadd(year, 1, target_date)
and yourdate > target_date then 1 else 0 end after
, case when yourdate = target_date then 1 else 0 end righton
from yourtable
where whatever
group by id, target_date) sq on t.id = sq.id and target_date = dt
where whatever
group by id, target_date
This answer assumes that an id can have more than one target date.

Related

CASE WHEN condition with MAX() function

There are a lot questions on CASE WHEN topic, but the closest my question is related to this How to use CASE WHEN condition with MAX() function query which has not been resolved.
Here is some of my sample data:
date
debet
2022-07-15
57190.33
2022-07-14
815616516.00
2022-07-15
40866.67
2022-07-14
1221510.00
So, I want to all records for the last two dates and three additional columns: sum(sales) for the previous day, sum for the current day and the difference between them:
SELECT
[debet],
[date] ,
SUM( CASE WHEN [date] = MAX(date) THEN [debet] ELSE 0 END ) AS sum_act,
SUM( CASE WHEN [date] = MAX(date) - 1 THEN [debet] ELSE 0 END ) AS sum_prev ,
(
SUM( CASE WHEN [date] = MAX(date) THEN [debet] ELSE 0 END )
-
SUM( CASE WHEN [date] = MAX(date) - 1 THEN [debet] ELSE 0 END )
) AS diff
FROM
Table
WHERE
[date] = ( SELECT MAX(date) FROM Table WHERE date < ( SELECT MAX(date) FROM Table) )
OR
[date] = ( SELECT MAX(date) FROM Table WHERE date = ( SELECT MAX(date) FROM Table ) )
GROUP BY
[date],
[debet]
Further, of course, it informs that I can't use the aggregate function inside CASE WHEN. Now I use this combination: sum(CASE WHEN [date] = dateadd(dd,-3,cast(getdate() as date)) THEN [debet] ELSE 0 END). But here every time I need to make an adjustment for weekends and holidays. The question is, is there any other way than using 'getdate' in 'case when' Statement to get max date?
Expected result:
date
sum_act
sum_prev
diff
2022-07-15
97190.33
0.00
97190.33
2022-07-14
0.00
508769.96
-508769.96
You can use dense_rank() to filter the last 2 dates in your table. After that you can use either conditional case expression with sum() to calculate the required value
select [date],
sum_act = sum(case when rn = 1 then [debet] else 0 end),
sum_prev = sum(case when rn = 2 then [debet] else 0 end),
diff = sum(case when rn = 1 then [debet] else 0 end)
- sum(case when rn = 2 then [debet] else 0 end)
from
(
select *, rn = dense_rank() over (order by [date] desc)
from tbl
) t
where rn <= 2
group by [date]
db<>fiddle demo
Two steps:
Get the sums for the last three dates
Show the results for the last two dates.
Well, we could also get all daily sums in step 1, but we just need the last three in order to calculate the sums for the last two days, so why aggregate more data than necessary?
Here is the query. You may have to put the date column name in brackets in SQL Server, as date is a keyword in SQL.
select top(2)
date,
sum_debit_current,
sum_debit_previous,
sum_debit_current - sum_debit_previous as diff
(
select
date,
sum(debet) as sum_debit_current,
lag(sum(debet)) over (order by date) as sum_debit_previous
from table
where date in (select distinct top(3) date from table order by date desc)
group by date
)
order by date desc;
(SQL Server uses TOP(n) instead of standard SQL FETCH FIRST 3 ROWS and while SELECT DISTINCT TOP(3) date looks like "get the top 3 rows, then apply distinct on their date", it is really "apply distinct on the dates, then get the top 3" like in standard SQL.)

Sql optimization query with join on Presto

I have the following query which shows out of all the clicks on products how many no of products have >1 image. The queries separately work fine but when combined its not able to execute within 3m threshold time. Any inputs how can this be optimized best.
select DATE (DATE_TRUNC('week',dt)) AS wk_dt,COUNT(DISTINCT a.product_id) tot_prods,COUNT(DISTINCT b.product_id) multiimp_prods,count(a.product_id) AS total_clicks,count(case when b.product_id>0 then a.product_id END) AS total_mulimg_clicks
from (
select product_id,DATE(from_unixtime((time/1000)+19800)) as dt--,count(distinct_id) clicks
from silver.mixpanel_android__product_clicked
WHERE DATE(from_unixtime((time/1000)+19800)) BETWEEN date(date_trunc('week',cast(Current_date - interval '14' day AS date))) AND Current_date
GROUP BY 1,2) a --2,471,245 1,458,476
LEFT join (
SELECT product_id,tot_img
FROM (SELECT DISTINCT product_id, catalog_id,(a + b + c + d + e) AS tot_img
FROM (
SELECT product_id,catalog_id,
CASE WHEN images is NULL then 0 else 1 end as a,
CASE WHEN img_2 is NULL then 0 else 1 end as b,
CASE WHEN img_3 is NULL then 0 else 1 end as c,
CASE WHEN img_4 is NULL then 0 else 1 end as d,
CASE WHEN img_5 is NULL then 0 else 1 end as e
FROM (
SELECT DISTINCT id AS product_id,catalog_id,
TRIM(images) AS images,
TRIM(SPLIT_PART(images,',',2)) AS "img_2",
TRIM(SPLIT_PART(images,',',3)) AS "img_3",
TRIM(SPLIT_PART(images,',',4)) AS "img_4",
TRIM(SPLIT_PART(images,',',5)) AS "img_5"
FROM silver.supply__products --WHERE date(created) <= date(date_trunc('week',cast(Current_date - interval '14' day AS date)))
)
--GROUP BY 1,2
order by 6 asc
)
)
WHERE tot_img>1
) b
on a.product_id=b.product_id
GROUP BY 1

MSSQL Group by and Select rows from grouping

I'm trying to figure out if what I'm trying to do is possible. Instead of resorting to multiple queries on a table, I wanted to group the records by business date and id then group by the id and select one date for a field and another date for the other field.
SELECT
*
{AMOUNT FROM DATE}
{AMOUNT FROM OTHER DATE}
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
AS subquery
GROUP BY id
It seems that you're looking to do a pivot query. I usually use cross tabs for this. Based on the query you posted, it could look like:
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM (
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
)AS subquery
GROUP BY id;
You could also use a CTE.
WITH CTE AS(
SELECT
date,
id,
SUM(amount) AS amount
FROM
table
GROUP BY id, date
)
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM CTE
GROUP BY id;
Or even be a rebel and do the operation directly.
SELECT
id,
SUM(CASE WHEN date = '20190901' THEN amount ELSE 0 END) AmountFromSept01,
SUM(CASE WHEN date = '20191001' THEN amount ELSE 0 END) AmountFromOct01
FROM CTE
GROUP BY id;
However, some people have tested for performance and found that pre-aggregating can improve performance.
If I understand you correctly, then you're just trying to pivot, but only with two particular dates:
select id,
date1 = sum(iif(date = '2000-01-01', amount, null)),
date2 = sum(iif(date = '2000-01-02', amount, null))
from [table]
group by id

sql join and group by generated date range

I have Table1 and I need a query to populate Table2:
Problem here is with Date column. I want to know the process of location/partner combination per day. Main issue here is that I can't pick DateCreated and make it as default date since it doesn't necessarily cover whole date range, like in this example where it doesn't have 2015-01-07 and 2015-01-09. Same case with other dates.
So, my idea is to first select dates from some table which contains needed date range and then perform calculation for each day/location/partner combination from cte but in that case I can't figure out how to make a join for LocationId and PartnerId.
Columns:
Date - CreatedItems - number of created items where Table1.DateCreated = Table2.Date
DeliveredItems - number of delivered items where Table1.DateDateOut = Table2.Date
CycleTime - number of days delivered item was in the location (DateOut - DateIn + 1)
I started with something like this but it's very like that I completely missed the point with it:
with d as
(
select date from DimDate
where date between DATEADD(DAY, -365, getdate()) and getdate()
),
cr as -- created items
(
select
DateCreated,
LocationId,
PartnerId,
CreatedItems = count(*)
from Table1
where DateCreated is not null
group by DateCreated,
LocationId,
PartnerId
),
del as -- delivered items
(
select
DateOut,
LocationId,
ParnerId,
DeliveredItems = count(*),
CycleTime = DATEDIFF(Day, DateOut, DateIn)
from Table1
where DateOut is not null
and Datein is not null
group by DateOut,
LocationId,
PartnerId
)
select
d.Date
from d
LEFT OUTER JOIN cr on cr.DateCreated = d.Date -- MISSING JOIN PER LocationId and PartnerId
LEFT OUTER JOIN del on del.DateCompleted = d.Date -- MISSING JOIN PER LocationId and PartnerId
with range(days) as (
select 0 union all select 1 union all select 2 union all
select 3 union all select 4 union all select 5 union all
select 6 /* extend as necessary */
)
select dateadd(day, r.days, t.DateCreated) as "Date", locationId, PartnerId,
sum(
case
when dateadd(day, r.days, t.DateCreated) = t.DateCreated
then 1 else 0
end) as CreatedItems,
sum(
case
when dateadd(day, r.days, t.DateCreated) = t.Dateout
then 1 else 0
end) as DeliveredItems,
sum(
case
when dateadd(day, r.days, t.DateCreated) = t.Dateout
then datediff(days, t.DateIn, t.DateOut) + 1 else 0
end) as CycleTime
from
<yourtable> as t
inner join range as r
on r.days between 0 and datediff(day, t.DateCreated, t.DateOut)
group by dateadd(day, r.days, t.DateCreated), LocationId, PartnerId;
If you only want the end dates (rather than all the dates in between) this is probably a better approach:
with range(dt) as (
select distinct DateCreated from T union
select distinct DateOut from T
)
select r.dt as "Date", locationId, PartnerId,
sum(
case
when r.dt = t.DateCreated
then 1 else 0
end) as CreatedItems,
sum(
case
when r.dt = t.Dateout
then 1 else 0
end) as DeliveredItems,
sum(
case
when r.dt = t.Dateout
then datediff(days, t.DateIn, t.DateOut) + 1 else 0
end) as CycleTime
from
<yourtable> as t
inner join range as r
on r.dt in (t.DateCreated, t.DateOut)
group by r.dt, LocationId, PartnerId;
If to specify WHERE clause? Something Like that:
WHERE cr.LocationId = del.LocationId AND
cr.PartnerId = del.PartnerId

change rows to columns and count

how to calculate count based on rows?
SOURCE TABLE
each employee can take 2 days off
Employee-----First_Day_Off-----Second_Day_Off
1------------10/21/2009--------12/6/2009
2------------09/3/2009--------12/6/2009
3------------09/3/2009--------NULL
4
5
.
.
.
Now i need a table that shows the dates and number of people taking off on that day
Date---------First_Day_Off-------Second_Day_Off
10/21/2009---1-------------------0
12/06/2009---1--------------------1
09/3/2009----2--------------------0
Any ideas?
Oracle 9i+, using Subquery Factoring (WITH):
WITH sample AS (
SELECT a.employee,
a.first_day_off AS day_off,
1 AS day_number
FROM YOUR_TABLE a
WHERE a.first_day_off IS NOT NULL
UNION ALL
SELECT b.employee,
b.second_day_off,
2 AS day_number
FROM YOUR_TABLE b
WHERE b.second_day_off IS NOT NULL)
SELECT s.day_off AS date,
SUM(CASE WHEN s.day_number = 1 THEN 1 ELSE 0 END) AS first_day_off,
SUM(CASE WHEN s.day_number = 2 THEN 1 ELSE 0 END) AS second_day_off
FROM sample s
GROUP BY s.day_off
Non Subquery Version
SELECT s.day_off AS date,
SUM(CASE WHEN s.day_number = 1 THEN 1 ELSE 0 END) AS first_day_off,
SUM(CASE WHEN s.day_number = 2 THEN 1 ELSE 0 END) AS second_day_off
FROM (SELECT a.employee,
a.first_day_off AS day_off,
1 AS day_number
FROM YOUR_TABLE a
WHERE a.first_day_off IS NOT NULL
UNION ALL
SELECT b.employee,
b.second_day_off,
2 AS day_number
FROM YOUR_TABLE b
WHERE b.second_day_off IS NOT NULL) s
GROUP BY s.day_off
It is a bit awkward to handle these queries, since you have days off stored in different columns. A better layout would be to have something like
EMPLOYEE_ID DAY_OFF
Then you would have multiple rows if an employee took multiple days off
EMPLOYEE_ID DAY_OFF
1 10/21/2009
1 12/6/2009
2 09/3/2009
2 12/6/2009
3 09/3/2009
...
In that case, you could find out how many days off each person took by using the following query:
SELECT EMPLOYEE_ID, COUNT(*) AS NUM_DAYS_OFF FROM DAYS_OFF_TABLE GROUP BY EMPLOYEE_ID
And the number of people who took days off on each date like this:
SELECT DAY_OFF, COUNT(*) AS NUM_PEOPLE FROM DAYS_OFF_TABLE GROUP BY DAY_OFF
But I digress...
You can try to use an SQL CASE statement to help with this:
SELECT Employee, CASE
WHEN First_Day_Off is NULL AND Second_Day_Off is NULL THEN 0
WHEN First_Day_Off is NOT NULL AND Second_Day_Off is NULL THEN 1
WHEN First_Day_Off is NULL AND Second_Day_Off is NOT NULL THEN 1
ELSE 2
END AS NUM_DAYS_OFF
FROM DAYS_OFF_TABLE
(note that you may need to change around the syntax slightly depending on your database.
Getting dates and number of people who took off on that day might be more complicated.
I don't know if this would work, but you can try it:
SELECT
Date_Off,
COUNT(*) AS Num_People
FROM
(SELECT
First_Day_Off, COUNT(*) AS Num_People FROM DAYS_OFF_TABLE WHERE First_Day_Off IS NOT NULL GROUP BY First_Day_Off
UNION
SELECT Second_Day_Off, COUNT(*) AS Num_People FROM DAYS_OFF_TABLE WHERE Second_Day_Off IS NOT NULL GROUP BY Second_Day_Off)
GROUP BY
Num_People