SQL: Find number of active "events" each month

Background
I have an SQL table that contains all events, with each event having a unique identifier.
As you can see, for some IDs the "event" stretches across multiple months. What I'm trying to find is the number of "active events" per month.
For example, event ID 342 is active in both Jan and Feb, so it should count towards the final count of both Jan and Feb.
Example dataset
ID  | Start Date  | End Date
----|-------------|------------
342 | 01 Jan 2022 | 12 Feb 2022
231 | 12 Feb 2022 | 26 Feb 2022
123 | 20 Jan 2022 | 10 Apr 2022
Desired output:
Month | Count
------|------
Jan   | 2
Feb   | 3
Mar   | 1
Apr   | 1
Btw, I'm using Alibaba's ODPS SQL and not MySQL or Postgres, so I'd appreciate it if the solution could be SQL-system agnostic. Thanks!

Here is an example in MySQL 8, using a recursive CTE to construct the list of months. It would be more efficient to use a Calendar Table.
If you are not using MySQL, you will need to modify the syntax of the query.
create table dataset (
    ID int,
    Start_date date,
    End_date date
);
insert into dataset values
    (342, '2022-01-01', '2022-02-12'),
    (231, '2022-02-12', '2022-02-26'),
    (123, '2022-01-20', '2022-04-10');
/*
Desired output:
Month  Count
Jan    2
Feb    3
Mar    1
Apr    1
*/
-- First, find the range of months covered by the data.
select
    min(month(Start_date)),
    max(month(End_date))
from dataset;
min(month(Start_date)) | max(month(End_date))
---------------------: | -------------------:
1 | 4
-- Build the list of months with a recursive CTE, then count the events overlapping each month.
-- Note: comparing bare month numbers only works while all dates fall in the same year.
with recursive m as (
    select min(month(Start_date)) mon from dataset
    union all
    select mon + 1 from m
    where mon < (select max(month(End_date)) from dataset)
)
select
    mon "month",
    count(id) "Count"
from m
left join dataset
    on month(Start_date) <= mon
    and month(End_date) >= mon
group by mon
order by mon;
month | Count
----: | ----:
1 | 2
2 | 3
3 | 1
4 | 1
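The calendar-table variant mentioned above could look roughly like the sketch below. It is run through Python's built-in sqlite3 module only so that it is self-contained; the `calendar_months` table and its column names are assumptions made for illustration, not something from the question. The join uses plain date comparisons (an event is active in a month if it starts on or before the month's last day and ends on or after its first day), so it should port to other engines, including ranges that cross a year boundary.

```python
import sqlite3


def active_events_per_month():
    """Count events overlapping each month using a (hypothetical) calendar table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dataset (id INTEGER, start_date TEXT, end_date TEXT);
        INSERT INTO dataset VALUES
            (342, '2022-01-01', '2022-02-12'),
            (231, '2022-02-12', '2022-02-26'),
            (123, '2022-01-20', '2022-04-10');

        -- Assumed calendar table: one row per month with its first and last day.
        CREATE TABLE calendar_months (month_start TEXT, month_end TEXT);
        INSERT INTO calendar_months VALUES
            ('2022-01-01', '2022-01-31'),
            ('2022-02-01', '2022-02-28'),
            ('2022-03-01', '2022-03-31'),
            ('2022-04-01', '2022-04-30');
    """)
    rows = conn.execute("""
        SELECT c.month_start, COUNT(d.id) AS active_events
        FROM calendar_months c
        LEFT JOIN dataset d
          ON d.start_date <= c.month_end      -- event started before the month ended
         AND d.end_date   >= c.month_start    -- and ended after the month began
        GROUP BY c.month_start
        ORDER BY c.month_start
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for month_start, active in active_events_per_month():
        print(month_start, active)
```

The dates are stored as ISO strings here because SQLite has no date type; on an engine with real date columns the comparison in the ON clause stays the same.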

Related

How to get 100 records from a table with a value from year 2022 more than from year 2021

Here is the code that works, but can it be made more efficient and avoid the subqueries?
condition used: (revenue_2022 - revenue_2021) > 0
or revenue_2022 > revenue_2021
select
    ts.id
from
    main_table ts
where
    (
        (
            select revenue
            from main_table
            where id = ts.id
              and year = 2022
              and revenue > 0
        ) -
        (
            select revenue
            from main_table
            where id = ts.id
              and year = 2021
              and revenue > 0
        )
    ) > 0
limit 100
main_table:
id | revenue | year
----------------------
1 | 4500 | 2022
1 | 4600 | 2021
2 | 3300 | 2022
3 | 5800 | 2022
3 | 5500 | 2021
The expected output is id 3, since its revenue for 2022 is greater than for 2021.
Id 2 is not considered, since it has no 2021 row to compare with.
It's a little unclear what you'd like to do if there are other years, or more than one entry per id+year, but you could do something like:
select
    y2022.id
from main_table y2022, main_table y2021
where y2022.year = 2022
  and y2021.year = 2021
  and y2022.id = y2021.id
  and y2022.revenue > y2021.revenue
limit 100
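To check the self-join against the sample rows from the question, here is a small sketch using Python's built-in sqlite3 module. It uses an explicit JOIN instead of the comma join and adds an ORDER BY so the result is deterministic; both are choices made for this sketch, and it assumes at most one row per id and year.

```python
import sqlite3


def ids_with_revenue_growth():
    """Return ids whose 2022 revenue exceeds their 2021 revenue (self-join)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE main_table (id INTEGER, revenue INTEGER, year INTEGER);
        INSERT INTO main_table VALUES
            (1, 4500, 2022),
            (1, 4600, 2021),
            (2, 3300, 2022),
            (3, 5800, 2022),
            (3, 5500, 2021);
    """)
    rows = conn.execute("""
        SELECT y2022.id
        FROM main_table y2022
        JOIN main_table y2021
          ON y2021.id = y2022.id          -- same id in both years
        WHERE y2022.year = 2022
          AND y2021.year = 2021
          AND y2022.revenue > y2021.revenue
        ORDER BY y2022.id                 -- added only for a stable result order
        LIMIT 100
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(ids_with_revenue_growth())
```

Id 2 drops out naturally because the inner join finds no 2021 row for it.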

Using a window function in BigQuery to create running sum of active quarters

I am working to enhance a dataset by creating a column that would allow me to track how many active quarters a given company has had for a given row. A company is "active" if they recognize revenue within that quarter.
Each row of my dataset represents one month's performance for a single company.
I have been able to use a WINDOW function to create a running sum for active months successfully:
COUNTIF(Revenue IS NOT NULL) OVER
(partition by Company_Name ORDER BY month_end ASC ROWS BETWEEN unbounded preceding and current row) AS cumulative_active_months
I am now struggling to convert my logic to count the quarters rather than the months.
This is a rough idea of what my table currently looks like.
Row  Month  Month_end   Fiscal_Quarter  Company_Name  Revenue  Active month count
---  -----  ----------  --------------  ------------  -------  ------------------
1    Jul    2016-07-31  FY17-Q2         Foo           x,xxx    1
2    Jul    2016-07-31  FY17-Q2         Bar           xxx,xxx  1
3    Aug    2016-08-31  FY17-Q2         Foo           xx,xxx   2
4    Aug    2016-08-31  FY17-Q2         Bar           xxx      2
5    Sep    2016-09-30  FY17-Q2         Foo           xx       3
6    Sep    2016-09-30  FY17-Q2         Bar           x,xxx    3
7    Oct    2016-10-31  FY17-Q3         Foo           xx       4
8    Oct    2016-10-31  FY17-Q3         Bar           Null     3
This is what I'd ideally like my table to look like.
Row  Month  Month_end   Fiscal_Quarter  Company_Name  Revenue  Active month count  Active quarter count
---  -----  ----------  --------------  ------------  -------  ------------------  --------------------
1    Jul    2016-07-31  FY17-Q2         Foo           x,xxx    1                   1
2    Jul    2016-07-31  FY17-Q2         Bar           xxx,xxx  1                   1
3    Aug    2016-08-31  FY17-Q2         Foo           xx,xxx   2                   1
4    Aug    2016-08-31  FY17-Q2         Bar           xxx      2                   1
5    Sep    2016-09-30  FY17-Q2         Foo           xx       3                   1
6    Sep    2016-09-30  FY17-Q2         Bar           x,xxx    3                   1
7    Oct    2016-10-31  FY17-Q3         Foo           xx       4                   2
8    Oct    2016-10-31  FY17-Q3         Bar           Null     3                   1
If this is counting active months:
COUNTIF(Revenue IS NOT NULL) OVER (PARTITION BY Company_Name ORDER BY month_end ASC) AS cumulative_active_months
Then the corresponding count for quarters would use COUNT(DISTINCT):
COUNT(DISTINCT CASE WHEN Revenue IS NOT NULL THEN Fiscal_Quarter END) OVER (PARTITION BY Company_Name ORDER BY month_end ASC) AS cumulative_active_quarters
Unfortunately, BigQuery does not support this, so you can use a subquery and cumulative sum:
select t.* except (seqnum),
       countif(seqnum = 1) over (partition by company_name order by month_end) as cnt
from (select t.*,
             (case when revenue is not null
                   -- partition by "has revenue" too, so the first month WITH revenue in a quarter gets seqnum = 1
                   then row_number() over (partition by Company_Name, Fiscal_Quarter, revenue is not null
                                           order by month_end)
                   else 0
              end) as seqnum
      from t
     ) t;
Note: This does not count the current quarter until there is revenue, which I think makes sense.
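The flag-and-running-sum idea can be tried outside BigQuery. The sketch below runs it in SQLite (3.25 or later, for window functions) through Python's sqlite3 module, with SUM(CASE ...) standing in for COUNTIF. The revenue numbers are placeholders, and the November row for Bar is a hypothetical addition to show a quarter that starts without revenue. The inner ROW_NUMBER() here is partitioned by whether revenue is present as well, so the first month with revenue in each quarter is the one flagged.

```python
import sqlite3


def cumulative_active_quarters():
    """Running count of quarters with revenue, per company, via flag + running sum."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t (month_end TEXT, fiscal_quarter TEXT,
                        company_name TEXT, revenue INTEGER);
        -- Revenue values are placeholders; the question only shows 'x,xxx'.
        INSERT INTO t VALUES
            ('2016-07-31', 'FY17-Q2', 'Foo', 100),
            ('2016-07-31', 'FY17-Q2', 'Bar', 100),
            ('2016-08-31', 'FY17-Q2', 'Foo', 100),
            ('2016-08-31', 'FY17-Q2', 'Bar', 100),
            ('2016-09-30', 'FY17-Q2', 'Foo', 100),
            ('2016-09-30', 'FY17-Q2', 'Bar', 100),
            ('2016-10-31', 'FY17-Q3', 'Foo', 100),
            ('2016-10-31', 'FY17-Q3', 'Bar', NULL),
            ('2016-11-30', 'FY17-Q3', 'Bar', 50);   -- hypothetical extra row
    """)
    rows = conn.execute("""
        SELECT company_name, month_end,
               -- running sum of the 'first active month of a quarter' flag
               SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY company_name ORDER BY month_end) AS active_quarters
        FROM (
            SELECT t.*,
                   CASE WHEN revenue IS NOT NULL
                        THEN ROW_NUMBER() OVER (
                                 PARTITION BY company_name, fiscal_quarter,
                                              (revenue IS NOT NULL)
                                 ORDER BY month_end)
                        ELSE 0
                   END AS seqnum
            FROM t
        ) AS s
        ORDER BY company_name, month_end
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in cumulative_active_quarters():
        print(row)
```

Bar stays at 1 in October (no revenue yet in FY17-Q3) and moves to 2 in November, once the quarter has revenue.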

SUM and Count in one SQL Query

I have this kind of data
time Members
-------------------------------------------------- -----------
Jun 23 2016 1
Jun 23 2016 1
Jun 23 2016 2
Jun 29 2016 6
Jul 11 2016 3
Jul 11 2016 1
Jul 13 2016 1
I obtained this data using this sql query
SELECT CONVERT(VARCHAR(12), a.registered_time), COUNT(b.member_id) AS Members
FROM b
INNER JOIN a ON b.mirror_id = a.mirror_id
GROUP BY a.registered_time
ORDER BY a.registered_time
I want to get the total number of members for rows that share the same date. For example, June 23 2016 should have a total of 4 members, and so on. Is it possible to apply the SUM() function to COUNT()? How can I do this?
Convert the value to a date and include that in both the select and group by:
SELECT CONVERT(date, a.registered_time) AS dte, COUNT(b.member_id) AS Members
FROM b
JOIN a ON b.mirror_id = a.mirror_id
GROUP BY CONVERT(date, a.registered_time)
ORDER BY CONVERT(date, a.registered_time);
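The effect of grouping by the date rather than the full timestamp can be seen in a small sketch, run here in SQLite through Python's sqlite3 module. SQLite's date() plays the role of CONVERT(date, ...), and the rows in `a` and `b` are made-up sample data, not the asker's.

```python
import sqlite3


def members_per_day():
    """Count members per calendar day by truncating the timestamp to a date."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Made-up sample data: two registrations on Jun 23, one on Jun 29.
        CREATE TABLE a (mirror_id INTEGER, registered_time TEXT);
        INSERT INTO a VALUES
            (1, '2016-06-23 09:00:00'),
            (2, '2016-06-23 14:30:00'),
            (3, '2016-06-29 10:00:00');

        CREATE TABLE b (member_id INTEGER, mirror_id INTEGER);
        INSERT INTO b VALUES
            (10, 1),
            (11, 2), (12, 2), (13, 2),
            (14, 3), (15, 3);
    """)
    rows = conn.execute("""
        SELECT date(a.registered_time) AS dte,   -- drop the time part
               COUNT(b.member_id)      AS members
        FROM b
        JOIN a ON b.mirror_id = a.mirror_id
        GROUP BY date(a.registered_time)
        ORDER BY date(a.registered_time)
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(members_per_day())
```

Grouping by the raw timestamp would give separate rows of 1 and 3 for Jun 23; grouping by the date merges them into a single row of 4, so no SUM() over COUNT() is needed.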

Select Every Date for Date Range and Insert

Using SQL Server 2008
I have a table A which has start date, end date and value. For each date within the start date and end date in Table A, I need to insert (or update if already exists) that date in table B such that the value in this table is value in A/DateDiff(Day,StartDate of A,EndDate of A).
Example:
Table A
ID StartDate EndDate Value
1 01 Jan 2014 03 Jan 2014 33
2 01 Feb 2014 02 Feb 2014 20
3 02 Jan 2014 03 Jan 2014 10
Table B
ID Date Value
1 01 Jan 2014 11
2 02 Jan 2014 16
3 03 Jan 2014 16
4 01 Feb 2014 10
5 02 Feb 2014 10
The values are computed as follows. For ID 1 there are 3 days, which means 11 units per day, so 1st, 2nd and 3rd Jan each get 11 units. The range 2nd Jan to 3rd Jan (ID 3) adds another 5 units per day, so 2nd and 3rd Jan come to 11 + 5 = 16. 1st and 2nd Feb are covered by just one record, so they simply get 20/2 = 10.
I can think of a solution using loops, but want to avoid it entirely.
Is there any way I can achieve this through a set based solution? It is important for me to do this in bulk using set based approach.
I have been reading various articles, and it seems a CTE, Calendar Table or Tally Table might help, but the examples I have seen require setting variables and passing a start date and end date, which I think will work for a single record but not for all records at once. Please suggest.
Thanks!
I think this should do it (DEMO):
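;with cte as (
    -- Anchor: one row per range, starting at startdate, carrying the per-day share of the value
    select
        id
        ,startdate
        ,enddate
        ,value / (1 + datediff(day, startdate, enddate)) as value
        ,startdate as date
    from units
    union all
    -- Recursive step: advance one day at a time until enddate is reached
    -- (dateadd also works for the date type, where date + 1 does not)
    select id, startdate, enddate, value, dateadd(day, 1, date) as date
    from cte
    where date < enddate
)
select
    row_number() over (order by date) as ID
    ,date
    ,sum(value) as value
from cte
group by date
-- Ranges longer than 100 days need OPTION (MAXRECURSION 0) appended here.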
The idea is to use a Recursive CTE to explode the date ranges into one record per day. Also, the logic of value / (1+datediff(day, startdate, enddate)) distributes the total value evenly over the number of days in each range. Finally, we group by day and sum together all the values corresponding to that day to get the output:
| ID | DATE | VALUE |
|----|---------------------------------|-------|
| 1 | January, 01 2014 00:00:00+0000 | 11 |
| 2 | January, 02 2014 00:00:00+0000 | 16 |
| 3 | January, 03 2014 00:00:00+0000 | 16 |
| 4 | February, 01 2014 00:00:00+0000 | 10 |
| 5 | February, 02 2014 00:00:00+0000 | 10 |
From here you can join with your result table (Table B) by date, and update/insert the value as needed. That logic might look something like this (test it first of course before running in production!):
update B
set B.VALUE = R.VALUE
from TableB B
join Result R on B.DATE = R.DATE;
insert TableB (DATE, VALUE)
select DATE, VALUE
from Result R
where R.DATE not in (select DATE from TableB);
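For anyone who wants to try the range explosion without a SQL Server instance, here is a rough equivalent in SQLite, run through Python's sqlite3 module. The date arithmetic is SQLite-specific (julianday() and date(d, '+1 day') stand in for datediff and dateadd), and dates are stored as ISO strings; the `units` table name follows the query above.

```python
import sqlite3


def daily_values():
    """Explode each date range into one row per day, then sum the per-day shares."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE units (id INTEGER, startdate TEXT, enddate TEXT, value INTEGER);
        INSERT INTO units VALUES
            (1, '2014-01-01', '2014-01-03', 33),
            (2, '2014-02-01', '2014-02-02', 20),
            (3, '2014-01-02', '2014-01-03', 10);
    """)
    rows = conn.execute("""
        WITH RECURSIVE cte(id, enddate, value, d) AS (
            -- Anchor: first day of each range, with value split evenly over its days
            SELECT id,
                   enddate,
                   value / (1 + CAST(julianday(enddate) - julianday(startdate) AS INTEGER)),
                   startdate
            FROM units
            UNION ALL
            -- Recursive step: move forward one day until the range end
            SELECT id, enddate, value, date(d, '+1 day')
            FROM cte
            WHERE d < enddate
        )
        SELECT d, SUM(value) AS value
        FROM cte
        GROUP BY d
        ORDER BY d
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for day, value in daily_values():
        print(day, value)
```

As in the original, the division is integer division, so a value that does not divide evenly over its range loses the remainder.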

Getting a variable end of year date and value from MS Access table using SQL

I have some daily (day-on-day) closing figures for a tracked supply in one MS Access table that has 2 columns: Dates (the date) and PXLast (the day's closing figure).
I have daily data from Jan 1991 to Aug 2013 and I wanted to get the percentage change of PXLast at every year end compared to last year year end as follows:
Year | Percentage Change of PXLast(Year on Year)
1991 | 15.2%
1992 | 9.2%
The year-end date varies (it is not always the 31st), and I am getting the last PXLast value as follows:
1. Get the max date in Dec every year: results in MyYear, MyMonth, MyDay
2. Combine it using DateSerial(MyYear, MyMonth, MyDay)
3. Join the resulting query to the table with an inner join on the date column
4. Get the PXLast value
SELECT EndDates.EndDates, NSE20.PX_LAST AS LookPoint
FROM NSE20 INNER JOIN
    (SELECT DateSerial([MyYear],[MyMonth],[MyDay]) AS EndDates
     FROM (SELECT 12 AS MyMonth, MyDay, MyYear
           FROM (SELECT Max(Day([Dates])) AS MyDay, Year([Dates]) AS MyYear
                 FROM NSE20 WHERE (((Month([Dates]))=12))
                 GROUP BY Year([Dates])) AS EndYearValues)
          AS EndValueDates)
    AS EndDates ON NSE20.Dates = EndDates.EndDates;
Could anyone help me get the corresponding value for the previous year end using a query?
E.g. for 29 Dec 2006, it should show the current value and the value for 31 Dec 2005
in the same row, i.e.
Year | Current Year End| Previous Year End
2005 | 3449.00 | 4611.19
2006 | 9.2% |3449.00
Any help is appreciated.
Any suggestions for a better way of doing this are very welcome.
Let's assume that you have some test data in a table named [NSE20] that looks like this
Dates PXLast
---------- ------
2010-07-01 131
2010-12-31 130
2011-11-12 123
2011-12-30 125
2012-01-03 127
2012-12-31 129
I'd start by creating a saved query in Access named [NSE20_year_ends] that identifies the year-end dates by (calendar) year:
SELECT Year(Dates) AS CalendarYear, Max(Dates) AS YearEndDate
FROM NSE20
GROUP BY Year(Dates)
That will produce
CalendarYear YearEndDate
------------ -----------
2010 2010-12-31
2011 2011-12-30
2012 2012-12-31
Then I'd create another saved query named [NSE20_year_end_balances] to extract the closing balances for each year:
SELECT NSE20_year_ends.CalendarYear, NSE20.PXLast
FROM
NSE20
INNER JOIN
NSE20_year_ends
ON NSE20.Dates = NSE20_year_ends.YearEndDate
That will give us
CalendarYear PXLast
------------ ------
2010 130
2011 125
2012 129
Now we can do a self-join on that query to calculate the percentage change
SELECT
y1.CalendarYear,
(y1.PXLast - y0.PXLast) / y0.PXLast * 100 AS PctChange
FROM
NSE20_year_end_balances y1
INNER JOIN
NSE20_year_end_balances y0
ON y0.CalendarYear = y1.CalendarYear - 1
resulting in
CalendarYear PctChange
------------ -----------------
2011 -3.84615384615385
2012 3.2
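The three steps can also be chained in a single statement on engines that support CTEs (Access saved queries fill that role above). The sketch below runs in SQLite through Python's sqlite3 module, which is an assumption of this example rather than something Access offers: strftime('%Y', ...) replaces Year(), and the difference is multiplied by 100.0 first because SQLite would otherwise do integer division. It also returns the previous and current year-end values in the same row, as the question asked.

```python
import sqlite3


def year_end_changes():
    """Year-on-year percentage change of the last PXLast value in each calendar year."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nse20 (dates TEXT, pxlast INTEGER);
        INSERT INTO nse20 VALUES
            ('2010-07-01', 131),
            ('2010-12-31', 130),
            ('2011-11-12', 123),
            ('2011-12-30', 125),
            ('2012-01-03', 127),
            ('2012-12-31', 129);
    """)
    rows = conn.execute("""
        WITH year_ends AS (
            -- last available date in each calendar year
            SELECT CAST(strftime('%Y', dates) AS INTEGER) AS calendar_year,
                   MAX(dates) AS year_end_date
            FROM nse20
            GROUP BY strftime('%Y', dates)
        ),
        year_end_balances AS (
            -- closing figure on that date
            SELECT e.calendar_year, n.pxlast
            FROM nse20 n
            JOIN year_ends e ON n.dates = e.year_end_date
        )
        SELECT y1.calendar_year,
               y0.pxlast AS previous_year_end,
               y1.pxlast AS current_year_end,
               (y1.pxlast - y0.pxlast) * 100.0 / y0.pxlast AS pct_change
        FROM year_end_balances y1
        JOIN year_end_balances y0
          ON y0.calendar_year = y1.calendar_year - 1
        ORDER BY y1.calendar_year
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in year_end_changes():
        print(row)
```

The first year in the data (2010 here) has no predecessor, so the inner join leaves it out, matching the Access result above.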