SQL window-over by-incremental distinct users

I am sure this must be fairly easy for you, but unfortunately it is not for me!
I am trying to write a query that counts incremental distinct user IDs grouped by month.
That is, if user X has a row in both January and February, he should be counted as 1 in January but not in February.
I can do the following for a given month, but I would like to automate it.
EDIT:
Let me try to clarify: a row in table UX is created every time a user performs a given action. I would like to count the number of unique NEW (incremental) users who performed this action each month. Meaning that if user A performed this action in January AND February, he would only be counted in January.
select
count(distinct ux.account_id)
, trunc(ux.date_key,'MM') as month
from
ux
left join
(
select
distinct ux.account_id as account_id
from
ux
where
ux.date_key < '2019-02-01'
) bf on ux.account_id=bf.account_id
where
ux.date_key >= '2019-02-01'
and bf.account_id IS NULL
group by
trunc(ux.date_key,'MM')

An "incremental distinct user" sounds to me a lot like a user's first appearance. Does this do what you want?
-- each account is counted once, in the month of its earliest row
select trunc(min_date_key, 'MM') as month, count(*) as new_users
from (select account_id, min(ux.date_key) as min_date_key
from ux
group by account_id
) ux
group by trunc(min_date_key, 'MM')
order by trunc(min_date_key, 'MM')
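To make the "first appearance" idea concrete, here is a minimal sketch that can be run locally. It uses Python's built-in sqlite3 module rather than the asker's database, so strftime('%Y-%m', ...) stands in for trunc(date_key, 'MM'), and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ux (account_id TEXT, date_key TEXT)")
conn.executemany(
    "INSERT INTO ux VALUES (?, ?)",
    [
        ("A", "2019-01-05"),
        ("A", "2019-02-10"),  # A comes back in February: must not be counted again
        ("B", "2019-01-20"),
        ("C", "2019-02-03"),
        ("C", "2019-02-15"),
        ("D", "2019-03-01"),
    ],
)

# Inner query: one row per account with its earliest date.
# Outer query: count those first dates per month.
new_users_by_month = conn.execute(
    """
    SELECT strftime('%Y-%m', first_seen) AS month, COUNT(*) AS new_users
    FROM (SELECT account_id, MIN(date_key) AS first_seen
          FROM ux
          GROUP BY account_id) AS firsts
    GROUP BY strftime('%Y-%m', first_seen)
    ORDER BY month
    """
).fetchall()

print(new_users_by_month)
# [('2019-01', 2), ('2019-02', 1), ('2019-03', 1)]
```

User A is counted only in January even though A also has a February row, which is the behaviour asked for.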

Related

Data value on a given date

I have a table in a PostgreSQL database that contains the employee name, the date the employee started working, and the date the employee left the company. If the employee is still with the company, this field is null.
Knowing this, I would like to know how many people were working on a given date. For example:
I would like to know how many people worked at the company in January 2021.
I don't know where to start. In some attempts I got the number of hires and layoffs per month, but I need to show the accumulated value per month in another column.
I hope I made myself understood. I'll leave the last SQL I got here.
select reference, sum(hires) from
(
select
date_trunc('month', date_hires) as reference,
count(*) as hires
from
ponto_mais_relatorio_colaboradores
group by
date_hires
union all
select
date_trunc('month', date_layoff) as reference,
count(*)*-1 as layoffs
from
ponto_mais_relatorio_colaboradores
group by
date_layoff
) as reference
join calendar_aux on calendar_aux.ano_mes = reference
group by reference
order by reference

Break the requirement down. The question is: how many people are employed on any given date? That includes everyone hired on or before that date who has no layoff date, plus everyone hired on or before that date whose layoff date is later than the date you are interested in. For example, if you are interested in January, you still want to count an employee with a layoff date in February. With that in place, convert it into SQL. The condition above is a plain date comparison in a select. The other issue is that January is not a date but a range of dates, so you need each date. You can use generate_series to create each day in January, then join the generated dates with the selection from your table. Resulting query:
with jan_dates( jdate ) as
( select generate_series( date '2021-01-01'
, date '2021-01-31'
, interval '1' day
)::date
)
select jdate "Date", count(*) "Employees"
from jan_dates j
join employees e
on ( e.date_hires <= j.jdate
and ( e.date_layoff is null
or e.date_layoff > j.jdate
)
)
group by j.jdate
order by j.jdate;
Note: Not tested.
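Since the query above is marked as untested, here is a small runnable sketch of the same date-list-plus-join idea. It is written against SQLite through Python's sqlite3 module, so a recursive CTE replaces PostgreSQL's generate_series, and the three employees are made up for the example. It uses a left join so that days with nobody employed would still appear with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, date_hires TEXT, date_layoff TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "2020-06-01", None),            # still employed
        ("Bruno", "2020-11-15", "2021-01-03"),  # leaves on 3 January
        ("Carla", "2021-01-02", None),          # joins on 2 January
    ],
)

headcount = conn.execute(
    """
    WITH RECURSIVE days(d) AS (
        SELECT '2021-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < '2021-01-04'
    )
    SELECT days.d, COUNT(e.name) AS employees
    FROM days
    LEFT JOIN employees e
      ON e.date_hires <= days.d
     AND (e.date_layoff IS NULL OR e.date_layoff > days.d)
    GROUP BY days.d
    ORDER BY days.d
    """
).fetchall()

print(headcount)
# [('2021-01-01', 2), ('2021-01-02', 3), ('2021-01-03', 2), ('2021-01-04', 2)]
```

As in the answer, an employee is not counted on the layoff date itself. Change the comparison to >= if the last day should count as a working day.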

PL-SQL query to calculate customers per period from start and stop dates

I have a PL-SQL table with a structure as shown in the example below:
I have customers (customer_number) with insurance cover start and stop dates (cover_start_date and cover_stop_date). I also have dates of accidents for those customers (accident_date). These customers may have more than one row in the table if they have had more than one accident. They may also have no accidents. And they may also have a blank entry for the cover stop date if their cover is ongoing. Sorry I did not design the data format, but I am stuck with it.
I am looking to calculate the number of accidents (num_accidents) and number of customers (num_customers) in a given time period (period_start), and from that the number of accidents-per-customer (which will be easy once I've got those two pieces of information).
Any ideas on how to design a PL-SQL function to do this in a simple way? Ideally with the time periods not being fixed to monthly (for example, weekly or fortnightly too)? Ideally I will end up with a table like this shown below:
Many thanks for any pointers...

You seem to need a list of dates. You can generate one in the query and then use correlated subqueries to calculate the columns you want:
select d.*,
(select count(distinct customer_number)
from t
where t.cover_start_date <= d.dte and
(t.cover_stop_date > d.dte + interval '1' month or t.cover_stop_date is null)
) as num_customers,
(select count(*)
from t
where t.accident_date >= d.dte and
t.accident_date < d.dte + interval '1' month
) as num_accidents,
(select count(distinct customer_number)
from t
where t.accident_date >= d.dte and
t.accident_date < d.dte + interval '1' month
) as num_customers_with_accident
from (select date '2020-01-01' as dte from dual union all
select date '2020-02-01' as dte from dual union all
. . .
) d;
If you want to do arithmetic on the columns, you can use this as a subquery or CTE.
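A runnable sketch of the correlated-subquery approach, under some assumptions: it runs on SQLite via Python's sqlite3 module (so date(x, '+1 month') replaces the interval arithmetic), the table is called t as in the answer, the column names follow the question, and the rows are invented. It keeps the answer's definition of a customer in a period: cover started on or before the period start and did not stop before the period ended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE t (customer_number INTEGER, cover_start_date TEXT,
                       cover_stop_date TEXT, accident_date TEXT)"""
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2019-06-01", None, "2020-01-15"),          # ongoing cover, two accidents
        (1, "2019-06-01", None, "2020-02-03"),
        (2, "2020-01-10", "2020-03-01", None),          # no accidents
        (3, "2019-01-01", "2020-01-20", "2020-01-05"),  # cover stops mid-January
    ],
)

per_period = conn.execute(
    """
    SELECT d.dte AS period_start,
           (SELECT COUNT(DISTINCT customer_number)
              FROM t
             WHERE t.cover_start_date <= d.dte
               AND (t.cover_stop_date > date(d.dte, '+1 month')
                    OR t.cover_stop_date IS NULL)) AS num_customers,
           (SELECT COUNT(*)
              FROM t
             WHERE t.accident_date >= d.dte
               AND t.accident_date < date(d.dte, '+1 month')) AS num_accidents
    FROM (SELECT '2020-01-01' AS dte
          UNION ALL
          SELECT '2020-02-01') AS d
    ORDER BY d.dte
    """
).fetchall()

print(per_period)
# [('2020-01-01', 1, 2), ('2020-02-01', 1, 1)]
```

For weekly or fortnightly periods, only the date list and the '+1 month' modifier would need to change (for example '+7 day' or '+14 day').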

Group Data by Year, Oracle SQL

I am trying to create a query that counts records that existed within a year. The table looks like this:
Title_ID ISSUE_DATE EXPIRY_DATE CLIENT_NUMBER
123 '26-JUN-19' '17-AUG-20' 8529
124 '04-APR-19' '17-SEP-22' 8529
125 '09-MAY-15' '11-SEP-19' 3654
126 '31-DEC-19' '25-NOV-22' 9852
127 '27-OCT-18' '26-FEB-21' 2254
128 '05-OCT-11' '01-JAN-19' 9852
Specifically, I want to count the number of distinct CLIENT_NUMBERS of the records that existed in a given calendar year.
The record (title) exists from the ISSUE_DATE until the EXPIRY_DATE. If the record existed at any point within a year (Let's say 2019), then we are interested in including it in our client count.
So, if the record was issued in 2019 or if the record expired in 2019 or if the record was issued before 2019 and expired after 2019, then we are interested in including it in the client count for the year it existed.
I have built the following query that does this, but only for one specific year (2019). I'd like to extend the query so that it looks at each calendar year and counts the distinct client numbers when the client has an active title:
SELECT *
-- count(distinct client_number)
FROM
TITLE
WHERE
issue_date between '01-Jan-19' and '31-Dec-19'
or expiry_date between '01-Jan-19' and '31-Dec-19'
or (issue_date < '01-Jan-19' and expiry_date > '31-Dec-19')
Where I am having trouble is, my data is much larger than the subset I have provided. I would like to recursively get counts of distinct client numbers by year using the same kind of logic to include a record within a calendar year as I have outlined above. So, I'd like to have a table like this:
YEAR COUNT_OF_CLIENT_NUMBERS
2020 5469
2019 5587
2018 4852
2017 4501
2016 3265
etc
I think I've stretched my current SQL abilities at this point, so I thought I'd ask to see if there are any suggestions to make this happen?
Thanks.
EDIT: to clarify, the issue date and the expiry date apply to the title, not the client. So, the title is issued on the issue date and expires on the expiry date. A client can own one or more title(s).
So, I am looking to get a count of how many distinct clients own active titles within a given year if one or more of their titles is active within that year. So the key is, a title is considered active if it was issued in that year OR it expired within that year OR it was issued before that year and expired after that year. A title CAN be active in multiple years (e.g. issued on Feb. 4, 2014 and expiring on Apr. 7, 2017; I want to include the client in the count for each year that the title exists: 2014, 2015, 2016 and 2017).
So, I created a table to join to (thanks @GMB for the suggestion):
with calendar_year (y) as
(
select 2010 from dual
union all select y + 1 from calendar_year where y < 2020
)
select * from calendar_year
Which returns:
2010
2011
2012
2013
2014
etc
I want to join that to my titles table, but I am having issues recursively looking at the issue date and expiry date to join up the title to each year it existed in. Any help in that area, would be great!

You can use a recursive query to generate the years, then bring in the table with a left join, and aggregate:
with dates (dt) as (
select date '2016-01-01' from dual
union all select add_months(dt, 12) from dates where dt < date '2020-01-01'
)
select d.dt, count(distinct t.client_number) count_of_client_numbers
from dates d
left join title t
-- the title overlaps the year starting at d.dt
on t.issue_date < add_months(d.dt, 12)
and t.expiry_date >= d.dt
group by d.dt
order by d.dt
The upside of this approach is that you get results for each and every year, even those where no title started or ended.
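As a check of the year-overlap logic, the sketch below loads the six sample rows from the question (dates rewritten in ISO format) into SQLite through Python's sqlite3 module and counts distinct clients for 2018 to 2020. The dialect differs from Oracle: WITH RECURSIVE replaces the dual-based CTE, and the year boundaries are built by string concatenation, which is an assumption of this sketch rather than part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE title (title_id INTEGER, issue_date TEXT,
                           expiry_date TEXT, client_number INTEGER)"""
)
conn.executemany(
    "INSERT INTO title VALUES (?, ?, ?, ?)",
    [
        (123, "2019-06-26", "2020-08-17", 8529),
        (124, "2019-04-04", "2022-09-17", 8529),
        (125, "2015-05-09", "2019-09-11", 3654),
        (126, "2019-12-31", "2022-11-25", 9852),
        (127, "2018-10-27", "2021-02-26", 2254),
        (128, "2011-10-05", "2019-01-01", 9852),
    ],
)

clients_per_year = conn.execute(
    """
    WITH RECURSIVE calendar_year(y) AS (
        SELECT 2018
        UNION ALL
        SELECT y + 1 FROM calendar_year WHERE y < 2020
    )
    SELECT c.y, COUNT(DISTINCT t.client_number) AS count_of_client_numbers
    FROM calendar_year c
    LEFT JOIN title t
      -- a title is active in year y if it starts before the year ends
      -- and ends on or after the year starts
      ON t.issue_date <= (c.y || '-12-31')
     AND t.expiry_date >= (c.y || '-01-01')
    GROUP BY c.y
    ORDER BY c.y
    """
).fetchall()

print(clients_per_year)
# [(2018, 3), (2019, 4), (2020, 3)]
```

The single overlap condition covers all three cases listed in the question: issued in the year, expired in the year, or spanning the whole year. Client 8529 owns two titles active in 2019 but is counted once.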

You can get number of clients on any day by unpivoting the data, so there is one row per date. Then keep track of the "ins" and "outs".
You don't specify the database, but here is one approach:
select dte, sum(inc),
sum(sum(inc)) over (order by dte) as active_on_date
from ((select issue_date as dte, 1 as inc
from t
) union all
(select expiry_date as dte, -1 as inc
from t
)
) t
group by dte
order by dte;
EDIT:
Hmmm, the above may not do exactly what you want. If you want to count distinct client numbers rather than overall rows, then it might be simpler to just list the dates and join:
select d.dte, count(distinct t.client_number)
from (select date '2020-01-01' as dte from dual union all
select date '2019-01-01' as dte from dual union all
select date '2018-01-01' as dte from dual union all
. . .
) d left join
t
on d.dte between t.issue_date and t.expiry_date
group by d.dte
order by d.dte;

Create dynamic SQL statement for one year

I would like to compare the number of newly made subscriptions with the number of ending subscriptions per month in 2018, and combine the two into one table.
The columns would be the months of 2018 (January, February, and so on). The first row would be the result of my first SQL query, the number of subscriptions that ran out in that month. The second row would be the result of my second SQL query, the number of newly made subscriptions in that month. My queries below are an example for March 2018.
SELECT
COUNT (UserId)
FROM
UserInAppPurchase
WHERE
ValidTo > '2018-03-01' AND ValidTo < '2018-03-31'
GROUP BY
UserId
SELECT COUNT(UserId)
FROM UserInAppPurchase
WHERE PurchaseDate > '2018-03-01'
AND PurchaseDate < '2018-03-31'
GROUP BY UserId
Thanks very much for the help

If I understood correctly, the queries above return multiple rows for a single month, one per unique user placing an order. With month and user as two dimensions, the graph would be a 3D graph, which I don't think is what you want.
Instead, I can think of two cases.
You want the total number of subscriptions that expired in each month:
SELECT MONTH(ValidTo) ExpireMonth, COUNT(UserId) ExpireCount
FROM UserInAppPurchase
WHERE YEAR(ValidTo) = 2018
GROUP BY MONTH(ValidTo);
You want the number of users who had at least one subscription expire in each month:
WITH UniqueUsers AS
(
SELECT DISTINCT MONTH(ValidTo) ExpireMonth, UserId
FROM UserInAppPurchase
WHERE YEAR(ValidTo) = 2018
)
SELECT ExpireMonth, COUNT(UserId) UserCount
FROM UniqueUsers
GROUP BY ExpireMonth;
A similar query works for purchased subscriptions.
Please let me know if what you require is different.

If you want the count of user IDs, you should not group by UserId:
SELECT COUNT (UserId), 'valid_to'
FROM UserInAppPurchase WHERE ValidTo>'2018-03-01' and ValidTo<'2018-03-31'
union all
SELECT COUNT (UserId), 'PurchaseDate'
FROM UserInAppPurchase WHERE PurchaseDate>'2018-03-01' and PurchaseDate<'2018-03-31'
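To get both counts for every month of 2018 in one result, the two queries can be stacked with UNION ALL and then pivoted with conditional sums. The sketch below is one possible way to do that, run on SQLite through Python's sqlite3 module; strftime replaces the MONTH() and YEAR() functions used in the other answer, and the four sample purchases are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserInAppPurchase (UserId INTEGER, PurchaseDate TEXT, ValidTo TEXT)"
)
conn.executemany(
    "INSERT INTO UserInAppPurchase VALUES (?, ?, ?)",
    [
        (1, "2018-01-10", "2018-02-10"),
        (2, "2018-01-15", "2018-03-15"),
        (3, "2018-02-01", "2018-03-01"),
        (1, "2018-03-05", "2019-03-05"),  # expires outside 2018
    ],
)

monthly = conn.execute(
    """
    SELECT month,
           SUM(CASE WHEN kind = 'purchased' THEN 1 ELSE 0 END) AS purchased,
           SUM(CASE WHEN kind = 'expired' THEN 1 ELSE 0 END) AS expired
    FROM (
        SELECT strftime('%m', PurchaseDate) AS month, 'purchased' AS kind
        FROM UserInAppPurchase
        WHERE strftime('%Y', PurchaseDate) = '2018'
        UNION ALL
        SELECT strftime('%m', ValidTo) AS month, 'expired' AS kind
        FROM UserInAppPurchase
        WHERE strftime('%Y', ValidTo) = '2018'
    ) AS events
    GROUP BY month
    ORDER BY month
    """
).fetchall()

print(monthly)
# [('01', 2, 0), ('02', 1, 1), ('03', 1, 2)]
```

This counts subscriptions, not distinct users. Months with no purchases and no expirations do not appear at all, so a separate list of months would be needed to show them as zero.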

How can I adjust this query to produce a result that shows the average on a month-by-month basis over time

I'm having a hard time producing the desired result with one of my queries.
I'd like to be able to display the average revenue generated per user on a rolling month by month basis, based on the following criteria:
User must belong to a particular cohort, defined as a user who has booked more than 20 times in the last 90 days (so, for example, a user only gets counted in the January cohort if they have booked more than 20 times across the months of November, December and January)
The below query is what I have now, which pulls the average revenue per user for the January cohort:
WITH bookings as (SELECT u.id as user_id, count(*) as bookings_last_90, sum(total)/100 as revenue_last_90
FROM revenue r
JOIN users u on r.user_id = u.id
WHERE (CAST(r.created_at AS date) BETWEEN CAST((NOW() + INTERVAL '-90 day') AS date)
AND CAST(now() AS date))
GROUP BY u.id
HAVING COUNT(*) >= 20)
SELECT avg(b.revenue_last_90)
FROM bookings b;
I essentially need to adapt the above query to pull the average revenue per cohort user on a rolling month-by-month basis, keeping intact the past 90-day timeframe for the cohort definition.

The general approach when you've got a query that works with one timestamp is:
Generate a list of dates or timestamps to use in a table, view, CTE, etc
Join to the list of timestamps
Replace the timestamp you're using with the timestamp from the list
With no schema, I can't test it, but the results may look something like:
WITH --first generate list of dates from the created_at field in revenue
month_list as (select date_trunc('month' , r.created_at) as m from revenue r group by 1 )
--then use that in the bookings query
, bookings as (SELECT u.id as user_id, m.m as cohort_month, count(*) as bookings_last_90, sum(total)/100 as revenue_last_90
FROM revenue r
JOIN users u on r.user_id = u.id
join month_list m on r.created_at between m.m + interval'-60 day' and m.m + interval'1 month'
WHERE true
GROUP BY u.id , m.m
HAVING COUNT(*) >= 20)
--finally, use the date in the result query
SELECT avg(b.revenue_last_90), cohort_month
FROM bookings b group by cohort_month;
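The three steps of the general approach (generate a list of months, join to it, and use the listed month instead of now()) can be tried on a toy dataset. The sketch below is an illustration only: it runs on SQLite via Python's sqlite3 module, the revenue rows are invented, the 90-day window is assumed to end at the end of each cohort month, and the booking threshold is passed as a parameter (2 here instead of 20, so the small dataset produces a cohort).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (user_id INTEGER, created_at TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO revenue VALUES (?, ?, ?)",
    [
        (1, "2020-11-10", 1000),
        (1, "2020-12-20", 2000),
        (1, "2021-01-05", 3000),
        (2, "2021-01-10", 500),
        (2, "2021-01-20", 1500),
        (3, "2020-11-25", 4000),
    ],
)

MIN_BOOKINGS = 2  # hypothetical threshold for the demo; the question uses 20

avg_by_cohort = conn.execute(
    """
    WITH month_list AS (                -- step 1: list of month starts
        SELECT DISTINCT date(created_at, 'start of month') AS m FROM revenue
    ),
    bookings AS (                       -- step 2: join revenue to each month
        SELECT r.user_id,
               ml.m AS cohort_month,
               COUNT(*) AS bookings_last_90,
               SUM(r.total) / 100.0 AS revenue_last_90
        FROM month_list ml
        JOIN revenue r                  -- step 3: window relative to ml.m, not now()
          ON r.created_at >= date(ml.m, '+1 month', '-90 day')
         AND r.created_at <  date(ml.m, '+1 month')
        GROUP BY r.user_id, ml.m
        HAVING COUNT(*) >= ?
    )
    SELECT cohort_month, AVG(revenue_last_90)
    FROM bookings
    GROUP BY cohort_month
    ORDER BY cohort_month
    """,
    (MIN_BOOKINGS,),
).fetchall()

print(avg_by_cohort)
# [('2020-12-01', 30.0), ('2021-01-01', 40.0)]
```

November 2020 is missing from the output because no user reached the threshold in the 90 days ending that month. A left join from month_list would be needed to show such months with a null average.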
