Group Data by Year, Oracle SQL - sql

I am trying to create a query that counts records that existed within a year. The table looks like this:
Title_ID  ISSUE_DATE   EXPIRY_DATE  CLIENT_NUMBER
123       '26-JUN-19'  '17-AUG-20'  8529
124       '04-APR-19'  '17-SEP-22'  8529
125       '09-MAY-15'  '11-SEP-19'  3654
126       '31-DEC-19'  '25-NOV-22'  9852
127       '27-OCT-18'  '26-FEB-21'  2254
128       '05-OCT-11'  '01-JAN-19'  9852
Specifically, I want to count the number of distinct CLIENT_NUMBERS of the records that existed in a given calendar year.
The record (title) exists from the ISSUE_DATE until the EXPIRY_DATE. If the record existed at any point within a year (Let's say 2019), then we are interested in including it in our client count.
So, if the record was issued in 2019 or if the record expired in 2019 or if the record was issued before 2019 and expired after 2019, then we are interested in including it in the client count for the year it existed.
I have built the following query that does this, but only for one specific year (2019). I'd like to build the query further so it looks at each calendar year and counts the distinct client numbers when the client has an active title:
SELECT *
-- count(distinct client_number)
FROM
    TITLE
WHERE
    issue_date between '01-Jan-19' and '31-Dec-19'
    or expiry_date between '01-Jan-19' and '31-Dec-19'
    or (issue_date < '01-Jan-19' and expiry_date > '31-Dec-19')
Where I am having trouble is, my data is much larger than the subset I have provided. I would like to recursively get counts of distinct client numbers by year using the same kind of logic to include a record within a calendar year as I have outlined above. So, I'd like to have a table like this:
YEAR COUNT_OF_CLIENT_NUMBERS
2020 5469
2019 5587
2018 4852
2017 4501
2016 3265
etc
I think I've stretched my current SQL abilities at this point, so I thought I'd ask to see if there are any suggestions to make this happen?
Thanks.
EDIT: to clarify, the issue date and the expiry date apply to the title, not the client. So, the title is issued on the issue date and expires on the expiry date. A client can own one or more title(s).
So, I am looking to get a count of how many distinct clients own active titles within a given year if one or more of their titles is active within that year. So the key is, a title is considered active if it was issued in that year OR it expired within that year OR it was issued before that year and expired after that year. A title CAN be active in multiple years (i.e. if it was issued on Feb. 4, 2014 and expires on Apr. 7, 2017, I want to include the client in the count for each year that the title exists: 2014, 2015, 2016 and 2017).
So, I created a table to join to (thanks @GMB for the suggestion):
with calendar_year (y) as
(
select 2010 from dual
union all select y + 1 from calendar_year where y < 2020
)
select * from calendar_year
Which returns:
2010
2011
2012
2013
2014
etc
I want to join that to my titles table, but I am having issues recursively looking at the issue date and expiry date to join up the title to each year it existed in. Any help in that area, would be great!

You can use a recursive query to generate the years, then bring in the table with a left join, and aggregate:
with dates (dt) as (
    select date '2016-01-01' from dual
    -- step 12 months at a time so there is one row per calendar year
    union all select add_months(dt, 12) from dates where dt < date '2020-01-01'
)
select d.dt, count(distinct t.client_number) count_of_client_numbers
from dates d
left join title t
    -- the title overlaps the year that starts at d.dt
    on  t.issue_date < add_months(d.dt, 12)
    and t.expiry_date >= d.dt
group by d.dt
The upside of this approach is that you get results for each and every year, even those where no title started or ended.
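For anyone who wants to check the year-overlap logic on the six sample rows from the question, here is a minimal sketch. It runs in Python against an in-memory SQLite database rather than Oracle, so the `with recursive` keyword, the ISO text dates, and the string-built year boundaries are assumptions of this sketch and not Oracle syntax.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO text (an assumption
# of this sketch: SQLite has no DATE type, and ISO strings compare chronologically).
rows = [
    (123, "2019-06-26", "2020-08-17", 8529),
    (124, "2019-04-04", "2022-09-17", 8529),
    (125, "2015-05-09", "2019-09-11", 3654),
    (126, "2019-12-31", "2022-11-25", 9852),
    (127, "2018-10-27", "2021-02-26", 2254),
    (128, "2011-10-05", "2019-01-01", 9852),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table title (title_id integer, issue_date text, "
    "expiry_date text, client_number integer)"
)
conn.executemany("insert into title values (?, ?, ?, ?)", rows)

query = """
with recursive calendar_year (y) as (
    select 2010
    union all
    select y + 1 from calendar_year where y < 2020
)
select c.y, count(distinct t.client_number)
from calendar_year c
left join title t
       on t.issue_date  <= (c.y || '-12-31')  -- issued on or before the year's last day
      and t.expiry_date >= (c.y || '-01-01')  -- expired on or after the year's first day
group by c.y
order by c.y
"""

counts_by_year = dict(conn.execute(query).fetchall())
for year, clients in counts_by_year.items():
    print(year, clients)
```

On the sample rows this gives 4 distinct clients for 2019 and 0 for 2010. The 2010 row still appears because of the left join.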

You can get number of clients on any day by unpivoting the data, so there is one row per date. Then keep track of the "ins" and "outs".
You don't specify the database, but here is one approach:
select dte, sum(inc),
sum(sum(inc)) over (order by dte) as active_on_date
from ((select issue_date as dte, 1 as inc
from t
) union all
(select expiry_date as dte, -1 as inc
from t
)
) t
group by dte
order by dte;
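A rough way to try the "ins and outs" idea on the question's sample dates is sketched below, in Python with an in-memory SQLite database (window functions need SQLite 3.25 or later). The table name `t` follows the query above. The ISO text dates, and doing the grouping in a subquery before the running sum, are choices of this sketch. Note that it counts open titles, not distinct clients.

```python
import sqlite3

# (issue_date, expiry_date) pairs from the question's sample, as ISO text
# (a sketch assumption; SQLite compares these strings chronologically).
titles = [
    ("2019-06-26", "2020-08-17"),
    ("2019-04-04", "2022-09-17"),
    ("2015-05-09", "2019-09-11"),
    ("2019-12-31", "2022-11-25"),
    ("2018-10-27", "2021-02-26"),
    ("2011-10-05", "2019-01-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (issue_date text, expiry_date text)")
conn.executemany("insert into t values (?, ?)", titles)

query = """
select dte, net_change,
       sum(net_change) over (order by dte) as active_on_date
from (select dte, sum(inc) as net_change
      from (select issue_date as dte, 1 as inc from t     -- a title starts: +1
            union all
            select expiry_date as dte, -1 as inc from t)  -- a title ends: -1
      group by dte)
order by dte
"""

# Map each event date to the number of titles open after that day's events.
active_on = {dte: active for dte, _, active in conn.execute(query)}
for dte, active in active_on.items():
    print(dte, active)
```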
EDIT:
Hmmm, the above may not do exactly what you want. If you want to count distinct client numbers rather than overall rows, then it might be simpler to just list the dates and join:
select d.dte, count(distinct t.client_number)
from (select date '2020-01-01' as dte from dual union all
      select date '2019-01-01' as dte from dual union all
      select date '2018-01-01' as dte from dual union all
      . . .
     ) d left join
     t
     on d.dte between t.issue_date and t.expiry_date
group by d.dte
order by d.dte;

Related

Data value on a given date

This time I have a table in a PostgreSQL database that contains the employee name, the date the employee started working, and the date the employee left the company. If the employee is still with the company, this field is null.
Knowing this, I would like to know how many people were working on a predetermined date. For example:
I would like to know how many people worked at the company in January 2021.
I don't know where to start. In some attempts I got the number of hires and layoffs per month, but I need to show this accumulated value per month in another column.
I hope I made myself understood. I'll leave the last SQL I got here.
select reference, sum(hires) from
(
select
date_trunc('month', date_hires) as reference,
count(*) as hires
from
ponto_mais_relatorio_colaboradores
group by
date_hires
union all
select
date_trunc('month', date_layoff) as reference,
count(*)*-1 as layoffs
from
ponto_mais_relatorio_colaboradores
group by
date_layoff
) as reference
join calendar_aux on calendar_aux.ano_mes = reference
group by reference
order by reference
Break the requirement down. The question is: how many people are employed on any given date? That includes everyone hired on or before that date who has no layoff date, plus everyone hired on or before that date whose layoff date is later than the date you are interested in. For example, if you are interested in January, you still want to count an employee with a layoff date in February. With that in place, convert it into SQL. The conditions above translate directly into date comparisons in a select. The other issue is that January is not a date but a range of dates, so you need each date in it. You can use generate_series to create each day in January, then join the generated dates with the selection from your table. Resulting query:
with jan_dates( jdate ) as
( select generate_series( date '2021-01-01'
, date '2021-01-31'
, interval '1' day
)::date
)
select jdate "Date", count(*) "Employees"
from jan_dates j
join employees e
on ( e.date_hires <= j.jdate
and ( e.date_layoff is null
or e.date_layoff > j.jdate
)
)
group by j.jdate
order by j.jdate;
Note: Not tested.
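Since the query above is untested, here is a small sketch of the same per-day headcount logic that can be run as-is. It uses Python with an in-memory SQLite database instead of PostgreSQL, so the recursive CTE stands in for generate_series, and the four employees are made up for illustration. A left join is used here so that a day with nobody employed would still show up with a count of 0.

```python
import sqlite3

# Hypothetical employees: (name, date_hires, date_layoff); None means still employed.
employees = [
    ("Ana", "2020-03-01", None),
    ("Bruno", "2021-01-10", None),
    ("Carla", "2019-06-01", "2021-01-15"),
    ("Davi", "2021-02-01", None),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table employees (name text, date_hires text, date_layoff text)")
conn.executemany("insert into employees values (?, ?, ?)", employees)

query = """
with recursive jan_dates (jdate) as (
    select '2021-01-01'
    union all
    select date(jdate, '+1 day') from jan_dates where jdate < '2021-01-31'
)
select j.jdate, count(e.name)
from jan_dates j
left join employees e
       on e.date_hires <= j.jdate                              -- already hired
      and (e.date_layoff is null or e.date_layoff > j.jdate)   -- not yet laid off
group by j.jdate
order by j.jdate
"""

headcount = dict(conn.execute(query).fetchall())
for day, employed in headcount.items():
    print(day, employed)
```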

Count Records Prior to Date for Whole Year

I have a historical database with about 9000 records with unique UserID and date they created an account CreatedDate that looks like this:
UserID CreatedDate
1 5/12/2019
2 1/1/2018
3 4/2/2015
4 8/9/2016
. ..
I would like to know how many accounts were created UP TO a certain date, but for multiple months.
For example, how many accounts were there in Jan 2020, Feb 2020, Mar 2020, so on and so forth.
The manual way would be to do this for each month but it would be tedious:
select count(*)
from SCHEMA
--KEEP REPLACING THE MONTH TO GET COUNTS
where CreatedDate <= '2020-01-31'
Just wondering if there is a more efficient way? A group by wouldn't work because it just totals for each month, but I'm trying to get a historical count. Thanks!
You seem to need a running total for each month. If so, you need group by to compute the counts per month, and then you have to sum them using an analytic (window) sum function.
This is how you would do it in Postgres. Other vendors may differ in how the month is extracted, but the principle is the same.
with schema(UserID, CreatedDate) as (values
(1, date '2019-12-05'),
(2, date '2018-01-01'),
(3, date '2015-02-04'),
(4, date '2016-09-08')
)
select month, sum(cnt) over (order by month) from (
select date_trunc('month', CreatedDate)::date as month, count(*) as cnt
from schema
group by date_trunc('month', CreatedDate)::date
) x
Note that if the data has gaps in the month sequence and you want a continuous sequence (for example all months between 2015-01 and 2019-12), you have to pregenerate a calendar (a relation with all months) and left join the schema table to it. (It is not in my example yet because of YAGNI.)
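The calendar variant mentioned in the note could look roughly like the sketch below. It is written in Python against an in-memory SQLite database (3.25 or later for window functions), not Postgres. The `accounts` table name, the 2015-01 to 2019-12 range, and reading the question's dates as day/month/year are all assumptions of this sketch.

```python
import sqlite3

# The four sample users from the question, dates read as day/month/year (assumption).
accounts = [
    (1, "2019-12-05"),
    (2, "2018-01-01"),
    (3, "2015-02-04"),
    (4, "2016-09-08"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table accounts (user_id integer, created_date text)")
conn.executemany("insert into accounts values (?, ?)", accounts)

query = """
with recursive months (m) as (            -- calendar: first day of every month
    select '2015-01-01'
    union all
    select date(m, '+1 month') from months where m < '2019-12-01'
),
per_month as (                            -- accounts created in each month
    select strftime('%Y-%m-01', created_date) as m, count(*) as cnt
    from accounts
    group by strftime('%Y-%m-01', created_date)
)
select substr(c.m, 1, 7) as month,
       sum(coalesce(p.cnt, 0)) over (order by c.m) as running_total
from months c
left join per_month p on p.m = c.m
order by c.m
"""

running = dict(conn.execute(query).fetchall())
print(running["2016-08"], running["2016-09"])
```

Months with no new accounts carry the previous total forward, which is the "historical count" the question asks for.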

Rolling 12 month filter criteria in SQL

I'm having an issue in a SQL script where I'm trying to filter on a rolling 12 months using the day column, which is stored as text on the server.
The goal is to count sizes per product at each retail store location over the last 12 months from the current day. Currently my query filters on the year 2019, which only counts the sizes for that year rather than for a rolling 12 months from the current date.
The CALENDARDAY column is a text field in the data set, and the data is stored in yyyymmdd format.
When I try to run the script below in Tableau with the GETDATE and DATEADD functions, it gives me a functional error. I am trying to access a SAP HANA server with the query below.
Any help would be appreciated.
Select
SKU, STYLE_ID, Base_Style_ID, COLOR, SIZEKEY, STORE, Year,
count(SIZEKEY)over(partition by STYLE_ID,COLOR,STORE,Year) as SZ_CNT
from
(
select
a."RAW" As SKU,
a."STYLENUM" As STYLE_ID,
mat."BASENUM" AS Base_Style_ID,
a."COLORNUM" AS COLOR,
a."SIZE" AS SIZEKEY,
a."STORENUM" AS STORE,
substring(a."CALENDARDAY",1,4) As year
from PRTRPT_XRE as a
JOIN ZAT_SKU As mat On a."RAW" = mat."SKU"
where a."ORGANIZATION" = 'M20'
and a."COLORNUM" is not null
and substring(a."CALENDARDAY",1,4) = '2019'
Group BY
a."RAW",
a."STYLENUM",
mat."BASENUM",
a."ZCOLORCD",
a."SIZE",
a."STORENUM",
substring(a."CALENDARDAY",1,4)
)
I have never worked with that DB / server, so I don't have a way to test this.
But hopefully this will work (assuming you want exactly 12 months before today's date). The format string matches the yyyymmdd text stored in CALENDARDAY:
AND ADD_MONTHS (TO_DATE (a."CALENDARDAY", 'YYYYMMDD'), 12) > CURRENT_DATE
or
AND ADD_MONTHS (a."CALENDARDAY", 12) > CURRENT_DATE
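As a way to see the rolling-window filter in action without a HANA server, here is a small sketch in Python with an in-memory SQLite database. It relies on the fact that yyyymmdd text sorts chronologically, so the cutoff can be built as a yyyymmdd string and compared as text. The `sales` table, its rows, and the fixed reference date are hypothetical, and SQLite's `strftime` stands in for ADD_MONTHS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the day is stored as yyyymmdd text, as in the question.
conn.execute("create table sales (calendarday text, sizekey text)")
conn.executemany(
    "insert into sales values (?, ?)",
    [("20180101", "S"), ("20190315", "M"), ("20190316", "L"), ("20200101", "XL")],
)

# Fixed reference date so the result is reproducible; in practice this would be today.
today = "2020-03-15"

query = """
select calendarday, sizekey
from sales
-- cutoff = reference date minus 12 months, formatted as yyyymmdd text
where calendarday > strftime('%Y%m%d', :today, '-12 months')
order by calendarday
"""

recent = conn.execute(query, {"today": today}).fetchall()
print(recent)
```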
The condition below, using one of our CALENDAR tables, also worked the same way as the ADD_MONTHS approach mentioned in the response above:
select distinct CALENDARDAY
from
(
select FISCALWEEK, CALENDARDAY, CNST, row_number()over(partition by CNST order by FISCALWEEK desc) as rnum
from
(
select distinct FISCALWEEK, CALENDARDAY, 'A' as CNST
from CALENDARTABLE
where CALENDARDAY < current_date
order by 1,2
)
) where rnum < 366

I want to find customers transacting for any 3 consecutive months from 2017 to 2018

I want to know the trick for finding the list of customers who transacted in 3 consecutive months. That could be any 3 consecutive months, with any number of occurrences.
Example: suppose there is a customer who transacts in January and keeps transacting until March, then stops transacting. I want the list of these customers from my database.
I am working on AWS Athena.
One method uses aggregation and window functions:
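select customer_id, yyyymm
from (select customer_id,
             date_trunc('month', transactdate) as yyyymm,
             lag(date_trunc('month', transactdate), 2) over (partition by customer_id order by date_trunc('month', transactdate)) as prev_yyyymm_2
      from t
      -- assumes transactdate is a DATE column
      where transactdate >= date '2017-01-01' and
            transactdate < date '2019-01-01'
      -- one row per customer and month, so lag(..., 2) steps back two active months
      group by customer_id, date_trunc('month', transactdate)
     ) m
where prev_yyyymm_2 = yyyymm - interval '2' month;
This aggregates transactions by month and looks at the month two rows earlier. The outer filter checks that that month is exactly 2 months earlier, which means the customer transacted in three consecutive months ending at yyyymm.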
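To make the lag-by-two idea concrete, here is a minimal sketch that can be run locally. It uses Python with an in-memory SQLite database (3.25 or later) rather than Athena, so `strftime` and `date(..., '-2 months')` replace `date_trunc` and interval arithmetic. The `tx` table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical transactions: (customer_id, transactdate as ISO text).
tx = [
    (1, "2017-01-05"), (1, "2017-01-20"), (1, "2017-02-11"), (1, "2017-03-02"),
    (2, "2017-01-07"), (2, "2017-03-09"), (2, "2017-05-30"),
    (3, "2017-11-15"), (3, "2017-12-01"), (3, "2018-01-31"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table tx (customer_id integer, transactdate text)")
conn.executemany("insert into tx values (?, ?)", tx)

query = """
with months as (      -- one row per customer and month with activity
    select distinct customer_id, strftime('%Y-%m-01', transactdate) as m
    from tx
    where transactdate >= '2017-01-01' and transactdate < '2019-01-01'
),
lagged as (           -- the active month two rows earlier for the same customer
    select customer_id, m,
           lag(m, 2) over (partition by customer_id order by m) as prev_m_2
    from months
)
select distinct customer_id
from lagged
where prev_m_2 = date(m, '-2 months')   -- two rows back is exactly two months back
order by customer_id
"""

consecutive = [row[0] for row in conn.execute(query)]
print(consecutive)
```

Customer 2 transacts in three different months but with gaps, so only customers 1 and 3 qualify.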

SQL window-over by-incremental distinct users

I am sure this must be fairly easy for you but unfortunately it is not for me !
I am trying to write a query that counts incremental distinct user id grouped by month.
That is, if user X has a row in both January and February, he should be counted once in January but not in February.
I can do the following for a given month, but I would like to automate it.
EDIT :
Let me try to clarify: a row in table UX is created every time a user performs a given action. I would like to count the number of unique NEW(/incremental) users every month who performed this action. Meaning if user A performed this action in January AND February he would only be counted in January.
select
    count(distinct ux.account_id)
    , trunc(ux.date_key,'MM') as month
from
    ux
left join
    (
    select
        distinct ux.account_id as account_id
    from
        ux
    where
        ux.date_key < '2019-02-01'
    ) bf on ux.account_id=bf.account_id
where
    ux.date_key >= '2019-02-01'
    and bf.account_id IS NULL
group by
    trunc(ux.date_key,'MM')
"incremental distinct user" to me sounds a lot like a user starting. Does this do what you want?
select trunc(min_date_key, 'MM') as month, count(*) as new_users
from (select account_id, min(ux.date_key) as min_date_key
      from ux
      group by account_id
     ) ux
group by trunc(min_date_key, 'MM')
order by month
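The "first month seen" idea can be checked on a toy table with the sketch below. It is Python with an in-memory SQLite database, so `strftime('%Y-%m', ...)` stands in for `trunc(..., 'MM')`, and the rows in `ux` are hypothetical.

```python
import sqlite3

# Hypothetical action log: (account_id, date_key as ISO text).
ux = [
    ("A", "2019-01-05"), ("A", "2019-02-03"),
    ("B", "2019-01-20"),
    ("C", "2019-02-10"), ("C", "2019-03-01"),
    ("D", "2019-03-15"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table ux (account_id text, date_key text)")
conn.executemany("insert into ux values (?, ?)", ux)

query = """
select strftime('%Y-%m', min_date_key) as month, count(*) as new_users
from (select account_id, min(date_key) as min_date_key   -- first action per account
      from ux
      group by account_id)
group by strftime('%Y-%m', min_date_key)
order by month
"""

new_users = dict(conn.execute(query).fetchall())
print(new_users)
```

Each account is counted only in the month of its first row, so account A (active in January and February) contributes to January only.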