SQL max n versions for each week of each year

I have a data table with [YEAR], [WEEKNO], [VERSIONNO] cols (and others).
I want to output the whole row (every column) for the latest n VERSIONNO values of each WEEKNO in each YEAR.
What's the best way to do it in SQL?

You would use row_number():
select t.*
from (select t.*,
row_number() over (partition by year, weekno order by versionno desc) as seqnum
from t
) t
where seqnum <= n -- Your value goes here
row_number() is ANSI-standard functionality supported by most databases.
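To try the idea on a small data set, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module (it needs SQLite 3.25 or later for window functions). The table layout, the payload column and the helper name top_n_versions are assumptions made for illustration; only YEAR, WEEKNO and VERSIONNO come from the question.

```python
import sqlite3

def top_n_versions(rows, n):
    """Return the n highest-VERSIONNO rows for each (year, weekno) pair."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table: 'payload' stands in for the "other" columns.
    con.execute("create table t (year int, weekno int, versionno int, payload text)")
    con.executemany("insert into t values (?, ?, ?, ?)", rows)
    query = """
        select year, weekno, versionno, payload
        from (select t.*,
                     row_number() over (partition by year, weekno
                                        order by versionno desc) as seqnum
              from t) t
        where seqnum <= ?
        order by year, weekno, versionno desc
    """
    return con.execute(query, (n,)).fetchall()

sample = [
    (2020, 1, 1, "a"), (2020, 1, 2, "b"), (2020, 1, 3, "c"),
    (2020, 2, 1, "d"),
    (2021, 1, 5, "e"), (2021, 1, 7, "f"),
]
result = top_n_versions(sample, 2)
for row in result:
    print(row)
```

A week with fewer than n versions (week 2 of 2020 here) simply returns all the rows it has.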

Related

Getting 3 best Posts per Month with three different queries

I am having a hard time wrapping my head around the row_number function.
This is my SCHEMA :
I am trying to build a query that would output the top value for Post_impressions within a date range (I.E. a month) WHEN the RowNumber is set to 1, the second best value when it is set to 2 and so on.
Here is the query I came up with so far
SELECT Post_timestamp,
Post_impressions,
Post_tipo,
from
(SELECT Post_timestamp,
Post_impressions,
Post_tipo,
FORMAT_DATE("%Y-%m-%d",DATE_TRUNC(TIMESTAMP(Post_timestamp), DAY)) as TheDate,
row_number() OVER
(PARTITION BY FORMAT_DATE("%Y-%m-%d",DATE_TRUNC(TIMESTAMP(Post_timestamp), DAY)) ORDER BY Post_impressions DESC) AS RowNumber
from `***DATABASENAME***`)
WHERE RowNumber = 1 AND TheDate BETWEEN "2021-07-01" AND "2021-07-31";
Thanks for your help!
You're getting 31 rows because the subquery is partitioned by day, so every day has its own row with RowNumber = 1.
If your use case is limited to calendar months, you can simply partition by month instead. Note that this will not cover every case, particularly when you want to look at a time period that spans multiple partitions.
SELECT Post_timestamp,
Post_impressions,
Post_tipo,
from
(SELECT Post_timestamp,
Post_impressions,
Post_tipo,
FORMAT_DATE("%Y-%m-%d",DATE_TRUNC(TIMESTAMP(Post_timestamp), day)) as TheDate,
row_number() OVER
(PARTITION BY FORMAT_DATE("%Y-%m-%d",DATE_TRUNC(TIMESTAMP(Post_timestamp), month)) ORDER BY Post_impressions DESC) AS RowNumber
from `***DATABASENAME***`)
WHERE RowNumber = 1 AND TheDate BETWEEN "2021-07-01" AND "2021-07-31";
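The effect of the partition key is easy to see on a few rows. The sketch below is not BigQuery: it uses Python's sqlite3 module (SQLite 3.25+), with strftime standing in for FORMAT_DATE/DATE_TRUNC, and the table name posts and the helper posts_with_rank are hypothetical.

```python
import sqlite3

# strftime patterns for the two partitioning choices discussed above.
PARTITION_FORMATS = {"day": "%Y-%m-%d", "month": "%Y-%m"}

def posts_with_rank(rows, partition, rank):
    """Return the posts whose row_number within each partition equals rank."""
    fmt = PARTITION_FORMATS[partition]  # fixed whitelist, safe to interpolate
    con = sqlite3.connect(":memory:")
    con.execute("create table posts (post_timestamp text, post_impressions int)")
    con.executemany("insert into posts values (?, ?)", rows)
    query = f"""
        select post_timestamp, post_impressions
        from (select p.*,
                     row_number() over (partition by strftime('{fmt}', post_timestamp)
                                        order by post_impressions desc) as rn
              from posts p) ranked
        where rn = ?
        order by post_timestamp
    """
    return con.execute(query, (rank,)).fetchall()

sample = [
    ("2021-07-01", 10), ("2021-07-01", 50),
    ("2021-07-02", 30), ("2021-07-03", 20),
]
by_day = posts_with_rank(sample, "day", 1)         # one winner per day
best_of_month = posts_with_rank(sample, "month", 1)
second_of_month = posts_with_rank(sample, "month", 2)
print(by_day, best_of_month, second_of_month)
```

Partitioning by day returns one row per day, while partitioning by month returns a single row for July, which is what the question asks for.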

ORACLE SQL: Find last minimum and maximum consecutive period

I have the sample data set below which list the water meters not working for specific reason for a certain range period (jan 2016 to december 2018).
I would like to have a query that retrieves the last maximum and minimum consecutive period where the meter was not working within that range of period.
any help will be greatly appreciated.
You have two options:
select code, to_char(min_period, 'yyyymm') min_period, to_char(max_period, 'yyyymm') max_period
from (
select code, min(period) min_period, max(period) max_period,
max(min(period)) over (partition by code) max_min_period
from (
select code, period, sum(flag) over (partition by code order by period) grp
from (
select code, period,
case when add_months(period, -1)
= lag(period) over (partition by code order by period)
then 0 else 1 end flag
from (select mrdg_acc_code code, to_date(mrdg_per_period, 'yyyymm') period from t)))
group by code, grp)
where min_period = max_min_period
Explanation:
flag rows where the period is not equal to the previous period plus one month,
create column grp, which is a running sum of the flags,
group the data by code and grp, additionally finding the latest period start for each code,
show only rows where min_period = max_min_period
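The four steps above can be sketched on a toy data set. This is an illustration in SQLite via Python's sqlite3 module (3.25+), not the Oracle query itself: periods are kept as yyyymm integers and converted to a month counter instead of using to_date/add_months, the final filter compares grp rather than min_period, and the table layout and function name are assumptions.

```python
import sqlite3

def last_consecutive_period(rows):
    """For each code, return (code, start, end) of its latest run of consecutive months."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (code text, period int)")  # period as yyyymm
    con.executemany("insert into t values (?, ?)", rows)
    query = """
        with numbered as (   -- yyyymm -> running month number
            select code, period, period / 100 * 12 + period % 100 as month_no
            from t
        ),
        flagged as (         -- step 1: flag the first row of every run
            select code, period,
                   case when month_no - 1 = lag(month_no) over (partition by code
                                                                order by month_no)
                        then 0 else 1 end as flag
            from numbered
        ),
        grouped as (         -- step 2: running sum of flags labels each run
            select code, period,
                   sum(flag) over (partition by code order by period) as grp
            from flagged
        ),
        islands as (         -- step 3: one row per run
            select code, grp, min(period) as min_period, max(period) as max_period
            from grouped
            group by code, grp
        )
        select code, min_period, max_period   -- step 4: keep the latest run
        from islands i
        where grp = (select max(grp) from islands j where j.code = i.code)
        order by code
    """
    return con.execute(query).fetchall()

sample = [("A", 201601), ("A", 201602), ("A", 201611),
          ("A", 201612), ("A", 201701), ("B", 201803)]
result = last_consecutive_period(sample)
print(result)
```

Note that the run for code A crosses a year boundary (201612 to 201701), which is why the periods are converted to a month counter before comparing neighbours.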
Second option is recursive CTE available in Oracle 11g and above:
with
data(period, code) as (
select to_date(mrdg_per_period, 'yyyymm'), mrdg_acc_code from t
where mrdg_per_period between 201601 and 201812),
cte (period, code) as (
select to_char(period, 'yyyymm'), code from data
where (period, code) in (select max(period), code from data group by code)
union all
select to_char(data.period, 'yyyymm'), cte.code
from cte
join data on data.code = cte.code
and data.period = add_months(to_date(cte.period, 'yyyymm'), -1))
select code, min(period) min_period, max(period) max_period
from cte group by code
Explanation:
subquery data keeps only rows from 2016 - 2018, additionally converting period to a date. We need this for the add_months function to work.
cte is recursive. The anchor finds the starting rows, those with the maximum period for each code. After union all comes the recursive member, which looks for the row one month older than the current one. If it finds it, it moves on to the next row; if not, it stops.
the final select groups the data. Notice that periods which were not consecutive were rejected by cte.
Though recursive queries are slower than traditional ones, there can be scenarios where second solution is better.
Here is the dbfiddle demo for both queries. Good luck.
Use aggregate functions with group by:
select max(mrdg_per_period) mrdg_per_period, mrdg_acc_code,max(mrdg_date_read),rea_Desc,min(mrdg_per_period) not_working_as_from
from tablename
group by mrdg_acc_code,rea_Desc
This is a bit tricky. This is a gap-and-islands problem. To get all continuous periods, it will help if you have an enumeration of months. So, convert the period to a number of months and then subtract a sequence generated using row_number(). The difference is constant for a group of adjacent months.
This looks like:
select acc_code, min(period), max(period)
from (select t.*,
row_number() over (partition by acc_code order by period_num) as seqnum
from (select t.*, floor(period / 100) * 12 + mod(period, 100) as period_num
from t
) t
where rea_desc = 'METER NOT WORKING'
) t
group by acc_code, (period_num - seqnum);
Then, if you want the last one for each account, you can use a subquery:
select t.*
from (select acc_code, min(period), max(period),
row_number() over (partition by acc_code order by max(period) desc) as seqnum
from (select t.*,
row_number() over (partition by acc_code order by period_num) as seqnum
from (select t.*, floor(period / 100) * 12 + mod(period, 100) as period_num
from t
) t
where rea_desc = 'METER NOT WORKING'
) t
group by acc_code, (period_num - seqnum)
) t
where seqnum = 1;

Running count distinct

I am trying to see how the cumulative number of subscribers changed over time based on unique email addresses and date they were created. Below is an example of a table I am working with.
I am trying to turn it into the table below. Email 1@gmail.com was created twice and I would like to count it once. I cannot figure out how to generate the Running count distinct column.
Thanks for the help.
I would usually do this using row_number():
select date, count(*),
sum(count(*)) over (order by date),
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by date)
from (select t.*,
row_number() over (partition by email order by date) as seqnum
from t
) t
group by date
order by date;
This is similar to the version using lag(). However, I get nervous using lag if the same email appears multiple times on the same date.
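Here is a minimal sketch of the row_number() approach on made-up data, including an email that appears twice on the same date. It runs on SQLite through Python's sqlite3 module (3.25+); the table name subs, the column names and the example.com addresses are assumptions, and the per-day aggregation is done in a subquery rather than with nested aggregates.

```python
import sqlite3

def running_counts(rows):
    """Return (dt, day_total, cumulative_total, cumulative_distinct_emails) per date."""
    con = sqlite3.connect(":memory:")
    con.execute("create table subs (dt text, email text)")
    con.executemany("insert into subs values (?, ?)", rows)
    query = """
        select dt, day_total,
               sum(day_total)  over (order by dt) as cumsum,
               sum(new_emails) over (order by dt) as cumdist
        from (select dt,
                     count(*) as day_total,
                     -- only the first row ever seen for an email counts as new
                     sum(case when seqnum = 1 then 1 else 0 end) as new_emails
              from (select s.*,
                           row_number() over (partition by email order by dt) as seqnum
                    from subs s) numbered
              group by dt) per_day
        order by dt
    """
    return con.execute(query).fetchall()

sample = [
    ("2021-01-01", "a@example.com"), ("2021-01-01", "b@example.com"),
    ("2021-01-02", "a@example.com"), ("2021-01-02", "c@example.com"),
    ("2021-01-03", "d@example.com"), ("2021-01-03", "d@example.com"),
]
result = running_counts(sample)
for row in result:
    print(row)
```

Because row_number() gives exactly one row per email the value 1, the duplicate d@example.com rows on the last date add only one to the distinct count.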
Getting the total count and the cumulative count is straightforward. To get the cumulative distinct count, use lag() to check whether the email already has a row with an earlier date, and if so set the flag to 0 so that the row is ignored by the running sum.
select distinct dt
,count(*) over(partition by dt) as day_total
,count(*) over(order by dt) as cumsum
,sum(flag) over(order by dt) as cumdist
from (select t.*
,case when lag(dt) over(partition by email order by dt) is not null then 0 else 1 end as flag
from tbl t
) t
Here is a solution that uses neither sum() over nor lag(), and still produces the correct results.
Hence it may be simpler to read and to maintain.
select
t1.date_created,
(select count(*) from my_table where date_created = t1.date_created) emails_created,
(select count(*) from my_table where date_created <= t1.date_created) cumulative_sum,
(select count( distinct email) from my_table where date_created <= t1.date_created) running_count_distinct
from
(select distinct date_created from my_table) t1
order by 1

Running total in per year ordered by person based on latest date info

We are trying to calculate a running total for each year, ordered by person, based on each person's latest date info. Here is an example of how the data is ordered:
Expected result:
So for each downloaded date we want the running total of all persons, ordered by year (for now the only year is 2018).
What do we have so far:
sum(Amount)
over(partition by [Year],[Person]
order by [Enddate])
where max(Downloaded)
Any idea how to fix this?
Just use a window function:
select *,
sum(Amount) over (partition by Year, Downloaded) RunningTotal
from table t
Try using a subquery with a moving downloaded date range.
SELECT
T.*,
RunningTotalByDate = (
SELECT
SUM(N.Amount)
FROM
YourTable AS N
WHERE
N.Downloaded <= T.Downloaded)
FROM
YourTable AS T
ORDER BY
T.Downloaded ASC,
T.Person ASC
Or with a windowed SUM(). Do not include a PARTITION BY, because it will reset the sum whenever the value of the partitioning column changes.
SELECT
T.*,
RunningTotalByDate = SUM(T.Amount) OVER (ORDER BY T.Downloaded ASC)
FROM
YourTable AS T
ORDER BY
T.Downloaded ASC,
T.Person ASC
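To see the difference PARTITION BY makes, here is a small sketch that computes both variants side by side. It runs on SQLite through Python's sqlite3 module (3.25+) rather than SQL Server, and the table name payments and the sample rows are made up for illustration.

```python
import sqlite3

def running_totals(rows):
    """Return (person, downloaded, amount, total_by_date, total_by_person) rows."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table mirroring the Person / Downloaded / Amount columns.
    con.execute("create table payments (person text, downloaded text, amount int)")
    con.executemany("insert into payments values (?, ?, ?)", rows)
    query = """
        select person, downloaded, amount,
               -- no PARTITION BY: one running total across all persons
               sum(amount) over (order by downloaded) as total_by_date,
               -- PARTITION BY person: the total restarts for every person
               sum(amount) over (partition by person order by downloaded) as total_by_person
        from payments
        order by downloaded, person
    """
    return con.execute(query).fetchall()

sample = [
    ("Ann", "2018-01-01", 10),
    ("Bob", "2018-01-01", 20),
    ("Ann", "2018-02-01", 5),
]
result = running_totals(sample)
for row in result:
    print(row)
```

With only ORDER BY, rows that share a Downloaded date are peers and get the same total (30 for both rows on 2018-01-01), which matches what the correlated subquery with N.Downloaded <= T.Downloaded returns.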

Tagging consecutive days

Suppose I have data something like this:
ID,DATE
101,01jan2014
101,02jan2014
101,03jan2014
101,07jan2014
101,08jan2014
101,10jan2014
101,12jan2014
101,13jan2014
102,08jan2014
102,09jan2014
102,10jan2014
102,15jan2014
How could I efficiently code this in Greenplum SQL such that I can have a grouping of consecutive days similar to the one below:
ID,DATE,PERIOD
101,01jan2014,1
101,02jan2014,1
101,03jan2014,1
101,07jan2014,2
101,08jan2014,2
101,10jan2014,3
101,12jan2014,4
101,13jan2014,4
102,08jan2014,1
102,09jan2014,1
102,10jan2014,1
102,15jan2014,2
You can do this using row_number(). For a consecutive group, the difference between the date and the row_number() is a constant. Then, use dense_rank() to assign the period:
select id, date,
dense_rank() over (partition by id order by grp) as period
from (select t.*,
date - row_number() over (partition by id order by date) * interval '1 day' as grp
from table t
) t
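The query can be checked against the sample data from the question. The sketch below is an adaptation for SQLite through Python's sqlite3 module (3.25+), not Greenplum: dates are stored as ISO strings and turned into day numbers with julianday() instead of using interval arithmetic, and the table name visits and column name d are assumptions.

```python
import sqlite3

def tag_periods(rows):
    """Return (id, date, period), where period numbers each run of consecutive days per id."""
    con = sqlite3.connect(":memory:")
    con.execute("create table visits (id int, d text)")  # d as 'YYYY-MM-DD'
    con.executemany("insert into visits values (?, ?)", rows)
    query = """
        select id, d,
               dense_rank() over (partition by id order by grp) as period
        from (select id, d,
                     -- day number minus row number is constant within a run
                     cast(julianday(d) as integer)
                       - row_number() over (partition by id order by d) as grp
              from visits) t
        order by id, d
    """
    return con.execute(query).fetchall()

days_101 = [1, 2, 3, 7, 8, 10, 12, 13]
days_102 = [8, 9, 10, 15]
sample = ([(101, f"2014-01-{day:02d}") for day in days_101]
          + [(102, f"2014-01-{day:02d}") for day in days_102])
result = tag_periods(sample)
periods = [period for _, _, period in result]
print(periods)
```

For id 101 the differences come out as 0, 0, 0, 3, 3, 4, 5, 5 (relative to a common offset), and dense_rank() turns those into the periods 1, 1, 1, 2, 2, 3, 4, 4 shown in the question.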