MariaDB get first and last record of the month - nested query - sql

I am using MariaDB and I have this kind of data:
I also have data for March, and I use this query to select the distinct months from the database:
select distinct(DATE_FORMAT(DT,'%m-%Y')) AS singleMonth FROM myTable
I want to select the FIRST and LAST record of the P2 column for every month. How can I extend the query above so that, for each distinct month, it also returns the first and the last record of that month?
Example of what the query should return:

You can use window functions and conditional aggregation:
select year(dt), month(dt),
min(case when seqnum_asc = 1 then p2 end) as first_p2,
min(case when seqnum_desc = 1 then p2 end) as last_p2
from (select t.*,
row_number() over (partition by year(dt), month(dt) order by dt asc) as seqnum_asc,
row_number() over (partition by year(dt), month(dt) order by dt desc) as seqnum_desc
from t
) t
group by year(dt), month(dt);
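To try the pattern without a MariaDB server, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. The table contents are made up for illustration, and `strftime('%Y-%m', dt)` stands in for `year(dt), month(dt)` because SQLite has no such functions. It assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (dt text, p2 integer)")
# Hypothetical sample rows: three readings in February, two in March.
conn.executemany(
    "insert into t values (?, ?)",
    [
        ("2021-02-01 08:00:00", 10),
        ("2021-02-14 09:30:00", 25),
        ("2021-02-28 23:00:00", 40),
        ("2021-03-02 07:15:00", 5),
        ("2021-03-30 18:45:00", 90),
    ],
)

# Number the rows of each month from both ends, then keep the p2 value
# of the row numbered 1 in each direction (conditional aggregation).
query = """
select ym,
       min(case when seqnum_asc = 1 then p2 end) as first_p2,
       min(case when seqnum_desc = 1 then p2 end) as last_p2
from (select t.*,
             strftime('%Y-%m', dt) as ym,
             row_number() over (partition by strftime('%Y-%m', dt) order by dt asc) as seqnum_asc,
             row_number() over (partition by strftime('%Y-%m', dt) order by dt desc) as seqnum_desc
      from t
     ) t
group by ym
order by ym
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each result row holds the month, the first P2 value, and the last P2 value for that month.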

Related

How to get increment number when there are any change in a column in Bigquery?

I have the columns date, id, and flag in this table. How can I get a value column that is an incremental number and restarts from 1 whenever the flag column changes?
Consider the approach below:
select * except(changed, grp),
row_number() over(partition by id, grp order by date) value
from (
select *, countif(changed) over(partition by id order by date) grp
from (
select *,
ifnull(flag != lag(flag) over(partition by id order by date), true) changed
from `project.dataset.table`
))
if applied to sample data in your question - output is
You seem to want to count the number of falses since the last true. You can use:
select t.* except (grp),
(case when flag
then 1
else row_number() over (partition by id, grp order by date) - 1
end)
from (select t.*,
countif(flag) over (partition by id order by date) as grp
from t
) t;
If you know that the dates have no gaps, you can actually do this without a subquery:
select t.*,
(case when flag then 1
else date_diff(date,
max(case when flag then date end) over (partition by id order by date),
day)
end)
from t;
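The first approach above works in three steps: mark the rows where flag differs from the previous row, take a running count of those marks to label each run, then number the rows inside each run. The sketch below reproduces those steps in SQLite from Python rather than BigQuery, so `countif` becomes a running `sum` and `ifnull(...)` becomes `coalesce(...)`. The sample rows are invented, and SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, date text, flag integer)")
# Hypothetical data for one id: flag runs are T,T / F,F,F / T.
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (1, "2021-01-01", 1),
        (1, "2021-01-02", 1),
        (1, "2021-01-03", 0),
        (1, "2021-01-04", 0),
        (1, "2021-01-05", 0),
        (1, "2021-01-06", 1),
    ],
)

query = """
select id, date, flag,
       row_number() over (partition by id, grp order by date) as value
from (
    -- running count of changes labels each run of equal flags
    select *, sum(changed) over (partition by id order by date) as grp
    from (
        -- 1 when flag differs from the previous row (or there is no previous row)
        select *,
               coalesce(flag <> lag(flag) over (partition by id order by date), 1) as changed
        from t
    )
)
order by id, date
"""
values = [row[3] for row in conn.execute(query)]
print(values)
```

The counter restarts at 1 on every row where the flag differs from the row before it.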

SQL - get counts based on rolling window per unique id

I'm working with a table that has an id and a date column. For each id, there is a 90-day window in which multiple transactions can be made. The window starts when the first transaction is made, and the clock resets once the 90 days are over. When a new transaction triggers a new 90-day window, I want the count to start again at one. I would like to generate something like this, with the two additional columns (window and count), in SQL:
id date window count
name1 7/7/2019 first 1
name1 12/31/2019 second 1
name1 1/23/2020 second 2
name1 1/23/2020 second 3
name1 2/12/2020 second 4
name1 4/1/2020 third 1
name2 6/30/2019 first 1
name2 8/14/2019 first 2
I think getting the rank of the window can be done with a CASE statement and MIN(date) OVER (PARTITION BY id). This is what I have in mind for that:
CASE WHEN date = MIN(date) OVER (PARTITION BY id) THEN 'first'
WHEN DATEDIFF(day, MIN(date) OVER (PARTITION BY id), date) <= 90 THEN 'first'
WHEN DATEDIFF(day, MIN(date) OVER (PARTITION BY id), date) > 90 AND DATEDIFF(day, MIN(date) OVER (PARTITION BY id), date) <= 180 THEN 'second'
WHEN DATEDIFF(day, MIN(date) OVER (PARTITION BY id), date) > 180 AND DATEDIFF(day, MIN(date) OVER (PARTITION BY id), date) <= 270 THEN 'third'
ELSE NULL END
And incrementing the counts within the windows would be ROW_NUMBER() OVER (PARTITION BY id, window)?
You cannot solve this problem with window functions only. You need to iterate through the dataset, which can be done with a recursive query:
-- PostgreSQL and MySQL/MariaDB require "with recursive" here
with
tab as (
select t.*, row_number() over(partition by id order by date) rn
from mytable t
),
cte as (
select id, date, rn, date date0 from tab where rn = 1
union all
-- keep the current window start (date0) unless this row falls more than 90 days after it
select t.id, t.date, t.rn,
case when t.date > c.date0 + interval '90' day then t.date else c.date0 end
from cte c
inner join tab t on t.id = c.id and t.rn = c.rn + 1
)
select
id,
date,
dense_rank() over(partition by id order by date0) grp,
count(*) over(partition by id, date0 order by rn) cnt
from cte
The first query in the with clause ranks records having the same id by increasing date; then, the recursive query traverses the data set and computes the starting date of each group. The last step is numbering the groups and computing the window count.
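To see why iteration is needed, it can help to write the same traversal procedurally. The Python sketch below is not part of either answer. It walks each id's rows in date order, carries the current window's start date forward, and opens a new window only when a row falls more than 90 days after that start. The function name `number_windows` is hypothetical, and treating day 90 itself as still inside the window is an assumption. The sample rows are the ones from the question.

```python
from datetime import date, timedelta
from itertools import groupby

rows = [
    ("name1", date(2019, 7, 7)),
    ("name1", date(2019, 12, 31)),
    ("name1", date(2020, 1, 23)),
    ("name1", date(2020, 1, 23)),
    ("name1", date(2020, 2, 12)),
    ("name1", date(2020, 4, 1)),
    ("name2", date(2019, 6, 30)),
    ("name2", date(2019, 8, 14)),
]


def number_windows(rows, days=90):
    """Return (id, date, window_number, count_in_window) tuples."""
    out = []
    # sorted() orders by id, then date, so groupby sees each id once.
    for key, group in groupby(sorted(rows), key=lambda r: r[0]):
        window_no, count, window_start = 0, 0, None
        for _, d in group:
            # A row starts a new window if it is the first row for this id
            # or lies more than `days` days after the current window start.
            if window_start is None or d > window_start + timedelta(days=days):
                window_no += 1
                count = 0
                window_start = d
            count += 1
            out.append((key, d, window_no, count))
    return out


result = number_windows(rows)
for row in result:
    print(row)
```

The start of each window depends on where the previous window started, which in turn depends on the one before it. That chain is what a plain window function cannot express and what the recursive query computes row by row.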
GMB is totally correct that a recursive CTE is needed. I offer this as an alternative form for two reasons. First, because it uses SQL Server syntax, which appears to be the database being used in the question. Second, because it directly calculates window and count without window functions:
with t as (
select t.*, row_number() over (partition by id order by date) as seqnum
from tbl t
),
cte as (
select t.id, t.date, dateadd(day, 90, t.date) as window_end, 1 as window, 1 as count, seqnum
from t
where seqnum = 1
union all
select t.id, t.date,
(case when t.date > cte.window_end then dateadd(day, 90, t.date)
else cte.window_end
end) as window_end,
(case when t.date > cte.window_end then window + 1 else window end) as window,
(case when t.date > cte.window_end then 1 else cte.count + 1 end) as count,
t.seqnum
from cte join
t
on t.id = cte.id and
t.seqnum = cte.seqnum + 1
)
select id, date, window, count
from cte
order by 1, 2;

Determine MIN Date from Consecutive Occurrences

I have a table that contains the following columns: Date, Customer, Active Flag. I need to add a fourth column called Start. The Start column should return the first date the client was active, based on consecutive active flags.
shows the three columns I currently have and the results I wish to return for the Start column.
Your insight into what my SQL code should look like to achieve this would be appreciated. Thanks!!
You can do this without subqueries, if I assume one date per month per customer:
select t.*,
(case when activeflag = 1
then coalesce(max(case when activeflag = 0 then date end) over (partition by customer order by date) + interval '1 month',
min(case when activeflag = 1 then date end) over (partition by customer)
)
end) as start
from t;
Subqueries, though, might make this easier. You can treat this as a gaps-and-islands problem:
select t.*,
(case when activeflag = 1
then min(date) over (partition by customer, activeflag, seqnum - seqnum_a)
end) as start
from (select t.*,
row_number() over (partition by customer order by date) as seqnum,
row_number() over (partition by customer, activeflag order by date) as seqnum_a
from t
) t;
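The gaps-and-islands idea relies on the difference between two row numbers staying constant within a run of equal flags. The sketch below runs it in SQLite from Python on invented data (one customer, active for two months, inactive for one, then active again). The column names `customer` and `activeflag` are assumed from the question's description. The sketch partitions by `activeflag` as well as the difference, because an active run and an inactive run can otherwise share the same difference and be merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (date text, customer text, activeflag integer)")
# Hypothetical monthly rows for a single customer.
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("2020-01-01", "A", 1),
        ("2020-02-01", "A", 1),
        ("2020-03-01", "A", 0),
        ("2020-04-01", "A", 1),
        ("2020-05-01", "A", 1),
    ],
)

query = """
select date, customer, activeflag,
       (case when activeflag = 1
             then min(date) over (partition by customer, activeflag, seqnum - seqnum_a)
        end) as start
from (select t.*,
             -- position among all of the customer's rows
             row_number() over (partition by customer order by date) as seqnum,
             -- position among the customer's rows with the same flag
             row_number() over (partition by customer, activeflag order by date) as seqnum_a
      from t
     ) t
order by customer, date
"""
starts = [row[3] for row in conn.execute(query)]
print(starts)
```

Inactive rows get a NULL start, and each active row gets the first date of its own consecutive run.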

Running count distinct

I am trying to see how the cumulative number of subscribers changed over time based on unique email addresses and date they were created. Below is an example of a table I am working with.
I am trying to turn it into the table below. The email 1@gmail.com was created twice, and I would like to count it only once. I cannot figure out how to generate the running count distinct column.
Thanks for the help.
I would usually do this using row_number():
select date, count(*),
sum(count(*)) over (order by date),
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by date)
from (select t.*,
row_number() over (partition by email order by date) as seqnum
from t
) t
group by date
order by date;
This is similar to the version using lag(). However, I get nervous using lag if the same email appears multiple times on the same date.
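As a quick check of the row_number() approach, the sketch below runs the query in SQLite from Python. The emails and dates are invented, with one address appearing on two different days so that the plain running total and the running distinct count diverge. SQLite 3.25 or later is assumed, and the output column aliases are added for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (date text, email text)")
# Hypothetical sign-ups: 1@gmail.com appears on two different days.
conn.executemany(
    "insert into t values (?, ?)",
    [
        ("2021-01-01", "1@gmail.com"),
        ("2021-01-01", "2@gmail.com"),
        ("2021-01-02", "1@gmail.com"),
        ("2021-01-02", "3@gmail.com"),
        ("2021-01-03", "4@gmail.com"),
    ],
)

query = """
select date,
       count(*) as created,
       sum(count(*)) over (order by date) as running_total,
       -- only the first appearance of each email (seqnum = 1) is counted
       sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by date) as running_distinct
from (select t.*,
             row_number() over (partition by email order by date) as seqnum
      from t
     ) t
group by date
order by date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because row_number() assigns 1 to exactly one row per email, a repeat of the same address never adds to the distinct count, even when it falls on the same date as its first appearance.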
Getting the total count and the cumulative count is straightforward. To get the cumulative distinct count, use lag to check whether the email already has a row with an earlier date; if it does, set the flag to 0 so that the row is ignored by the running sum.
select distinct dt
,count(*) over(partition by dt) as day_total
,count(*) over(order by dt) as cumsum
,sum(flag) over(order by dt) as cumdist
from (select t.*
,case when lag(dt) over(partition by email order by dt) is not null then 0 else 1 end as flag
from tbl t
) t
Here is a solution that uses neither sum() over nor lag(), and still produces the correct results.
It may therefore be simpler to read and to maintain.
select
t1.date_created,
(select count(*) from my_table where date_created = t1.date_created) emails_created,
(select count(*) from my_table where date_created <= t1.date_created) cumulative_sum,
(select count( distinct email) from my_table where date_created <= t1.date_created) running_count_distinct
from
(select distinct date_created from my_table) t1
order by 1

To find the last updated record of each month for each policy(another field)

I have a table named a with the fields eff_date and policy_no, among others.
For each policy, I want to consider all of its records and pick the most recently updated one (by eff_date) in each month.
So I need the last updated record for each month for each policy. How would I write a query for this?
I'm not 100 percent on Teradata syntax, but I believe you're after this:
SELECT policy_no,eff_date
FROM (SELECT policy_no,eff_date, ROW_NUMBER() OVER (PARTITION BY policy_no, EXTRACT(YEAR FROM eff_date),EXTRACT(MONTH FROM eff_date) ORDER BY eff_date DESC) as RowRank
FROM a) as sub
WHERE RowRank = 1
I'm assuming when you say by month you also want to differentiate by year, but if not, just remove the EXTRACT(YEAR FROM eff_date) from the PARTITION BY section.
Edit: Update for Teradata syntax.
SELECT * from a
qualify ROW_NUMBER() OVER (PARTITION BY policy_no, EXTRACT(YEAR FROM eff_date),
EXTRACT(MONTH FROM eff_date) ORDER BY eff_date DESC) = 1
The main difficulty is that the GROUP BY has to be on the combination of policy_no and the month (extracted from the date). For example:
In MySQL:
SELECT policy_no,
month(eff_date),
year(eff_date),
max(eff_date)
FROM myTable
GROUP BY policy_no,
month(eff_date),
year(eff_date);
Update
I saw that derived tables are allowed in Teradata. Using a join to a derived table, here is how to access the full rows:
select * from a,
(SELECT policy_no,
month(eff_date),
year(eff_date),
max(eff_date) as MaxMonthDate
FROM a
GROUP BY policy_no,
month(eff_date),
year(eff_date)
) as b
where a.policy_no = b.policy_no and
a.eff_date = b.MaxMonthDate;
http://www.sqlfiddle.com/#!2/1f728/5
Update (Using Extract)
select * from a,
(SELECT a2.policy_no,
EXTRACT(MONTH FROM a2.eff_date),
EXTRACT(YEAR FROM a2.eff_date),
max(a2.eff_date) as MaxMonthDate
FROM a as a2
GROUP BY a2.policy_no,
EXTRACT(MONTH FROM a2.eff_date),
EXTRACT(YEAR FROM a2.eff_date)
) as b
where a.policy_no = b.policy_no and
a.eff_date = b.MaxMonthDate;
I suggest looking into window aggregate functions and the QUALIFY clause. I believe the following SQL will work.
SELECT Policy_No
, EXTRACT(MONTH FROM Eff_Date) AS Eff_Month_
, Eff_Date
FROM TableA
QUALIFY ROW_NUMBER() OVER (PARTITION BY Policy_No, EXTRACT(MONTH FROM Eff_Date)
ORDER BY Eff_Date DESC) = 1;