How would I get the last N quarters? - sql

How would I get the last N quarters? I would like to extract the data that contains the last 5 quarters (including the current quarter).
The SQL below just groups the milestones to show how many unique values there are; each of these milestones contains multiple rows of data.
SELECT LEFT(MILESTONE,7) AS MILESTONE2
FROM XXXTable
WHERE MILESTONE LIKE '%M0'
GROUP BY 1
ORDER BY MILESTONE2 DESC
MILESTONE2
2020_Q4
2020_Q3
2020_Q2
2020_Q1
2019_Q4
2019_Q3
2019_Q2
2019_Q1
2018_Q4
2018_Q3

You can use dense_rank():
select t.*
from (select t.*,
             dense_rank() over (order by LEFT(MILESTONE, 7) desc) as seqnum
      from XXXTable t
      where MILESTONE like '%M0'
     ) t
where seqnum <= 5
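If you only want the five most recent quarter labels themselves (rather than the underlying rows), a sketch along the same lines, reusing the grouped MILESTONE2 expression from the question, could look like this:
select MILESTONE2
from (select LEFT(MILESTONE, 7) as MILESTONE2,
             dense_rank() over (order by LEFT(MILESTONE, 7) desc) as seqnum
      from XXXTable
      where MILESTONE like '%M0'
      group by LEFT(MILESTONE, 7)
     ) t
where seqnum <= 5
order by MILESTONE2 desc;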

Related

Complex Ranking in SQL (Teradata)

I have a peculiar problem at hand. I need to rank in the following manner:
Each ID gets a new rank.
Rank #1 is assigned to the ID with the lowest date. The subsequent dates for that particular ID can be higher, but they will still get incremental ranks relative to the other IDs.
(E.g. the ADF32 series is ranked first because it has the lowest date, even though it runs up to 09-Nov; RT659 starts with 13-Aug but is ranked after it.)
For a particular ID, if the days are consecutive then the ranks are the same; otherwise the rank increases by 1.
For a particular ID, ranks are given in date ASC order.
How to formulate a query?
You need two steps:
select
   id_col
  ,dt_col
  ,dense_rank()
   over (order by min_dt, id_col, dt_col - rnk) as part_col
from
 (
   select
      id_col
     ,dt_col
     ,min(dt_col)
      over (partition by id_col) as min_dt
     ,rank()
      over (partition by id_col
            order by dt_col) as rnk
   from tab
 ) as dt
dt_col - rnk calculates the same result for consecutive dates -> same rank
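For example, with made-up dates for the ADF32 series, the subtraction produces the same value for consecutive days and a new value whenever there is a gap:
dt_col       rnk   dt_col - rnk
2017-08-01   1     2017-07-31
2017-08-02   2     2017-07-31   -- consecutive day, same result -> same group
2017-08-04   3     2017-08-01   -- one-day gap, result changes -> new group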
Try datediff on lead/lag and then perform partitioned ranking
select t.ID_COL, t.dt_col,
       rank() over (partition by t.ID_COL, t.date_diff order by t.dt_col desc) as rankk
from (select ID_COL, dt_col,
             datediff(day, lag(dt_col, 1) over (order by dt_col), dt_col) as date_diff
      from table1
     ) t
One way to think about this problem is "when to add 1 to the rank". Well, that occurs when the previous value on a row with the same id_col differs by more than one day. Or when the row is the earliest day for an id.
This turns the problem into a cumulative sum:
select t.*,
       sum(case when prev_dt_col = dt_col - 1 then 0 else 1 end)
           over (order by min_dt_col, id_col, dt_col) as ranking
from (select t.*,
             lag(dt_col) over (partition by id_col order by dt_col) as prev_dt_col,
             min(dt_col) over (partition by id_col) as min_dt_col
      from t
     ) t;
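With the same made-up dates (and assuming a year for RT659's 13-Aug start), the case expression emits a 1 at the start of each ID and at every gap, and the running sum of those 1s is the rank:
id_col  dt_col       prev_dt_col  flag  ranking
ADF32   2017-08-01   null         1     1
ADF32   2017-08-02   2017-08-01   0     1
ADF32   2017-08-04   2017-08-02   1     2
RT659   2017-08-13   null         1     3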

Spark SQL - Finding the maximum value of a month per year

I have created a data frame which contains Year, Month, and the occurrence of incidents (count).
I want to find which month of each year had the most incidents, using Spark SQL.
You can use window functions:
select *
from (select t.*,
             rank() over (partition by year order by cnt desc) as rn
      from mytable t
     ) t
where rn = 1
For each year, this gives you the row that has the greatest cnt. If there are ties, the query returns them.
Note that count is a language keyword in SQL, hence not a good choice for a column name. I renamed it to cnt in the query.
You can use window functions, if you want to use SQL:
select t.*
from (select t.*,
row_number() over (partition by year order by count desc) as seqnum
from t
) t
where seqnum = 1;
This returns one row per year, even if there are ties for the maximum count. If you want all such rows in the event of ties, then use rank() instead of row_number().
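For completeness, the ties variant is the same query with the function swapped (a sketch against the same placeholder table t):
select t.*
from (select t.*,
             rank() over (partition by year order by count desc) as seqnum
      from t
     ) t
where seqnum = 1;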

Presto SQL - Rank Multiple Conditions for Multiple Columns

I am trying to write a single query (if possible) to rank ids based on multiple conditions.
My table is like this:
id  group  subgroup  value
1   A      Q         12
2   A      Z         10
3   B      Z         14
4   A      Z         20
5   B      W         20
I tried this query:
SELECT id,
CASE WHEN group = 'A' THEN ROW_NUMBER() OVER (PARTITION BY group ORDER BY SUM(value) DESC) AS rank_group
CASE WHEN group = 'A' AND subgroup = 'Z' THEN ROW_NUMBER() OVER (PARTITION BY group, subgroup ORDER BY SUM(value) DESC) AS rank_subgroup
FROM table
GROUP BY group, subgroup
But ended up with something like this:
id rank_group rank_subgroup
1 1 1
1 2 2
I would like to get each distinct id and return the ranks based on the conditions of the case statements, but it looks like adding the needed partitions causes a multiplication of rows because the GROUP BY is necessary. I could write individual queries for each column, but I'd like to avoid that if possible.
Do you want something like this?
select t.*,
       dense_rank() over (order by sumg, group) as rank_group,
       dense_rank() over (partition by group order by sumsg, subgroup) as rank_subgroup
from (select t.*,
             sum(value) over (partition by group) as sumg,
             sum(value) over (partition by group, subgroup) as sumsg
      from t
     ) t;
This is my best guess at interpreting what you might want.
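One thing to check: group is a reserved word in Presto, so if that really is the column name it has to be double-quoted. A hedged rewrite of the same idea, assuming a table called mytable:
select t.*,
       dense_rank() over (order by sumg, "group") as rank_group,
       dense_rank() over (partition by "group" order by sumsg, subgroup) as rank_subgroup
from (select t.*,
             sum(value) over (partition by "group") as sumg,
             sum(value) over (partition by "group", subgroup) as sumsg
      from mytable t
     ) t;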

Finding consecutive patterns (with SQL)

A table consecutive in PostgreSQL:
Each se_id has an idx from 0 up to 100 - here 0 to 9.
The search pattern:
SELECT *
FROM consecutive
WHERE val_3_bool = 1
AND val_1_dur > 4100 AND val_1_dur < 5900
Now I'm looking for the longest consecutive appearance of this pattern
for each p_id - and the AVG of the counted val_1_dur.
Is it possible to calculate this in pure SQL?
One method is the difference-of-row-numbers approach to get the consecutive sequences for each pid:
select pid, count(*) as in_a_row, sum(val1_dur) as dur
from (select t.*,
             row_number() over (partition by pid order by idx) as seqnum,
             row_number() over (partition by pid, val3_bool order by idx) as seqnum_d
      from consecutive t
     ) t
group by (seqnum - seqnum_d), pid, val3_bool;
If you are looking specifically for "1" values, then add where val3_bool = 1 to the outer query. To understand why this works, I would suggest that you stare at the results of the subquery, so you can understand why the difference defines the consecutive values.
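For instance, with made-up rows for one pid, the two row numbers diverge exactly when val3_bool changes, so the difference stays constant within each streak:
idx  val3_bool  seqnum  seqnum_d  seqnum - seqnum_d
0    1          1       1         0
1    1          2       2         0
2    0          3       1         2
3    1          4       3         1
4    1          5       4         1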
You can then get the max using distinct on:
select distinct on (pid) t.*
from (select pid, count(*) as in_a_row, sum(val1_dur) as dur
      from (select t.*,
                   row_number() over (partition by pid order by idx) as seqnum,
                   row_number() over (partition by pid, val3_bool order by idx) as seqnum_d
            from consecutive t
           ) t
      group by (seqnum - seqnum_d), pid, val3_bool
     ) t
order by pid, in_a_row desc;
Strictly speaking, the distinct on does not require the additional level of subquery, but I think the subquery makes the logic clearer.
There are window functions that enable you to compare one row with the previous and the next one.
https://community.modeanalytics.com/sql/tutorial/sql-window-functions/
https://www.postgresql.org/docs/current/static/tutorial-window.html
As seen on How to compare the current row with next and previous row in PostgreSQL? and Filtering by window function result in Postgresql

Get aggregate over n last values in vertica

We have a table that has the columns dates, sales and item.
An item's price can be different at every sale, and we want to find the price of an item, averaged over its most recent 50 sales.
Is there a way to do this using analytical functions in Vertica?
For a popular item, all these 50 sales could be from this week. For another, we may need to have a 3 month window.
Can we know what these windows are, per item ?
You would use a window-frame clause to get the value on every row:
select t.*,
       avg(t.price) over (partition by item
                          order by t.date desc
                          rows between 49 preceding and current row
                         ) as avg_price_50
from t;
On re-reading the question, I suspect you want a single row per item. For that, use row_number():
select t.item, avg(t.price)
from (select t.*,
             row_number() over (partition by item order by t.date desc) as seqnum
      from t
     ) t
where seqnum <= 50
group by item;
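If you also want to know how far back those 50 sales reach for each item (the "windows" the question asks about), a sketch that adds the date span to the same aggregation, assuming the same date and price columns:
select t.item,
       avg(t.price) as avg_price_50,
       min(t.date)  as window_start,  -- oldest of the item's 50 most recent sales
       max(t.date)  as window_end     -- the item's most recent sale
from (select t.*,
             row_number() over (partition by item order by t.date desc) as seqnum
      from t
     ) t
where seqnum <= 50
group by item;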