Window Function - Dense_Rank and Row_Number difference - sql

If I use the dense_rank window function below, it gives me the output I want: it orders the transaction settled-at dates in ascending order and assigns rank 1 to the earliest:
select p.billing_cycle_in_months, avg(t.days)
from (
select *,
datediff(day,transaction_settled_at, transaction_refunded_at) as days,
dense_rank() over (partition by signup_id order by transaction_settled_at asc) as rank
from transactions
) t
join signups s on s.signup_id = t.signup_id
join plans p on p.id = s.plan_id
where datediff(year,s.started_at, current_date) > 1 and t.rank = 1
group by p.billing_cycle_in_months
Would I get essentially the same result using the row_number window function ordered over the same date column (transaction_settled_at asc)?
Basically, grouped by billing cycle, I want to rank the earliest date as 1. I just wanted to clarify whether row_number would give me the same result in this case.
Thanks

In your query, the difference between using dense_rank() and row_number() is that the former allows top ties, while the latter does not.
So if two (or more) records share the same earliest transaction_settled_at for a given signup_id, then the condition dense_rank() ... = 1 will keep them all, while row_number() ... = 1 will keep only one of them, and which one is not defined.
If there is no risk of ties, both functions will produce the same result set in your context.
To reduce the possibility of ties, you can also add further sorting criteria to the order by clause of the window function:
dense_rank() over (
partition by signup_id
order by transaction_settled_at, some_other_column desc, some_more_column
)
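To make the tie behaviour concrete, here is a minimal sketch using Python's built-in sqlite3 module (this assumes SQLite 3.25 or newer for window function support). The table contents are made up for illustration and only loosely mirror the transactions table from the question; the helper first_rows is hypothetical.

```python
import sqlite3

# Hypothetical sample data: signup 1 has two transactions tied for the
# earliest transaction_settled_at; signup 2 has no tie.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table transactions ("
    "id integer, signup_id integer, transaction_settled_at text)"
)
conn.executemany(
    "insert into transactions values (?, ?, ?)",
    [
        (1, 1, "2020-01-01"),
        (2, 1, "2020-01-01"),  # ties with id 1
        (3, 1, "2020-02-01"),
        (4, 2, "2020-03-01"),
        (5, 2, "2020-04-01"),
    ],
)


def first_rows(func):
    """Return the rows ranked 1 per signup; func is 'dense_rank' or 'row_number'."""
    sql = f"""
        select id, signup_id
        from (
            select t.*,
                   {func}() over (
                       partition by signup_id
                       order by transaction_settled_at
                   ) as rnk
            from transactions t
        ) ranked
        where rnk = 1
        order by id
    """
    return conn.execute(sql).fetchall()


dense_rows = first_rows("dense_rank")   # keeps both tied rows of signup 1
rownum_rows = first_rows("row_number")  # keeps exactly one row per signup

print(len(dense_rows), len(rownum_rows))  # prints "3 2"
```

With ties present, an average computed over the dense_rank rows would count signup 1 twice, which is the practical difference for the query in the question.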

Related

sql keep the first row according to an attribute

I wrote an sql to group by the SEASON in ASC and CHAM in DESC. How can I select the largest value of CHAM in each SEASON? To further development, I need to keep the CHAMPION_ID.
Generally, you can do this with window functions, but your database, MariaDB v5.5, is too old to support window functions and a bunch of other things. I'd recommend either upgrading MariaDB or doing your work on dbfiddle or using the stand-alone SQLite.
You can do this without window functions using a subquery.
select *
from champs as c1
where cham = (
select max(cham)
from champs as c2
where c1.season = c2.season
)
Pick only the rows whose cham equals the highest cham for that season.
You might be tempted to use a group by, but this will not show duplicates. For example, spring is all ties.
-- Only shows one row per season.
select *
from champs
group by season
having cham = max(cham);
With window functions...
Add the seasonal ranks. This is partitioned by season and ordered by cham descending. rank() will assign the same rank to ties.
select
*,
rank() over (partition by season order by cham desc) as season_rank
from champs
Then use that as a common table expression and select only the rows with season_rank = 1.
with ranked as (
select
*,
rank() over (partition by season order by cham desc) as season_rank
from champs
)
select *
from ranked
where season_rank = 1;
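As a quick sanity check that the two approaches agree, including on ties, here is a small sketch using Python's sqlite3 module (assuming SQLite 3.25+ for window functions). The champs rows and the champion_id column are invented sample data, not the asker's table.

```python
import sqlite3

# Hypothetical sample data: spring has a tie for the highest cham.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table champs (season text, champion_id integer, cham integer)"
)
conn.executemany(
    "insert into champs values (?, ?, ?)",
    [
        ("spring", 1, 5),
        ("spring", 2, 5),  # ties with champion 1
        ("summer", 3, 7),
        ("summer", 4, 2),
    ],
)

# Correlated subquery: keep rows whose cham equals the season maximum.
subquery_rows = conn.execute("""
    select season, champion_id, cham
    from champs as c1
    where cham = (
        select max(cham)
        from champs as c2
        where c1.season = c2.season
    )
    order by season, champion_id
""").fetchall()

# Window function: rank within each season, keep rank 1.
ranked_rows = conn.execute("""
    with ranked as (
        select *,
               rank() over (partition by season order by cham desc) as season_rank
        from champs
    )
    select season, champion_id, cham
    from ranked
    where season_rank = 1
    order by season, champion_id
""").fetchall()

print(subquery_rows == ranked_rows)  # prints "True"
```

Both queries return the two tied spring rows plus the single summer winner, and the champion_id comes along in either case.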

Select one random row by group (Oracle 10g)

This post is similar to this thread in that I have multiple observations per group. However, I want to randomly select only one of them. I am also working on Oracle 10g.
There are multiple rows per person_id in table df. I want to order each group of person_ids by dbms_random.value() and select the first observation from each group. To do so, I tried:
select
person_id, purchase_date
from
df
where
row_number() over (partition by person_id order by dbms_random.value()) = 1
The query returns:
ORA-30483: window functions are not allowed here
30483. 00000 - "window functions are not allowed here"
*Cause: Window functions are allowed only in the SELECT list of a query. And, window function cannot be an argument to another window or group function.
Use a subquery:
select person_id, purchase_date
from (select df.*,
row_number() over (partition by person_id order by dbms_random.value()) as seqnum
from df
) df
where seqnum = 1;
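The same subquery pattern can be tried outside Oracle. The sketch below uses Python's sqlite3 module (SQLite 3.25+ assumed), with SQLite's random() standing in for dbms_random.value(); the df rows are made-up sample data.

```python
import sqlite3

# Hypothetical sample data: several purchases per person.
rows = [
    (1, "2020-01-01"),
    (1, "2020-02-01"),
    (1, "2020-03-01"),
    (2, "2020-04-01"),
    (2, "2020-05-01"),
    (3, "2020-06-01"),
]
conn = sqlite3.connect(":memory:")
conn.execute("create table df (person_id integer, purchase_date text)")
conn.executemany("insert into df values (?, ?)", rows)

# The window function lives in the inner select list; the outer query
# filters on its alias, which is what ORA-30483 is asking for.
picked = conn.execute("""
    select person_id, purchase_date
    from (
        select df.*,
               row_number() over (
                   partition by person_id
                   order by random()  -- stand-in for dbms_random.value()
               ) as seqnum
        from df
    ) numbered
    where seqnum = 1
    order by person_id
""").fetchall()

print(picked)  # one randomly chosen row per person_id
```

Each run returns exactly one row per person_id, but which purchase_date is chosen can change from run to run.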
Another option is to use a WITH ... AS clause:
WITH t AS
(
SELECT df.*,
ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY dbms_random.value()) AS rn
FROM df
)
SELECT person_id, purchase_date
FROM t
WHERE rn = 1
Aggregate queries (using GROUP BY and aggregate functions) are often faster than equivalent analytic functions that do the same job. So, if you have a lot of data to process, or if the data is not excessively large but you must run this query often, you may want to try a query that uses aggregation instead of analytic functions and compare the two.
Here is one possible approach:
select person_id,
max(purchase_date) keep (dense_rank first order by dbms_random.value())
as random_purchase_date
from df
group by person_id
;

can we get totalcount and last record from postgresql

I have a table with 23 records, and I am trying to get the total record count and the last record in a single query. Something like this:
select count(*) ,(m order by createdDate) from music m ;
Is there any way to return only the last record together with the total count in PostgreSQL?
This can be done using window functions
select *
from (
select m.*,
row_number() over (order by createddate desc) as rn,
count(*) over () as total_count
from music m
) t
where rn = 1;
Another option would be to use a scalar sub-query and combine it with a limit clause:
select *,
(select count(*) from music) as total_count
from music
order by createddate desc
limit 1;
Depending on the indexes, your memory configuration and the table definition, this might be faster than the two window functions.
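For anyone who wants to compare the two variants on a toy table, here is a rough sketch using Python's sqlite3 module (SQLite 3.25+ assumed) rather than PostgreSQL. The music rows and the id and title columns are invented for the example; only createddate comes from the question.

```python
import sqlite3

# Hypothetical sample data: three tracks with distinct creation dates.
conn = sqlite3.connect(":memory:")
conn.execute("create table music (id integer, title text, createddate text)")
conn.executemany(
    "insert into music values (?, ?, ?)",
    [
        (1, "a", "2021-01-01"),
        (2, "b", "2021-02-01"),
        (3, "c", "2021-03-01"),  # the most recent record
    ],
)

# Variant 1: two window functions, then keep the newest row.
window_row = conn.execute("""
    select id, title, total_count
    from (
        select m.*,
               row_number() over (order by createddate desc) as rn,
               count(*) over () as total_count
        from music m
    ) t
    where rn = 1
""").fetchone()

# Variant 2: scalar sub-query for the count plus order by / limit.
scalar_row = conn.execute("""
    select id, title,
           (select count(*) from music) as total_count
    from music
    order by createddate desc
    limit 1
""").fetchone()

print(window_row, scalar_row)  # both are (3, 'c', 3)
```

Both variants return the latest row together with the total count; which one is faster would have to be measured on the real table.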
No, this is not possible with a plain aggregate: as soon as you ask for count(*), SQL changes the level of your data to an aggregation, so the individual rows are no longer available in the same SELECT. Without window functions or a sub-query, the only way to do what you are asking is to run the count(*) and the order by as separate queries.
Another solution using windowing functions and no subquery:
SELECT DISTINCT count(*) OVER w, last_value(m) OVER w
FROM music m
WINDOW w AS (ORDER BY createddate RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
The point here is that last_value applies on partitions defined by windows and not on groups defined by GROUP BY.
I did not run any tests, but I suspect my solution is the least efficient of the three posted so far. It is, however, the closest to your example query.

Rank Over Partition By in Oracle SQL (Oracle 11g)

I have 4 columns in a table
Company Part Number
Manufacturer Part Number
Order Number
Part Receipt Date
Ex.
I just want to return one record based on the maximum Part Receipt Date which would be the first row in the table (The one with Part Receipt date 03/31/2015).
I tried
RANK() OVER (PARTITION BY Company Part Number,Manufacturer Part Number
ORDER BY Part Receipt Date DESC,Order Number DESC) = 1
at the end of the WHERE statement and this did not work.
This would seem to do what you want:
select t.*
from (select t.*
from t
order by partreceiptdate desc
) t
where rownum = 1;
Analytic functions like rank() are available in the SELECT clause, they can't be invoked directly in a WHERE clause. To use rank() the way you want it, you must declare it in a subquery and then use it in the WHERE clause in the outer query. Something like this:
select company_part_number, manufacturer_part_number, order_number, part_receipt_date
from ( select t.*, rank() over (partition by... order by...) as rnk
from your_table t
)
where rnk = 1
Note also that you can't have a column name like company part number (with spaces in it) - at least not unless they are enclosed in double-quotes, which is a very poor practice, best avoided.
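Filling in the partition by ... order by ... placeholders, a sketch of the subquery pattern might look like the following. It runs on Python's sqlite3 module (SQLite 3.25+ assumed) instead of Oracle, and the parts table, its underscore column names and its rows are hypothetical; dates are stored as ISO strings so that they sort correctly as text.

```python
import sqlite3

# Hypothetical sample data: part A1/M1 has two receipts on the latest
# date, so order_number is used as the tie-breaker.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table parts (
        company_part_number text,
        manufacturer_part_number text,
        order_number integer,
        part_receipt_date text
    )
""")
conn.executemany(
    "insert into parts values (?, ?, ?, ?)",
    [
        ("A1", "M1", 10, "2015-03-31"),
        ("A1", "M1", 12, "2015-03-31"),
        ("A1", "M1", 8, "2014-12-01"),
        ("B2", "M9", 7, "2015-01-15"),
    ],
)

# rank() is computed in the inner select list and filtered outside.
latest = conn.execute("""
    select company_part_number, manufacturer_part_number,
           order_number, part_receipt_date
    from (
        select p.*,
               rank() over (
                   partition by company_part_number, manufacturer_part_number
                   order by part_receipt_date desc, order_number desc
               ) as rnk
        from parts p
    ) ranked
    where rnk = 1
    order by company_part_number
""").fetchall()

print(latest)
```

This returns one row per part combination: the latest receipt, with the highest order number winning when two receipts share a date.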

SQL Query with MIN function is Returning Multiple Rows

I'm trying to select the estimated hours of a row with the lowest date from a table.
SELECT prev_est_hrs
FROM ( SELECT MIN(change_date), prev_est_hrs
FROM task_history
WHERE task_id = 5
GROUP BY prev_est_hrs
);
However this is returning two rows, why? I thought MIN was supposed to return the lowest only?
Help much appreciated.
You have a GROUP BY clause. The MIN will return the minimum value in each group.
Also, you are only returning the group by value from the outer SELECT.
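To see the per-group behaviour on a small example, here is a sketch using Python's sqlite3 module (SQLite 3.25+ assumed for the second query). The task_history rows are made up; the second query is just one possible way to get a single row, by ordering on change_date.

```python
import sqlite3

# Hypothetical sample data: task 5 has two distinct prev_est_hrs values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table task_history ("
    "task_id integer, change_date text, prev_est_hrs integer)"
)
conn.executemany(
    "insert into task_history values (?, ?, ?)",
    [
        (5, "2021-01-01", 10),
        (5, "2021-02-01", 20),
        (5, "2021-03-01", 10),
        (6, "2020-12-01", 99),  # different task, filtered out
    ],
)

# GROUP BY prev_est_hrs: one MIN(change_date) per distinct hours value.
grouped = conn.execute("""
    select min(change_date), prev_est_hrs
    from task_history
    where task_id = 5
    group by prev_est_hrs
    order by prev_est_hrs
""").fetchall()

# Ordering by change_date and keeping row 1 gives a single row instead.
earliest = conn.execute("""
    select prev_est_hrs
    from (
        select prev_est_hrs,
               row_number() over (order by change_date) as rn
        from task_history
        where task_id = 5
    ) numbered
    where rn = 1
""").fetchall()

print(grouped)   # [('2021-01-01', 10), ('2021-02-01', 20)]
print(earliest)  # [(10,)]
```

The grouped query returns one row for each distinct prev_est_hrs, which is why the original query produced two rows.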
Mitch is right. One way to get the prev_est_hrs for the record with the earliest change_date, which seems to be what you're trying to find, is with an analytic function:
SELECT prev_est_hrs
FROM (
SELECT prev_est_hrs, ROW_NUMBER() OVER (ORDER BY change_date) AS rn
FROM task_history
WHERE task_id = 5
)
WHERE rn = 1;
You need to consider what should happen if you have two rows with the same date. This query would pick one of them arbitrarily. If there is some other criterion you could use to break the tie, you could add it to the order by clause. If you want all matching rows in that case, you could use rank() instead; look at dense_rank() as well, since each has its place.
Use this query to do that:
SELECT max(prev_est_hrs) keep (dense_rank first order by change_date)
FROM task_history
WHERE task_id = 5;