Rank function for date in Oracle SQL - sql

I have the following code for example:
SELECT id, order_day, purchase_id FROM d
customer_id and purchase_id are unique. Each customer_id could have multiple purchase_id. Assume every one has made at least 5 orders.
Now, I just want to pull the first 5 purchase IDs of each customers ID (this depends on the earliest dates of purchases). I want the result to look like this:
id | purchase_id | rank
-------------------------
A | WERFEW43 | 1
A | ERTGDSFV | 3
A | FDGRT45 | 2
A | BRTE4TEW | 4
A | DFGDV | 5
B | DSFSF | 1
B | CF345 | 2
B | SDFSDFSDFS | 4
I thought of Ranking order_day, but my knowledge is not good enough to pull this off.

select id,purchase_id, rank() over (order by order_day)
from d
you also can try dense_rank() over (order by order_day) and row_number() over (order by order_day) and choose which one will be more suitable for you

select *
from
( SELECT
id
,order_day
,purchase_id
,row_number() -- ranking
over (partition by id -- each customer
order by order_day) as rn -- based on oldest dates
FROM d
) as dt
where rn <= 5

Related

Select row A if a condition satisfies else select row B for each group

We have 2 tables, bookings and docs
bookings
booking_id | name
100 | "Val1"
101 | "Val5"
102 | "Val6"
docs
doc_id | booking_id | doc_type_id
6 | 100 | 1
7 | 100 | 2
8 | 101 | 1
9 | 101 | 2
10 | 101 | 2
We need the result like this:
booking_id | doc_id
100 | 7
101 | 10
Essentially, we are trying to get the latest record of doc per booking, but if doc_type_id 2 is present, select the latest record of doc type 2 else select latest record of doc_type_id 1.
Is this possible to achieve with a performance friendly query as we need to apply this in a very huge query?
You can do it with FIRST_VALUE() window function by sorting properly the rows for each booking_id so that the rows with doc_type_id = 2 are returned first:
SELECT DISTINCT booking_id,
FIRST_VALUE(doc_id) OVER (PARTITION BY booking_id ORDER BY doc_type_id = 2 DESC, doc_id DESC) rn
FROM docs;
If you want full rows returned then you could use ROW_NUMBER() window function:
SELECT booking_id, doc_id, doc_type_id
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY booking_id ORDER BY doc_type_id = 2 DESC, doc_id DESC) rn
FROM docs
) t
WHERE rn = 1;

Find Customers With 4 Consecutive Years of Giving (Including Gaps)

I have a table similar to below:
+------------+-----------+
| CustomerID | OrderYear |
+------------+-----------+
| 1 | 2012 |
| 1 | 2013 |
| 1 | 2014 |
| 1 | 2017 |
| 1 | 2018 |
| 2 | 2012 |
| 2 | 2013 |
| 2 | 2014 |
| 2 | 2015 |
| 2 | 2017 |
+------------+-----------+
How would I identify which CustomerIDs have 4 consecutive years of giving? (In the above, only customer 2.) As you can see, some records will have gaps in order years.
I started down the row of trying to utilize some combination of ROW_NUMBER/LAG/LEAD with no luck to this point.
Very paired down/modified attempt...
WITH CTE
AS
(
SELECT T.ConstituentLookupID,
T.FISCALYEAR,
COUNT(T.FISCALYEAR) OVER (PARTITION BY T.ConstituentLookupID) AS
YearCount,
FIRST_VALUE(T.FISCALYEAR) OVER(PARTITION BY T.ConstituentLookupID ORDER
BY T.FISCALYEAR DESC) - T.FISCALYEAR + 1 as X,
ROW_NUMBER() OVER(PARTITION BY T.ConstituentLookupID ORDER BY
T.FISCALYEAR DESC) AS RN
FROM #Temp AS T)
SELECT CTE.ConstituentLookupID,
CTE.FISCALYEAR,
CTE.YearCount,
CTE.X,
CTE.RN,
FROM CTE
WHERE CTE.YearCount >= 4 --Have at least 4 years of giving
AND CTE.X - CTE.RN = 1 --Some kind of way to calculate consecutive years. Doesnt account current year and gaps...;
Assuming no duplicates, you can use lag():
select distinct customerid
from (
select t.*,
lag(orderyear, 3) over(partition by customerid order by orderyear) oderyear3
from mytable t
) t
where orderyear = orderyear3 + 3
A more conventional approach is to use some gaps-and-islands technique. This is convenient if you want the start and end of each series. Here, an island is a series of rows with "adjacent" order years, and you want islands that are at least 4 years long. We can identify the islands by comparing the order year against an incrementing sequence, then use aggregation:
select customerid, min(orderyear) firstorderyear, max(orderyear) lastorderyear
from (
select t.*,
row_number() over(partition by customerid order by orderyear) rn
from mytable t
) t
group by customerid, orderyear - rn
having count(*) >= 4
Assuming you have no more than one row per customer and year, the simplest method is lag():
select customerid, year
from (select t.*,
lag(orderyear, 3) over (partition by customerid order by orderyear) as prev3_year
from t
) t
where prev3_year = year - 3;
The idea is to look 3 years back. If that year is year - 3, then there are four years in a row. If your data can have duplicates, there are tweaks to the logic (they make the query more slightly more complicated).
This could return duplicates, so you might just want:
select distinct customerid
from (select t.*,
lag(orderyear, 3) over (partition by customerid order by orderyear) as prev3_year
from t
) t
where prev3_year = year - 3;
I have a simple solution using row number and group by
SELECT Max(z.customerid),
Count(z.grp)
FROM (SELECT customerid,
orderyear,
orderyear - Row_number()
OVER (
ORDER BY customerid) AS Grp
FROM mytable)z
GROUP BY z.grp
HAVING Count(z.grp) = 4

Order By Id and Limit Offset By Id from a table

I have an issue similar to the following query:
select name, number, id
from tableName
order by id
limit 10 offset 5
But in this case I only take the 10 elements from the group with offset 5
Is there a way to set limit and offset by id?
For example if I have a set:
|------------------------------------|---|---------------------------------------|
| Ana | 1 | 589d0011-ef54-4708-a64a-f85228149651 |
| Jana | 2 | 589d0011-ef54-4708-a64a-f85228149651 |
| Jan | 3 | 589d0011-ef54-4708-a64a-f85228149651 |
| Joe | 2 | 64ed0011-ef54-4708-a64a-f85228149651 |
and if I have skip 1 I should get
|------------------------------------|---|---------------------------------------|
| Jana | 2 | 589d0011-ef54-4708-a64a-f85228149651 |
| Jan | 3 | 589d0011-ef54-4708-a64a-f85228149651 |
I think that you want to filter by row_number():
select name, number, id
from (
select t.*, row_number() over(partition by name order by id) rn
from mytable t
) t
where
rn >= :number_of_records_per_group_to_skip
and rn < :number_of_records_per_group_to_skip + :number_of_records_per_group_to_keep
The query ranks records by id withing groups of records having the same name, and then filters using two parameters:
:number_of_records_per_group_to_skip: how many records per group should be skipped
:number_of_records_per_group_to_skip: how many records per group should be kept (after skipping :number_of_records_per_group_to_skip records)
This might not be the answer you are looking for but it gives you the results your example shows:
select name, number, id
from (
select * from tableName
order by id
limit 3 offset 0
) d
where id > 1;
Best regards,
Bjarni

How to SELECT in SQL based on a value from the same table column?

I have the following table
| id | date | team |
|----|------------|------|
| 1 | 2019-01-05 | A |
| 2 | 2019-01-05 | A |
| 3 | 2019-01-01 | A |
| 4 | 2019-01-04 | B |
| 5 | 2019-01-01 | B |
How can I query the table to receive the most recent values for the teams?
For example, the result for the above table would be ids 1,2,4.
In this case, you can use window functions:
select t.*
from (select t.*, rank() over (partition by team order by date desc) as seqnum
from t
) t
where seqnum = 1;
In some databases a correlated subquery is faster with the right indexes (I haven't tested this with Postgres):
select t.*
from t
where t.date = (select max(t2.date) from t t2 where t2.team = t.team);
And if you wanted only one row per team, then the canonical answer is:
select distinct on (t.team) t.*
from t
order by t.team, t.date desc;
However, that doesn't work in this case because you want all rows from the most recent date.
If your dataset is large, consider the max analytic function in a subquery:
with cte as (
select
id, date, team,
max (date) over (partition by team) as max_date
from t
)
select id
from cte
where date = max_date
Notionally, max is O(n), so it should be pretty efficient. I don't pretend to know the actual implementation on PostgreSQL, but my guess is it's O(n).
One more possibility, generic:
select * from t join (select max(date) date,team from t
group by team) tt
using(date,team)
Window function is the best solution for you.
select id
from (
select team, id, rank() over (partition by team order by date desc) as row_num
from table
) t
where row_num = 1
That query will return this table:
| id |
|----|
| 1 |
| 2 |
| 4 |
If you to get it one row per team, you need to use array_agg function.
select team, array_agg(id) ids
from (
select team, id, rank() over (partition by team order by date desc) as row_num
from table
) t
where row_num = 1
group by team
That query will return this table:
| team | ids |
|------|--------|
| A | [1, 2] |
| B | [4] |

SQL Query: get the unique id/date combos based on latest dates - need speed improvement

Not sure how to title or ask this really. Say I am getting a result set like this on a join of two tables, one contains the Id (C), the other contains the Rating and CreatedDate (R) with a foreign key to the first table:
-----------------------------------
| C.Id | R.Rating | R.CreatedDate |
-----------------------------------
| 2 | 5 | 12/08/1981 |
| 2 | 3 | 01/01/2001 |
| 5 | 1 | 11/11/2011 |
| 5 | 2 | 10/10/2010 |
I want this result set (the newest ones only):
-----------------------------------
| C.Id | R.Rating | R.CreatedDate |
-----------------------------------
| 2 | 3 | 01/01/2001 |
| 5 | 1 | 11/11/2011 |
This is a very large data set, and my methods (I won't mention which so there is no bias) is very slow to do this. Any ideas on how to get this set? It doesn't necessarily have to be a single query, this is in a stored procedure.
Thank you!
You need a CTE with a ROW_NUMBER():
WITH CTE AS (
SELECT ID, Rating, CreatedDate, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate DESC) RowID
FROM [TABLESWITHJOIN]
)
SELECT *
FROM CTE
WHERE RowID = 1;
You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by id order by createddate desc) as seqnum
from table t
) t
where seqnum = 1;
If you are using SQL Server 2008 or later, you should consider using windowing functions. For example:
select ID, Rating, CreatedDate from (
select ID, Rating, CreatedDate,
rowseq=ROW_NUMBER() over (partition by ID order by CreatedDate desc)
from MyTable
) x
where rowseq = 1
Also, please understand that while this is an efficient query in and of itself, your overall performance depends even more heavily on the underlying tables and, in particular, the indexes and explain plans that are used when joining the tables in the first place, etc.