Creating a Rank Column with Repeated Indexes - sql

I want to output the following table:
User | Country | RANK
------------------------------
1 US 3
1 US 3
1 NZ 2
1 NZ 2
1 NZ 2
1 JP 1
2 US 2
2 US 2
2 US 2
2 CA 1
What I have is the 'User' and 'Country' columns and want to create the RANK column.
I tried to use the function rank() like
rank() over (partition by User, Country order by ct desc), where ct is the event time in seconds since the epoch. Instead of repeating the rank within each country (33 222 1), this ranks the rows inside each (User, Country) partition, giving me 12 123 1.
I also tried row_number() with no success.
If I use rank() over (partition by User order by country desc) it works, but how can I guarantee that it also ranks by ct?
Any clues on how to do that?

You are quite vague about the schema of your data. But assuming you have data that looks like this:
User Country Unix_time(epoch)
1 US 1437888888
1 NZ 1437666666
2 US 1437777777
2 NZ 1435555555
I think this will work but I can't test as I don't have hive on my laptop.
select c.*, b.rank
from my_table c
left outer join
(select user
, country
, rank() over (partition by user order by unix_time desc) as rank
from
(select user, country, max(unix_time) as unix_time
from my_table group by user, country
) a
) b
on c.user=b.user and c.country=b.country
;
Basically I am selecting the maximum value for the time stamp associated with each user and country. This can then be ranked and joined to the original dataset.
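To make the idea concrete, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The table name events, the columns user_id, country and ct, and the sample timestamps are all made up for illustration, and it assumes an SQLite build with window functions (3.25 or later) rather than Hive, so treat it as an approximation of the approach, not a tested Hive query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per event, ct = seconds since epoch (shortened here).
conn.execute("CREATE TABLE events (user_id INTEGER, country TEXT, ct INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "US", 30), (1, "US", 31), (1, "NZ", 20), (1, "NZ", 21), (1, "NZ", 22),
     (1, "JP", 10), (2, "US", 50), (2, "US", 51), (2, "US", 52), (2, "CA", 40)],
)

query = """
SELECT e.user_id, e.country, r.rnk
FROM events e
JOIN (
    -- rank each (user, country) pair by its latest timestamp
    SELECT user_id, country,
           RANK() OVER (PARTITION BY user_id ORDER BY max_ct DESC) AS rnk
    FROM (SELECT user_id, country, MAX(ct) AS max_ct
          FROM events
          GROUP BY user_id, country) a
) r ON e.user_id = r.user_id AND e.country = r.country
ORDER BY e.user_id, r.rnk, e.ct
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Every row of the same user and country gets the same rank, because the rank is computed once per (user, country) pair and then joined back. With DESC the most recent country gets rank 1; swap it for ASC if the earliest country should come first.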

Related

How to get rank of a user from all users

I have a table called summary_coins. I am trying to get a user's rank based on their total coins.
I tried the query below:
SELECT
user_id,
sum(get_count),
rank() over (order by sum(get_count) asc) as rank
FROM summary_coins
WHERE user_id = 2
GROUP BY user_id
Sample data: without user_id = 2 in the WHERE clause, I get the list below:
user_id sum rank
44 2 1
13 4 2
57 4 2
47 4 2
11 5 5
2 5 5
My desired output:
2 5 5
I always get rank 1 for user ID 2, but based on the full list of users it should be rank 5.
You want to apply WHERE user_id = 2 late. The WHERE clause is evaluated before GROUP BY and before the window function, so in your query RANK() only ever sees the single row for user 2 and returns 1. To rank across all users first and filter afterwards, make your query a subquery you select from:
SELECT user_id, sum_count, rank
FROM
(
SELECT
user_id,
sum(get_count) AS sum_count,
rank() over (order by sum(get_count) asc) as rank
FROM summary_coins
GROUP BY user_id
) all_users
WHERE user_id = 2;
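As a rough check of the difference, the sketch below runs both variants against an in-memory SQLite database from Python. The individual get_count values are invented so that the per-user sums match the sample list in the question, the alias rnk is used instead of rank, and SQLite 3.25 or later is assumed for window function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE summary_coins (user_id INTEGER, get_count INTEGER)")
# Invented rows; sums per user are 44->2, 13->4, 57->4, 47->4, 11->5, 2->5.
conn.executemany(
    "INSERT INTO summary_coins VALUES (?, ?)",
    [(44, 2), (13, 4), (57, 1), (57, 3), (47, 4), (11, 5), (2, 2), (2, 3)],
)

# Filter first: only user 2 is left when RANK() runs.
filtered_first = conn.execute("""
    SELECT user_id, SUM(get_count), RANK() OVER (ORDER BY SUM(get_count)) AS rnk
    FROM summary_coins
    WHERE user_id = 2
    GROUP BY user_id
""").fetchone()

# Rank first: all users are ranked in the subquery, then one row is picked.
ranked_first = conn.execute("""
    SELECT user_id, sum_count, rnk
    FROM (
        SELECT user_id,
               SUM(get_count) AS sum_count,
               RANK() OVER (ORDER BY SUM(get_count)) AS rnk
        FROM summary_coins
        GROUP BY user_id
    ) all_users
    WHERE user_id = 2
""").fetchone()

print(filtered_first)
print(ranked_first)
```

The first query reports rank 1 because the filter leaves nothing to rank against; the second reports the rank within the full list.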

How to get the values for every group of the top 3 types

I've got this table ratings:
id user_id type value
0 0 Rest 4
1 0 Bar 3
2 0 Cine 2
3 0 Cafe 1
4 1 Rest 4
5 1 Bar 3
6 1 Cine 2
7 1 Cafe 5
8 2 Rest 4
9 2 Bar 3
10 3 Cine 2
11 3 Cafe 5
I want to have a table with a row for every pair (user_id, type) for the top 3 rated types through all users (ranked by sum(value) across the whole table).
Desired result:
user_id type value
0 Rest 4
0 Cafe 1
0 Bar 3
1 Rest 4
1 Cafe 5
1 Bar 3
2 Rest 4
3 Cafe 5
2 Bar 3
I was able to do this with two queries, one to get the top 3 and then another to get the rows where the type matches the top 3 types.
Does someone know how to fit this into a single query?
Get rows per user for the 3 highest ranking types, where types are ranked by the total sum of their value across the whole table.
So it's not exactly about the top 3 types per user, but about the top 3 types overall. Not all users will have rows for the top 3 types, even if there would be 3 or more types for the user.
Strategy:
Aggregate to get summed values per type (type_rnk).
Take only the top 3. (Break ties ...)
Join back to main table, eliminating any other types.
Order result by user_id, type_rnk DESC
SELECT r.user_id, r.type, r.value
FROM ratings r
JOIN (
SELECT type, sum(value) AS type_rnk
FROM ratings
GROUP BY 1
ORDER BY type_rnk DESC, type -- tiebreaker
LIMIT 3 -- strictly the top 3
) v USING (type)
ORDER BY user_id, type_rnk DESC;
Since multiple types can have the same ranking, I added type to the sort order to break ties alphabetically by their name (as you did not specify otherwise).
It turns out we don't need window functions (the ones with OVER and, optionally, PARTITION BY) for this, since you asked in a comment.
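Here is a small sketch of the aggregate-and-join-back strategy, run against the sample ratings rows in an in-memory SQLite database from Python. It assumes SQLite behaves like Postgres for this query; the only deliberate changes are GROUP BY type instead of GROUP BY 1 and an explicit ON clause instead of USING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (id INTEGER, user_id INTEGER, type TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?, ?)",
    [(0, 0, "Rest", 4), (1, 0, "Bar", 3), (2, 0, "Cine", 2), (3, 0, "Cafe", 1),
     (4, 1, "Rest", 4), (5, 1, "Bar", 3), (6, 1, "Cine", 2), (7, 1, "Cafe", 5),
     (8, 2, "Rest", 4), (9, 2, "Bar", 3), (10, 3, "Cine", 2), (11, 3, "Cafe", 5)],
)

top3_rows = conn.execute("""
    SELECT r.user_id, r.type, r.value
    FROM ratings r
    JOIN (
        SELECT type, SUM(value) AS type_rnk
        FROM ratings
        GROUP BY type
        ORDER BY type_rnk DESC, type   -- tiebreaker
        LIMIT 3                        -- strictly the top 3
    ) v ON v.type = r.type
    ORDER BY r.user_id, v.type_rnk DESC
""").fetchall()

for row in top3_rows:
    print(row)
```

The totals are Rest 12, Cafe 11, Bar 9 and Cine 6, so Cine is dropped. The rows come back grouped by user, which differs slightly from the row order shown in the desired result but contains the same nine rows.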
I think you just want row_number(). Based on your results, you seem to want three rows per type, with the highest value:
select t.*
from (select t.*,
row_number() over (partition by type order by value desc) as seqnum
from t
) t
where seqnum <= 3;
Your description suggests that you might just want this per user, which is a slight tweak:
select t.*
from (select t.*,
row_number() over (partition by user_id order by value desc) as seqnum
from t
) t
where seqnum <= 3;

Postgresql query to filter latest data based on 2 columns

Table Structure First
users table
id
1
2
3
sites table
id
1
2
site_memberships table
site_id user_id created_on
1 1 1
1 1 2
1 1 3
2 1 1
2 1 2
1 2 2
1 2 3
Assume that the higher the created_on number, the more recent the record.
Expected Output
site_id user_id created_on
1 1 3
2 1 2
1 2 3
I need the latest record for each user for each site membership.
Tried the following query, but this does not seem to work.
select * from users inner join
(
SELECT ROW_NUMBER () OVER (
PARTITION BY sm.user_id,
sm.created_on
), sm.*
from site_memberships sm
inner join sites s on sm.site_id=s.id
) site_memberships
ON site_memberships.user_id = users.user_id where row_number=1
I think you have overcomplicated the problem you want to solve.
You seem to want aggregation:
select site_id, user_id, max(created_on)
from site_memberships sm
group by site_id, user_id;
If you had additional columns that you wanted, you could use distinct on instead:
select distinct on (site_id, user_id) sm.*
from site_memberships sm
order by site_id, user_id, created_on desc;
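For a quick sanity check, the sketch below loads the sample site_memberships rows into an in-memory SQLite database from Python. DISTINCT ON is Postgres-specific and not available in SQLite, so the second query swaps in ROW_NUMBER() as a portable stand-in that also returns whole rows; that substitution is my assumption, not part of the answer above, and it needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_memberships (site_id INTEGER, user_id INTEGER, created_on INTEGER)"
)
conn.executemany(
    "INSERT INTO site_memberships VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2), (1, 1, 3), (2, 1, 1), (2, 1, 2), (1, 2, 2), (1, 2, 3)],
)

# Plain aggregation: one row per (site_id, user_id) with the latest created_on.
latest_by_group = conn.execute("""
    SELECT site_id, user_id, MAX(created_on)
    FROM site_memberships
    GROUP BY site_id, user_id
    ORDER BY site_id, user_id
""").fetchall()

# Stand-in for DISTINCT ON: number rows newest-first per group, keep the first.
latest_by_rownum = conn.execute("""
    SELECT site_id, user_id, created_on
    FROM (
        SELECT sm.*,
               ROW_NUMBER() OVER (PARTITION BY site_id, user_id
                                  ORDER BY created_on DESC) AS rn
        FROM site_memberships sm
    ) latest
    WHERE rn = 1
    ORDER BY site_id, user_id
""").fetchall()

print(latest_by_group)
print(latest_by_rownum)
```

Note that the partition is (site_id, user_id) with created_on only in the ORDER BY. Partitioning by created_on, as in the attempted query, puts every row in its own group, so every row gets row number 1.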

A way to only select rows that indicate progression, and ignore rows that indicate recovery?

I have a dataset with thousands of patients that include their ID and their disease stage over time. The data is complicated because there are patients that get worse, then recover, then get worse again. I would like to only select rows from a patient that indicate disease progression.
For example, ID 1 progresses from 3 > 4, then recovers back to stage 1 before worsening again to stage 5. How can I ignore rows that indicate recovery, and only keep rows that indicate progression over time? Is this even possible using SQL? Thank you in advance!
What data looks like:
ID stage_date disease_stage
1 1-JAN-15 3
1 3-JAN-15 4
1 6-JAN-15 1
1 9-JAN-15 5
1 10-JAN-15 1
What I want:
ID stage_date disease_stage
1 1-JAN-15 3
1 3-JAN-15 4
1 9-JAN-15 5
If I understand correctly, you want the rows that match the cumulative maximum:
select t.*
from (select t.*,
max(disease_stage) over (partition by id order by stage_date) as max_running_disease_stage
from t
) t
where max_running_disease_stage = disease_stage;
This will keep ties. If you don't want ties:
select t.*
from (select t.*,
max(disease_stage) over (partition by id
order by stage_date
rows between unbounded preceding and 1 preceding
) as max_running_disease_stage
from t
) t
where max_running_disease_stage is null or
disease_stage > max_running_disease_stage;
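The sketch below tries both ideas on an in-memory SQLite database from Python, with the running maximum ordered by stage_date. The table name t comes from the answer; the ISO-formatted dates (so that text ordering matches date ordering) and the second patient with a repeated stage are assumptions added to make the tie behaviour visible. SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, stage_date TEXT, disease_stage INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "2015-01-01", 3), (1, "2015-01-03", 4), (1, "2015-01-06", 1),
     (1, "2015-01-09", 5), (1, "2015-01-10", 1),
     # hypothetical second patient whose stage repeats before progressing
     (2, "2015-01-01", 2), (2, "2015-01-02", 2), (2, "2015-01-03", 3)],
)

# Keeps ties: a row survives if it equals the maximum seen so far.
with_ties = conn.execute("""
    SELECT id, stage_date, disease_stage
    FROM (SELECT t.*,
                 MAX(disease_stage) OVER (PARTITION BY id ORDER BY stage_date) AS running_max
          FROM t) x
    WHERE running_max = disease_stage
    ORDER BY id, stage_date
""").fetchall()

# Drops ties: a row survives only if it beats the maximum of all earlier rows.
strict = conn.execute("""
    SELECT id, stage_date, disease_stage
    FROM (SELECT t.*,
                 MAX(disease_stage) OVER (PARTITION BY id ORDER BY stage_date
                                          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                                         ) AS prev_max
          FROM t) x
    WHERE prev_max IS NULL OR disease_stage > prev_max
    ORDER BY id, stage_date
""").fetchall()

print(with_ties)
print(strict)
```

For patient 1 both queries keep stages 3, 4 and 5 and drop the two recoveries to stage 1. For patient 2 the first query keeps the repeated stage 2, while the second keeps only the first stage 2 and the later stage 3.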

SQL - Order by amount of occurrences

It's my first question here so I hope I can explain it well enough,
I want to order my data by amount of occurrences in the table.
My table is like this:
id Daynr
1 2
1 4
2 4
2 5
2 6
3 1
4 2
4 5
And I want it to sort it like this:
id Daynr
3 1
1 2
1 4
4 2
4 5
2 4
2 5
2 6
Player #3 has one day in the table, and Player #1 has 2.
My table is named "dayid"
Both id and Daynr are foreign keys, together making it a primary key
I hope this explains my problem enough, Please ask for more information it's my first time here.
Thanks in advance
You can do this by counting the number of times that things occur for each id. Most databases support window functions, so you can do this as:
select id, daynr
from (select t.*, count(*) over (partition by id) as cnt
from dayid t
) t
order by cnt, id;
You can also express this as a join:
select t.id, t.daynr
from dayid as t inner join
(select id, count(*) as cnt
from dayid
group by id
) as tg
on t.id = tg.id
order by tg.cnt, t.id;
Note that both of these include the id in the order by. That way, if two ids have the same count, all rows for the id will appear together.
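Here is a small sketch that runs both forms against the sample dayid rows in an in-memory SQLite database from Python (SQLite 3.25 or later assumed for the window version). I also added daynr as a last sort key, which is my own addition, so that the rows within each id come back in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dayid (id INTEGER, daynr INTEGER, PRIMARY KEY (id, daynr))")
conn.executemany(
    "INSERT INTO dayid VALUES (?, ?)",
    [(1, 2), (1, 4), (2, 4), (2, 5), (2, 6), (3, 1), (4, 2), (4, 5)],
)

# Window version: attach the per-id row count to every row, then sort by it.
window_rows = conn.execute("""
    SELECT id, daynr
    FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY id) AS cnt
          FROM dayid t) t
    ORDER BY cnt, id, daynr
""").fetchall()

# Join version: compute the counts once per id and join them back.
join_rows = conn.execute("""
    SELECT t.id, t.daynr
    FROM dayid AS t
    INNER JOIN (SELECT id, COUNT(*) AS cnt
                FROM dayid
                GROUP BY id) AS tg
            ON t.id = tg.id
    ORDER BY tg.cnt, t.id, t.daynr
""").fetchall()

print(window_rows)
print(join_rows)
```

Both queries return the rows in the order asked for in the question: id 3 first (one row), then ids 1 and 4 (two rows each), then id 2 (three rows).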