finding the postcode that the user spent the most time at - hive

I have a table:
UserID
Postcode
Hours at postcode
I need to find the postcode where each user spent the most time. I tried a max function, then thought about ordering by hours descending and taking the top row, but I am not getting anywhere.
Can anyone help?

This will output two records if a user spent equal max hours at two postcodes:
select
UserID,
Postcode,
Hours
from
(
select
UserID,
Postcode,
Hours,
dense_rank() over(partition by userId order by hours desc) rn
from (select --skip this subquery if hours already aggregated by user, postcode
UserID, Postcode, sum(Hours) hours
from table group by UserID, Postcode
) s
)s
where rn = 1;
Use row_number() over(partition by userId order by hours desc) rn instead of dense_rank() if you need single record per user.
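
The difference between the two ranking functions on tied hours can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption; SQLite 3.25 or later is needed for window functions), and the table name visits and its rows are hypothetical sample data.

```python
import sqlite3

# Hypothetical sample data: user 1 spent the same number of hours at two postcodes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (UserID INTEGER, Postcode TEXT, Hours INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, "AB1", 5), (1, "CD2", 5), (1, "EF3", 2), (2, "AB1", 1), (2, "GH4", 3)],
)

def top_postcodes(rank_fn):
    """Return the rows ranked first per user, using the given window function name."""
    sql = f"""
        SELECT UserID, Postcode, Hours
        FROM (
            SELECT UserID, Postcode, Hours,
                   {rank_fn}() OVER (PARTITION BY UserID ORDER BY Hours DESC) AS rn
            FROM visits
        ) s
        WHERE rn = 1
        ORDER BY UserID, Postcode
    """
    return conn.execute(sql).fetchall()

with_ties = top_postcodes("dense_rank")  # keeps both tied postcodes for user 1
one_each = top_postcodes("row_number")   # keeps exactly one row per user

print(with_ties)
print(one_each)
```

With row_number(), which of the tied rows is kept is not defined unless the ORDER BY includes a tiebreaker column, for example ORDER BY Hours DESC, Postcode.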

Related

How to grab the last value in a column per user for the last date

I have a table that contains three columns: ACCOUNT_ID, STATUS, CREATE_DATE.
I want to grab only the LAST status for each account_id based on the latest create_date.
In the example above, I should only see three records and the last STATUS per that account_2.
Do you know a way to do this?
create table tbl (
account_id int,
status string,
create_date date);
select account_id, max(create_date) from table group by account_id;
will give you the account_id and create_date at the closest past date to today (assuming create_date can never be in the future, which makes sense).
Now you can join with that data to get what you want, something along the lines for example:
select account_id, status, create_date from table where (account_id, create_date) in (<the select expression from above>);
If you use that frequently (account with the latest create date), then consider defining a view for that.
If you have many columns and want to keep only the latest row per account, you can use QUALIFY to apply the ranking logic and keep the top row, like so:
SELECT *
FROM tbl
QUALIFY row_number() over (partition by account_id order by create_date desc) = 1;
The long form is the same pattern that Ely shows in the second answer. But with the MAX(CREATE_DATE) solution, if you have two rows on the same last day, the IN solution will give you both. You can also get that behaviour via QUALIFY if you use RANK.
So the SQL is the same as:
SELECT account_id, status, create_date
FROM (
SELECT *,
row_number() over (partition by account_id order by create_date desc) as rn
FROM tbl
)
WHERE rn = 1;
So the RANK form, which will show all tied rows, is:
SELECT *
FROM tbl
QUALIFY rank() over (partition by account_id order by create_date desc) = 1;
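
The tie behaviour described above can be reproduced with a small sketch. It runs on Python's built-in sqlite3 module rather than the asker's database (an assumption); SQLite has no QUALIFY, so the long subquery form is used, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (account_id INTEGER, status TEXT, create_date TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, "open", "2021-01-01"),
        (1, "closed", "2021-02-01"),
        (2, "open", "2021-01-15"),
        (2, "hold", "2021-03-01"),
        (2, "closed", "2021-03-01"),  # same latest date as the row above
    ],
)

# MAX(create_date) + IN: returns every row that shares the latest date.
in_rows = conn.execute("""
    SELECT account_id, status, create_date
    FROM tbl
    WHERE (account_id, create_date) IN (
        SELECT account_id, MAX(create_date) FROM tbl GROUP BY account_id
    )
    ORDER BY account_id, status
""").fetchall()

# row_number(): returns exactly one row per account, even when dates tie.
rn_rows = conn.execute("""
    SELECT account_id, status, create_date
    FROM (
        SELECT *,
               row_number() OVER (PARTITION BY account_id ORDER BY create_date DESC) AS rn
        FROM tbl
    )
    WHERE rn = 1
    ORDER BY account_id
""").fetchall()

print(in_rows)
print(rn_rows)
```

Account 2 appears twice in in_rows and once in rn_rows; which of its two tied rows row_number() keeps is arbitrary unless a tiebreaker is added to the ORDER BY.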

Hive SQL: Find the last time a user had an entry

I am stuck a bit! I have a users table. The users get a score, but it doesn't come every day.
I need a way to show the score for the user for the last date that they got a score. It could be 1 month ago and I have 50M rows per day, so I can't just ingest all the partitions.
Any idea how I can do this?
select userid, score from user_table where dt = 20201206
Get the most recent record as below:
select userid, score
from
(select userid, score, row_number() over (partition by userid order by dt desc) as rn
from user_table) t -- Hive requires an alias on a subquery in FROM
where rn = 1

How to rank on aggregated sum in Postgresql?

I want to rank by aggregated points. Example: A guessing game. Day 1: Person A guesses and gets 10 points, person B guesses and gets 9 points. Day 2: Person A gets 5 points, Person B gets 9.
What I want to get is:
On Day 2, Person A has an aggregated amount of 15 points and ranks 2.
Here's the basic table guesses:
id, person, points, day
1, thomas, 10, 1
2,thomas,5,2
3,marie,9,1
4,marie,9,2
I'm having no problems getting the aggregated points grouped by day:
SELECT
*,
sum(points) OVER (PARTITION BY person ORDER BY id) AS total_running_points
FROM
guesses
ORDER BY
day asc;
But now I need to rank on every day.
I tried with the following but failed as of course total_running_points is a new alias:
SELECT
*,
sum(points) OVER (PARTITION BY person ORDER BY id) AS total_running_points,
rank() OVER (ORDER BY total_running_points desc)
FROM
bets_by_day
ORDER BY
day asc;
I sense that I should use a subquery but then I wonder how to partition on it.
How can I solve this?
You can use a subquery:
SELECT b.*, rank() over (order by total_running_points desc) rnk
FROM (
SELECT b.*, sum(points) over (partition by person order by id) AS total_running_points
FROM bets_by_day b
) b
ORDER BY day asc;
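
If the rank should restart on each day, as the expected result in the question suggests (15 points ranking 2nd on day 2), one option is to add PARTITION BY day to the outer rank(). That reading of the question is an assumption. The sketch below uses the four rows from the question and runs on Python's built-in sqlite3 module instead of PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bets_by_day (id INTEGER, person TEXT, points INTEGER, day INTEGER)"
)
conn.executemany(
    "INSERT INTO bets_by_day VALUES (?, ?, ?, ?)",
    [(1, "thomas", 10, 1), (2, "thomas", 5, 2), (3, "marie", 9, 1), (4, "marie", 9, 2)],
)

# Inner query: running total per person. Outer query: rank within each day.
ranked = conn.execute("""
    SELECT person, day, total_running_points,
           rank() OVER (PARTITION BY day ORDER BY total_running_points DESC) AS rnk
    FROM (
        SELECT b.*,
               sum(points) OVER (PARTITION BY person ORDER BY id) AS total_running_points
        FROM bets_by_day b
    ) b
    ORDER BY day, rnk
""").fetchall()

for row in ranked:
    print(row)
```

Without the PARTITION BY day, the four running totals would be ranked against each other across both days.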

How to get the max value for each group in Oracle?

I've found some solutions for this problem, however, they don't seem to work with Oracle.
I got this:
I want a view that presents only the information about the oldest person for each team. So, my output should be something like this:
PERSON | TEAM | AGE
Sam | 1 | 23
Michael | 2 | 21
How can I do that in Oracle?
Here is an example without keep but with row_number():
with t0 as
(
select person, team, age,
row_number() over(partition by team order by age desc) as rn
from t
)
select person, team, age
from t0
where rn = 1;
select * from table
where (team, age) in (select team, max(age) from table group by team)
One method uses keep:
select team, max(age) as age,
max(person) keep (dense_rank first order by age desc) as person
from t
group by team;
There are other methods, but in my experience, keep works quite well.
select * from (select person,team,age,
dense_rank() over (partition by team order by age desc) rnk
from t)
where rnk=1;
Using an Analytic Function returns all people with maximum age per team (needed if there are people with identical ages), selects Table one time only and is thus faster than other solutions that reference Table multiple times:
With MaxAge as (
Select T.*, Max (Age) Over (Partition by Team) MaxAge
From Table T
)
Select Person, Team, Age From MaxAge Where Age=MaxAge
;
This also works in MySQL/MariaDB.
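
The Max (Age) Over (Partition by Team) variant can be tried on a few rows. This is a sketch on Python's built-in sqlite3 module, not Oracle (an assumption); Sam and Michael come from the expected output in the question, while the other people and the tie in team 2 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (person TEXT, team INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("Sam", 1, 23), ("Ann", 1, 20), ("Michael", 2, 21), ("Lee", 2, 21), ("Kim", 2, 19)],
)

# Attach each team's maximum age to every row, then keep the rows that match it.
oldest = conn.execute("""
    WITH m AS (
        SELECT t.*, max(age) OVER (PARTITION BY team) AS max_age
        FROM t
    )
    SELECT person, team, age
    FROM m
    WHERE age = max_age
    ORDER BY team, person
""").fetchall()

print(oldest)
```

Both 21-year-olds in team 2 are returned, which matches the tie behaviour described for this approach.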

Getting Unique ID for a maximum amount grouped by days in BigQuery

I have this query in BigQuery:
SELECT
ID,
max(amount) as money,
STRFTIME_UTC_USEC(TIMESTAMP(time), '%j') as day
FROM table
GROUP BY day
The console shows an error because it wants ID in the GROUP BY clause, but if I add ID to the GROUP BY I get many IDs for a specific day.
I want to print a unique ID with the maximum amount in a specific day.
For ex:
ID: 1 Money:123 Day:365
It is not clear from the question, but it looks like you already have only one entry per id for a particular day. Assuming this, the query below does what you need:
SELECT id, amount, day
FROM (
SELECT
id, amount, day,
ROW_NUMBER() OVER(PARTITION BY day ORDER BY amount DESC) AS win
FROM dataset.table
)
WHERE win = 1
 
If the above assumption is wrong (so you have multiple entries for the same id on the same day), use the query below:
SELECT id, amount, day
FROM (
SELECT id, amount, day,
ROW_NUMBER() OVER(PARTITION BY day ORDER BY amount DESC) AS win
FROM (
SELECT id, SUM(amount) AS amount, day
FROM dataset.table
GROUP BY id, day
)
)
WHERE win = 1