Hive SQL: Find the last time a user had an entry - sql

I am stuck a bit! I have a users table. The users get a score, but it doesn't come every day.
I need a way to show each user's score for the last date on which they got one. That date could be a month ago, and I have 50M rows per day, so I can't just ingest all the partitions.
Any idea how I can do this?
select userid, score from user_table where dt = 20201206

Get the most recent record as below:
select userid, score
from
(select userid, score, row_number() over (partition by userid order by dt desc) as rn
from user_table) s
where rn = 1
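To see what the row_number() pattern returns on a small data set, here is a minimal sketch. It runs the same query through Python's built-in sqlite3 module instead of Hive, which is an assumption made purely so the example is runnable anywhere (SQLite 3.25 or newer is needed for window functions). The rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the Hive table (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("create table user_table (userid text, score integer, dt integer)")
conn.executemany(
    "insert into user_table values (?, ?, ?)",
    [
        ("a", 10, 20201101),
        ("a", 30, 20201206),
        ("b", 70, 20201120),  # user b has no score on 20201206
    ],
)

# Number each user's rows from newest to oldest and keep row 1.
latest = conn.execute(
    """
    select userid, score, dt
    from (select userid, score, dt,
                 row_number() over (partition by userid order by dt desc) as rn
          from user_table) s
    where rn = 1
    order by userid
    """
).fetchall()

print(latest)
```

Note that this form still reads every partition. If scores are known to be no older than some bound, adding a dt range filter inside the subquery should let Hive prune the older partitions; that bound is an assumption about the data, not something stated in the question.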

Related

Select only 3 rows per user - SQL Query

These are the columns in my table
id (autogenerated)
created_user
created_date
post_text
This table has a lot of rows. I want to get the latest 3 posts of every created_user.
I am new to SQL and need help. I ran the query below in my Postgres database, but it does not give me what I need:
SELECT * FROM posts WHERE created_date IN
(SELECT MAX(created_date) FROM posts GROUP BY created_date)
You could use the row_number() window function to create an ordered row count per user. After that you can easily filter by this value
SELECT
*
FROM (
SELECT
*,
row_number() OVER (PARTITION BY created_user ORDER BY created_date DESC)
FROM
posts
) s
WHERE row_number <= 3
PARTITION BY groups the users
ORDER BY created_date DESC sorts each user's posts from newest to oldest, so the most recent post gets row_number = 1, the next one 2, and so on.
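The top-3-per-user filter can be tried out with a small sketch. It uses Python's sqlite3 module in place of Postgres (an assumption for portability; SQLite 3.25+ is required), gives the window column an explicit alias rn, and fills the table with made-up posts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table posts (id integer primary key, created_user text, "
    "created_date text, post_text text)"
)
# Hypothetical data: ann has five posts, bob has one.
sample = [("ann", "2021-01-0%d" % d, "post %d" % d) for d in range(1, 6)]
sample.append(("bob", "2021-02-01", "only post"))
conn.executemany(
    "insert into posts (created_user, created_date, post_text) values (?, ?, ?)",
    sample,
)

# Rank each user's posts newest-first, then keep ranks 1 to 3.
top3 = conn.execute(
    """
    select created_user, created_date
    from (select *,
                 row_number() over (partition by created_user
                                    order by created_date desc) as rn
          from posts) s
    where rn <= 3
    order by created_user, created_date desc
    """
).fetchall()

for row in top3:
    print(row)
```

Users with fewer than three posts, such as bob here, simply return all the rows they have.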

finding the postcode that the user spent the most time at

I have a table:
UserID
Postcode
Hours at postcode
I need to be able to find the postcode that each user spent the most time at. I've tried a max function, then I thought about ordering by hours desc and taking the top one, but I am not getting anywhere.
Can anyone help?
This will output two records if a user spent the same maximum number of hours at two postcodes:
select
UserID,
Postcode,
Hours
from
(
select
UserID,
Postcode,
Hours,
dense_rank() over(partition by userId order by hours desc) rn
from (select --skip this subquery if hours already aggregated by user, postcode
UserID, Postcode, sum(Hours) hours
from table group by UserID, Postcode
) s
)s
where rn = 1;
Use row_number() over(partition by userId order by hours desc) rn instead of dense_rank() if you need single record per user.
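The difference between the two ranking functions shows up only when there is a tie. The sketch below illustrates it using Python's sqlite3 module (an assumption so it runs without a database server; SQLite 3.25+ needed). The table name visits and its rows are hypothetical, and the hours are taken as already aggregated per user and postcode.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table visits (UserID integer, Postcode text, Hours integer)")
conn.executemany(
    "insert into visits values (?, ?, ?)",
    [
        (1, "AB1", 5), (1, "CD2", 5), (1, "EF3", 2),  # user 1 has a tie at 5 hours
        (2, "AB1", 1), (2, "GH4", 4),
    ],
)

def top_postcodes(rank_fn):
    """Return the rank-1 rows per user; rank_fn is 'dense_rank' or 'row_number'."""
    # rank_fn is interpolated into the SQL, so only pass trusted names.
    sql = f"""
        select UserID, Postcode, Hours
        from (select UserID, Postcode, Hours,
                     {rank_fn}() over (partition by UserID order by Hours desc) as rn
              from visits) s
        where rn = 1
        order by UserID, Postcode
    """
    return conn.execute(sql).fetchall()

with_ties = top_postcodes("dense_rank")   # both tied postcodes for user 1
single = top_postcodes("row_number")      # exactly one row per user

print(with_ties)
print(single)
```

With row_number() the database picks one of the tied rows arbitrarily; adding a second sort key to the ORDER BY would make that choice deterministic.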

How to get first and last row against one foreign key from a table

I have a ticket history table that saves all the actions done against a ticket. How do I write a query that returns the first record and the last record for a specific ticket?
For example, in the above table I have one ticket with id 78580. I want to get the first row and the last row based on the date column.
Just use row_number():
select t.*
from (select t.*,
row_number() over (partition by ticket_id order by action_when asc) as seqnum_a,
row_number() over (partition by ticket_id order by action_when desc) as seqnum_d
from tickets t
) t
where seqnum_a = 1 or seqnum_d = 1;
Use min and max to get first and last date, grouped by ticket id.
SELECT ticket_id, min(action_when), max(action_when)
FROM table_name
GROUP BY ticket_id;
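The two answers return different things: the row_number() query returns the complete first and last rows, while the min/max query returns only the two dates. A small sketch makes the contrast visible. It uses Python's sqlite3 module as a stand-in database (an assumption; SQLite 3.25+ needed), and the action column and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tickets (ticket_id integer, action text, action_when text)")
conn.executemany(
    "insert into tickets values (?, ?, ?)",
    [
        (78580, "opened", "2020-01-01"),
        (78580, "assigned", "2020-01-02"),
        (78580, "closed", "2020-01-05"),
        (78581, "opened", "2020-02-01"),  # a ticket with a single action
    ],
)

# Full first and last rows per ticket.
first_last = conn.execute(
    """
    select ticket_id, action, action_when
    from (select t.*,
                 row_number() over (partition by ticket_id order by action_when asc) as seqnum_a,
                 row_number() over (partition by ticket_id order by action_when desc) as seqnum_d
          from tickets t) t
    where seqnum_a = 1 or seqnum_d = 1
    order by ticket_id, action_when
    """
).fetchall()

# Only the first and last dates per ticket.
bounds = conn.execute(
    """
    select ticket_id, min(action_when), max(action_when)
    from tickets
    group by ticket_id
    order by ticket_id
    """
).fetchall()

print(first_last)
print(bounds)
```

A ticket with only one action appears once in the first result, because that row is both first and last.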

Oracle SQL query: Delete the oldest duplicate records from the table

Current table:
Users:
ID   name  Time
001  John  Aug 15
001  Coga  March 1
002  Pat   May 10
I need to write a query that finds the persons with the same ID and deletes the oldest record.
I am able to find the oldest record, but how can I delete it within the same query?
SELECT ID, MIN(Time)
FROM Users
WHERE ID in (SELECT ID FROM USERS group by ID having count(ID) > 1)
group by ID;
Result:
ID   name  Time
001  Coga  March 1
When deleting, I need to delete the exact (oldest) record which has a specific ID and specific Time.
You could use this query:
delete
from users
where (id, time) in
(select id, time
from (select id, time,
row_number() over (partition by id order by time desc) as rn
from users) sub
where rn > 1)
It will delete all "duplicates" for a certain person, except the most recent one. The idea is that when you number the occurrences of a certain id, from recent to old, only the records numbered with 1 should be kept.
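The keep-only-the-newest delete can be checked on the question's three rows. The sketch below uses Python's sqlite3 module rather than Oracle (an assumption for runnability; SQLite 3.25+ is needed for the window function and row-value IN). The dates are rewritten as ISO strings with an arbitrary year so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table users (id text, name text, time text)")
conn.executemany(
    "insert into users values (?, ?, ?)",
    [
        ("001", "John", "2020-08-15"),
        ("001", "Coga", "2020-03-01"),  # older duplicate of id 001
        ("002", "Pat", "2020-05-10"),
    ],
)

# Number each id's rows newest-first and delete everything after row 1.
conn.execute(
    """
    delete from users
    where (id, time) in
          (select id, time
           from (select id, time,
                        row_number() over (partition by id order by time desc) as rn
                 from users) sub
           where rn > 1)
    """
)

remaining = conn.execute("select id, name, time from users order by id").fetchall()
print(remaining)
```

If two rows share both id and time, the (id, time) match deletes both copies whenever one of them is numbered above 1, which is one reason to match on a unique row identifier instead.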
Pseudo-column ROWID
As guigui42 mentioned in comments below, the Oracle specific pseudo-column rowid may give a further performance improvement. This would be certainly the case if you do not have an index starting with the id, time fields:
delete
from users
where rowid in
(select rowid
from (select rowid,
row_number() over (partition by id order by time desc) as rn
from users) sub
where rn > 1)
Try with window functions:
ROW_NUMBER will help you find the first one by ID (if there are duplicates by ID and TIME we'll select just one of them)
COUNT(*) OVER (PARTITION BY) will validate that there are duplicates by ID
DELETE USERS
WHERE EXISTS
(SELECT 1
FROM
(SELECT ID, TIME,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY TIME) AS RN,
COUNT(*) OVER (PARTITION BY ID) CN
FROM USERS) U
WHERE U.ID = USERS.ID
AND U.TIME = USERS.TIME
AND U.RN = 1 -- FIRST ONE WITH THAT ID
AND U.CN > 1 -- WE HAVE MORE THAN ONE
)
delete
from
USERS
where
(U_ID, U_TIME) in (select -- match on ID as well, so another user's row with the same time is not deleted
U_ID, U_TIME
from
(select
U_ID,min(U_TIME) as U_TIME,count(rownum) as CNT
from
USERS
group by
U_ID)
where
CNT>1 )
;
If you have any primary key constraints let me know

Getting Unique ID for a maximum amount grouped by days in BigQuery

I have this query in BigQuery:
SELECT
ID,
max(amount) as money,
STRFTIME_UTC_USEC(TIMESTAMP(time), '%j') as day
FROM table
GROUP BY day
The console shows an error because it wants ID in the GROUP BY clause, but if I add ID to the GROUP BY I get many IDs for a specific day.
I want to print a unique ID with the maximum amount in a specific day.
For ex:
ID: 1 Money:123 Day:365
It is not clear from the question, but it looks like you already have only one entry per id for a particular day. Assuming this, the query below does what you need:
SELECT id, amount, day
FROM (
SELECT
id, amount, day,
ROW_NUMBER() OVER(PARTITION BY day ORDER BY amount DESC) AS win
FROM dataset.table
)
WHERE win = 1
 
If the above assumption is wrong (so you have multiple entries for the same id on the same day), use the query below, which sums the amounts per id and day first:
SELECT id, amount, day
FROM (
SELECT id, amount, day,
ROW_NUMBER() OVER(PARTITION BY day ORDER BY amount DESC) AS win
FROM (
SELECT id, SUM(amount) AS amount, day
FROM dataset.table
GROUP BY id, day
)
)
WHERE win = 1
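The two variants give different winners as soon as one id has several entries on the same day. Here is a minimal sketch of that case, run through Python's sqlite3 module instead of BigQuery (an assumption so it can run locally; SQLite 3.25+ needed). The table name payments and its rows are hypothetical, and day is stored as a ready-made column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table payments (id integer, amount integer, day integer)")
conn.executemany(
    "insert into payments values (?, ?, ?)",
    [
        (1, 123, 365),
        (2, 70, 365), (2, 60, 365),  # id 2 has two entries on day 365
        (3, 40, 364),
    ],
)

# Variant 1: largest single entry per day.
per_row = conn.execute(
    """
    select id, amount, day
    from (select id, amount, day,
                 row_number() over (partition by day order by amount desc) as win
          from payments)
    where win = 1
    order by day
    """
).fetchall()

# Variant 2: largest per-id total per day.
summed = conn.execute(
    """
    select id, amount, day
    from (select id, amount, day,
                 row_number() over (partition by day order by amount desc) as win
          from (select id, sum(amount) as amount, day
                from payments
                group by id, day))
    where win = 1
    order by day
    """
).fetchall()

print(per_row)
print(summed)
```

On day 365 the first variant picks id 1 (single entry of 123), while the second picks id 2 (total of 130), so the choice depends on which meaning of "maximum amount" is intended.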