Select only 3 rows per user - SQL Query - sql

These are the columns in my table
id (autogenerated)
created_user
created_date
post_text
This table has lot of values. I wanted to take latest 3 posts of every created_user
I am new to SQL and need help. I ran the below query in my Postgres database and it is not helpful
SELECT * FROM posts WHERE created_date IN
(SELECT MAX(created_date) FROM posts GROUP BY created_date)

You could use the row_number() window function to create an ordered row count per user. After that you can easily filter by this value
demo:db<>fiddle
SELECT
*
FROM (
SELECT
*,
row_number() OVER (PARTITION BY created_user ORDER BY created_date DESC)
FROM
posts
) s
WHERE row_number <= 3
PARTITION BY groups the users
ORDER BY date DESC orders the posts of each user into a descending order to get the most recent as row_count == 1, ...

Related

How do I select 1 [oldest] row per group of rows, given multiple groups?

Let's say we have the database table below, called USER_JOBS.
I'd like to write an SQL query that reflects this algorithm:
Divide the whole table in groups of rows defined by a common USER_ID (in the example table, the 2 resulting groups are colored yellow & green)
From each group, select the oldest row (according to SCHEDULE_TIME)
From this example table, the desired SQL query would return these 2 rows:
You can use ranking function (supported in most RDBS):
SELECT *
FROM
(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY USER_ID ORDER BY SCHEDULE_TIME DESC) AS RowID
FROM [table]
)
WHERE RowID = 1
WITH Ranked AS (
SELECT
RANK() OVER (PARTITION BY User_ID ORDER BY ScheduleTime DESC) as Ranking,
*
FROM [table_name]
)
SELECT Status, Sob_Type, User_ID, TimeStamp FROM ranking WHERE Ranks = 1;

SQL Query to get the last unique value from a column

I have a SQL table with the following columns:
id | created_at
I want to get a dictionary of all the unique id's and the latest created_at time.
My current query is this:
SELECT id, created_at from uscis_case_history
ORDER BY created_at DESC
LIMIT 1
This gives me the latest id, but i want one for all unique id
Sounds like you just need to aggregate:
SELECT id, MAX(created_at) LatestCreated_at
FROM uscis_case_history
GROUP BY id
ORDER BY MAX(created_at) DESC;
You can use row_number to index the dates based on their id and then filter by that:
SELECT *
FROM (SELECT id,
created_at,
row_number() over (partition by id order by created_at desc) rn
FROM tbl) a
WHERE rn=1
Here is an example on dbfiddle

How to grab the last value in a column per user for the last date

I have a table that contains three columns: ACCOUNT_ID, STATUS, CREATE_DATE.
I want to grab only the LAST status for each account_id based on the latest create_date.
In the example above, I should only see three records and the last STATUS per that account_2.
Do you know a way to do this?
create table TBL 1 (
account_id int,
status string,
create_date date)
select account_id, max(create_date) from table group by account_id;
will give you the account_id and create_date at the closest past date to today (assuming create_date can never be in the future, which makes sense).
Now you can join with that data to get what you want, something along the lines for example:
select account_id, status, create_date from table where (account_id, create_date) in (<the select expression from above>);
If you use that frequently (account with the latest create date), then consider defining a view for that.
If you have many columns and want keep the row that is the last line, you can use QUALIFY to run the ranking logic, and keep the best, like so:
SELECT *
FROM tbl
QUALIFY row_number() over (partition by account_id order by create_date desc) = 1;
The long form is the same pattern the Ely shows in the second answer. But with the MAX(CREATE_DATE) solution, if you have two rows on the same last day, the IN solution with give you both. you can also get via QUALIFY if you use RANK
So the SQL is the same as:
SELECT account_id, status, create_date
FROM (
SELECT *,
row_number() over (partition by account_id order by create_date desc) as rn
FROM tbl
)
WHERE rn = 1;
So the RANK for, which will show all equal rows is:
SELECT *
FROM tbl
QUALIFY rank() over (partition by account_id order by create_date desc) = 1;

Select the max timestamp in hive

I have a table-customer with two records in the different timestamp. I want to select the max timestamp records: 2014-08-15 15:54:07.379.
Select Customer_ID, Account_ID, max(ProcessTimeStamp)
from Customer
group by Customer_ID, Account_ID
I should get one record, but the actual result is two records.
How can I get the max ProcessTimeStamp records?
You can use windows function here.
Either you can use dense_rank() or row_num().
1.USING DENSE_RANK()
select customer_id,account_id,processTimeStamp
from (select *
,dense_rank() over(partition by customer_id order by processTimeStamp desc) as rank
from "your table"
) temp
where rank=1
2.USING ROW NUMBER
select customer_id,account_id,processTimeStamp
from (select *
,row_number() over(partition by customer_id order by processTimeStamp desc) as rank
from "your table"
) temp
where rank=1
BUT with row_number() each row will get a unique number and if there are duplicate records than row_number will give only the row where row number=1(in above mentioned case).

Query to select 2 most recent dated records per grouping

Using R, I want to grab the two most recently dated entries for each UserID, assuming there is 1 or more entries per UserID.
The key elements of my data would be an identifier (UserID), and a date, that is of type date.
Thank you.
In SQL Server, which has the ROW_NUMBER() analytic function, you can try this query:
SELECT t.UserID, t.date, ...other columns
FROM
(
SELECT UserID, date, ...other columns,
ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY date DESC) rn
FROM yourTable
) t
WHERE t.rn <= 2