Let's say that we have a users table, and a user can have many posts (posts have the user_id column).
I want to retrieve posts for the first 5 users, but only one post per user. So, at the end I want to have 5 posts, where each post belongs to a different user. How can I do that in SQL?
You need two tables:
Table 1: users ; Columns: users_id, user_name
Table 2: posts ; Columns: post_id, post_description, users_id
To retrieve the first 5 users with one post each:
SELECT * FROM users AS u
LEFT JOIN posts AS p ON p.post_id = (
  SELECT p2.post_id FROM posts AS p2 WHERE p2.users_id = u.users_id LIMIT 1)
ORDER BY u.users_id ASC LIMIT 5
If you want the oldest post for each user:
SELECT * FROM users AS u
LEFT JOIN (
  SELECT
    users_id,
    MIN(post_id) AS post_id
  FROM posts
  GROUP BY users_id
) AS oldest ON oldest.users_id = u.users_id
LEFT JOIN posts AS p ON p.post_id = oldest.post_id
ORDER BY u.users_id ASC LIMIT 5
For the latest post, use MAX(post_id) instead of MIN(post_id).
Maybe this will help you:
select *
from users u
join posts p on p.idUser = u.id
and p.id = ( select max(p2.id) from posts p2 where p2.idUser = u.id )
ORDER BY u.id LIMIT 5
This will give you the latest five postings (assuming a higher ID post is "newer"), forcing each result to be from a different user.
SELECT p.user_id, MAX(p.post_id) AS post_id
FROM posts AS p
GROUP BY p.user_id -- get unique users
ORDER BY MAX(p.post_id) DESC -- sort results by each user's newest post_id, descending
LIMIT 5
With this method, if you need to load other user or post data, it would be best to load it separately after you get the desired user ids and post ids.
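To try the "one post per user" idea end to end, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table and column names follow the first answer (users_id, post_id, post_description); the sample rows and the helper name latest_post_per_user are made up for illustration, and it assumes a higher post_id means a newer post.

```python
import sqlite3

def latest_post_per_user(conn, limit=5):
    """Return (users_id, user_name, post_id, post_description) for the first `limit` users."""
    sql = """
        SELECT u.users_id, u.user_name, p.post_id, p.post_description
        FROM users AS u
        JOIN (
            -- one row per user: the highest post_id that user owns
            SELECT users_id, MAX(post_id) AS post_id
            FROM posts
            GROUP BY users_id
        ) AS latest ON latest.users_id = u.users_id
        JOIN posts AS p ON p.post_id = latest.post_id
        ORDER BY u.users_id
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (users_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, post_description TEXT, users_id INTEGER);
""")
# Hypothetical sample data: 7 users, 3 posts each.
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 8)])
conn.executemany("INSERT INTO posts (post_description, users_id) VALUES (?, ?)",
                 [(f"post {n} by user{u}", u) for n in range(1, 4) for u in range(1, 8)])

rows = latest_post_per_user(conn)
for row in rows:
    print(row)
```

The derived table picks the post id per user first and the outer join fetches the full post row, so every selected column comes from the same post.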
Related
So I have two tables: Please see the ER diagram here
I want to use SELECT to create one table with "name" from the USER table, "id" as the foreign key for the two tables, and the count of friend_id as the number of friends each user has.
Here is my code:
SELECT name, id, (SELECT count(friend_id) as number
FROM friend
GROUP BY user_id)
FROM user
ORDER BY number DESC
I'm wondering what's the problem with these lines. Thank you!
You can use a subquery to calculate the count.
SELECT name, id, COALESCE(f.Count, 0) AS friend_count
FROM user u
LEFT JOIN (
SELECT user_id, COUNT(DISTINCT friend_id) AS Count
FROM friend
GROUP BY user_id
) f ON f.user_id = u.id
ORDER BY friend_count DESC
I used a LEFT JOIN so that a user with no rows in friend still appears, with a friend count of 0 (thanks to COALESCE). I also added DISTINCT so that a friend listed more than once is counted only once; this is unnecessary if you have a UNIQUE index on the columns (user_id, friend_id).
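If you want to see the LEFT JOIN plus COALESCE behaviour on real rows, the following sketch runs the same shape of query in an in-memory SQLite database. The sample data is invented: user 1 has a duplicated friend row, and user 3 has no friends at all. The inner alias n is my own naming, chosen to avoid clashing with the outer friend_count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friend (user_id INTEGER, friend_id INTEGER);
    INSERT INTO user VALUES (1, 'ann'), (2, 'bob'), (3, 'cid');
    -- (1, 3) appears twice on purpose to show what DISTINCT does
    INSERT INTO friend VALUES (1, 2), (1, 3), (1, 3), (2, 1);
""")

sql = """
    SELECT u.name, u.id, COALESCE(f.n, 0) AS friend_count
    FROM user AS u
    LEFT JOIN (
        SELECT user_id, COUNT(DISTINCT friend_id) AS n
        FROM friend
        GROUP BY user_id
    ) AS f ON f.user_id = u.id
    ORDER BY friend_count DESC, u.id
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Without DISTINCT the duplicated row would push ann's count to 3, and with an inner join cid would drop out of the result entirely.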
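Just add a WHERE clause that matches the current user's id and remove the GROUP BY, since the subquery then covers only one user, who can have one or more friends as your diagram shows. The alias also has to go on the subquery column itself so that ORDER BY can see it.
SELECT name, id, (SELECT count(friend_id)
FROM friend
WHERE user_id = user.id) AS number
FROM user
ORDER BY number DESC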
I think this will work for your purpose:
CREATE TABLE #user(
id VARCHAR(22),
[name] VARCHAR(255)
)
CREATE TABLE #friend(
user_id VARCHAR(22),
friend_id VARCHAR(22)
)
SELECT u.[name], u.id, (SELECT COUNT(f.friend_id) -- COUNT without GROUP BY returns 0 when no rows match
FROM #friend f
WHERE f.user_id = u.id) AS number
FROM #user u
ORDER BY number DESC
--Same query with join:
SELECT u.[name], u.id, COALESCE(COUNT(f.friend_id),0) number
FROM #user u
LEFT JOIN #friend f ON f.user_id = u.id
GROUP BY u.[name], u.id
ORDER BY number DESC
Suppose I have a schema something like
create table if not exists user (
id serial primary key,
name text not null
);
create table if not exists post (
id serial primary key,
user_id integer not null references user (id),
score integer not null
)
I want to run a query that selects a row from the user table by ID, and all the rows that reference it from the post table, provided that at least one row in the post table has a score of greater than some number n (e.g. 50). I'm not exactly sure how to do this though.
You can use window functions. Let me assume that post has a user_id column so the tables can be tied together:
select u.*, p.*
from user u join
(select p.*, max(score) over (partition by user_id) as max_score
from post p
) p
on p.user_id = u.id
where p.max_score > 50;
If you just wanted all scores, then aggregation with filtering might be sufficient:
select u.*, array_agg(p.score order by p.score desc)
from user u join
post p
on p.user_id = u.id
group by u.id
having max(p.score) > 50;
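For a quick check of the "user plus all of their posts, but only if at least one post scores above n" requirement, here is a sketch against in-memory SQLite. Note that it expresses the condition with EXISTS rather than the window function or HAVING shown above; that swap is my own choice to keep the example portable. The helper name user_with_posts and the sample rows are hypothetical.

```python
import sqlite3

def user_with_posts(conn, user_id, min_score):
    """Return (user id, name, post id, score) rows, or [] if no post beats min_score."""
    sql = """
        SELECT u.id, u.name, p.id, p.score
        FROM user AS u
        JOIN post AS p ON p.user_id = u.id
        WHERE u.id = ?
          -- keep the user only if some post of theirs exceeds the threshold
          AND EXISTS (SELECT 1 FROM post AS q
                      WHERE q.user_id = u.id AND q.score > ?)
        ORDER BY p.id
    """
    return conn.execute(sql, (user_id, min_score)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE post (id INTEGER PRIMARY KEY,
                       user_id INTEGER NOT NULL REFERENCES user (id),
                       score INTEGER NOT NULL);
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO post (user_id, score) VALUES (1, 10), (1, 60), (1, 30), (2, 20), (2, 40);
""")

alice_rows = user_with_posts(conn, 1, 50)  # alice has a post with score 60
bob_rows = user_with_posts(conn, 2, 50)    # bob's best score is 40
print(alice_rows)
print(bob_rows)
```

All of alice's posts come back, including the low-scoring ones, because the threshold only decides whether the user qualifies at all.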
Two tables posts and comments. posts has many comments (comments has post_id foreign key to posts id primary key)
posts
id | content
------------
comments
id | post_id | text | created_at
-------------------------------
I need all posts, each with its content and its latest comment (based on max(created_at)), including that comment's text.
I can get as far as created_at using this:
with comment_latest as (select
post_id,
max(created_at) as latest_commented_at
from comments
group by 1)
select
posts.id,
posts.content,
comment_latest.latest_commented_at
from posts
left join comment_latest on comment_latest.post_id = posts.id
order by posts.id desc
limit 10
But I want the text of the comment as well.
You can use the Postgres extension distinct on:
select distinct on (p.id) p.*, c.*
from posts p left join
comments c
on p.id = c.post_id
order by p.id desc, c.created_at desc
limit 10;
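DISTINCT ON keeps the first row of each group of rows sharing the same p.id, where "first" is decided by the ORDER BY clause; here that is the comment with the latest created_at. Posts without comments are still returned, with NULL comment columns.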
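DISTINCT ON only exists in Postgres, so if you want to experiment with the same result elsewhere, a correlated subquery is one portable way to pick the latest comment per post. The sketch below runs that variant in in-memory SQLite; it is a substitute technique, not the DISTINCT ON query itself, and the sample rows are invented. Ties on created_at are broken by the higher comment id, which is an assumption on my part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER,
                           text TEXT, created_at TEXT);
    INSERT INTO posts VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO comments VALUES
        (1, 1, 'old',  '2024-01-01'),
        (2, 1, 'new',  '2024-01-02'),
        (3, 2, 'only', '2024-01-03');
""")

sql = """
    SELECT p.id, p.content, c.text, c.created_at
    FROM posts AS p
    LEFT JOIN comments AS c ON c.id = (
        -- id of the newest comment on this post, if any
        SELECT c2.id
        FROM comments AS c2
        WHERE c2.post_id = p.id
        ORDER BY c2.created_at DESC, c2.id DESC
        LIMIT 1
    )
    ORDER BY p.id DESC
    LIMIT 10
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Post 3 has no comments and still shows up, with None in both comment columns, because of the LEFT JOIN.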
Maybe for some people this looks very simple, but I just can't get it.
My tables are:
CREATE TABLE USERS (user_ID number PRIMARY KEY, username varchar2(32), password varchar2(32));
CREATE TABLE VIDEOS (video_ID number PRIMARY KEY, title varchar(64), description varchar(128));
CREATE TABLE VIEWS (view_ID number PRIMARY KEY, user_ID number, video_ID number);
CREATE TABLE FAVORITES (fav_ID number PRIMARY KEY, user_ID number, video_ID number);
I've created these separate queries:
SELECT u.username AS "Username", count(*) AS "Views"
FROM Views v, Videos vd, Users u
WHERE v.user_id = u.user_id
AND v.video_id = vd.video_id
GROUP BY u.username
SELECT u.username AS "Username", count(*) AS "Favorites"
FROM Favorites f, Videos vd, Users u
WHERE f.user_id = u.user_id
AND f.video_id = vd.video_id
GROUP BY u.username
And I want a single, simple query that shows something like this:
Username Views Favorites
-------------------------------
Person1 12 1
Person2 234 21
...
I Googled a bunch of similar questions but I couldn't make any of them work.
So any help is greatly appreciated.
You are on the right track. You have two queries and want to see their results together. You could use a full outer join to get the result you are looking for, as below.
with fave_counts
as (
SELECT u.username
, count(*) AS favorites
FROM Favorites f
JOIN Videos vd
ON f.video_id = vd.video_id
JOIN Users u
ON f.user_id = u.user_id
GROUP BY u.username
)
,view_counts -- named differently from the Views table to avoid a name clash
as (SELECT u.username
, count(*) AS views
FROM Views v
JOIN Videos vd
ON v.video_id = vd.video_id
JOIN Users u
ON v.user_id = u.user_id
GROUP BY u.username
)
select coalesce(f.username, v.username) as username
,f.favorites
,v.views
from fave_counts f
full outer join view_counts v
on f.username = v.username
Since you know your data better, you could optimize the query further. For example, it could be a rule that a user who has favourited a video has also viewed it. If so, you can write a simpler query that produces the result in a single block instead of two blocks combined with a full outer join.
Aggregate separately in Views:
select user_id, count(*) counter
from Views
group by user_id
and Favorites
select user_id, count(*) counter
from Favorites
group by user_id
and finally LEFT join Users to the above queries:
select u.username,
coalesce(v.counter, 0) Views,
coalesce(f.counter, 0) Favorites
from users u
left join (
select user_id, count(*) counter
from Views
group by user_id
) v on v.user_id = u.user_id
left join (
select user_id, count(*) counter
from Favorites
group by user_id
) f on f.user_id = u.user_id
I used LEFT joins because there may exist users that did not see any video or do not have any favorites. In any of these cases COALESCE() will return 0 instead of null.
The table Videos is not needed.
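To see why the counts are aggregated separately before joining, here is a small sketch in in-memory SQLite that compares a naive double join with the pre-aggregated version. The rows are made up: p1 has 3 views and 2 favorites, p2 has 1 view, and p3 has nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE views (view_id INTEGER PRIMARY KEY, user_id INTEGER, video_id INTEGER);
    CREATE TABLE favorites (fav_id INTEGER PRIMARY KEY, user_id INTEGER, video_id INTEGER);
    INSERT INTO users VALUES (1, 'p1'), (2, 'p2'), (3, 'p3');
    INSERT INTO views (user_id, video_id) VALUES (1, 10), (1, 11), (1, 12), (2, 10);
    INSERT INTO favorites (user_id, video_id) VALUES (1, 10), (1, 11);
""")

# Naive: joining both child tables at once pairs every view with every favorite.
naive = conn.execute("""
    SELECT u.username, COUNT(v.view_id), COUNT(f.fav_id)
    FROM users u
    LEFT JOIN views v ON v.user_id = u.user_id
    LEFT JOIN favorites f ON f.user_id = u.user_id
    GROUP BY u.user_id, u.username
    ORDER BY u.user_id
""").fetchall()

# Pre-aggregated: each subquery yields at most one row per user, so nothing multiplies.
correct = conn.execute("""
    SELECT u.username, COALESCE(v.counter, 0), COALESCE(f.counter, 0)
    FROM users u
    LEFT JOIN (SELECT user_id, COUNT(*) AS counter FROM views GROUP BY user_id) v
           ON v.user_id = u.user_id
    LEFT JOIN (SELECT user_id, COUNT(*) AS counter FROM favorites GROUP BY user_id) f
           ON f.user_id = u.user_id
    ORDER BY u.user_id
""").fetchall()

print(naive)
print(correct)
```

The naive version reports 6 views and 6 favorites for p1 (3 x 2 joined rows), while the pre-aggregated version reports the real 3 and 2.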
In short: 3 table inner join duplicates records
I have data in BigQuery in 3 tables:
Pageviews with columns:
timestamp
user_id
title
path
Contacts with columns:
website_user_id
email
company_id
Companies with columns:
id
name
I want to display all recorded pageviews and, if user and/or company is known, display this data next to pageview.
First, I join contact and pageviews data (SQL is generated by Metabase business intelligence tool):
SELECT
`analytics.pageviews`.`timestamp` AS `timestamp`,
`analytics.pageviews`.`title` AS `title`,
`analytics.pageviews`.`path` AS `path`,
`Contacts`.`email` AS `email`
FROM `analytics.pageviews`
INNER JOIN `analytics.contacts` `Contacts` ON `analytics.pageviews`.`user_id` = `Contacts`.`website_user_id`
ORDER BY `timestamp` DESC
It works as expected and I can see pageviews attributed to known contacts.
Next, I'd like to show only the pageviews of contacts whose company is known, together with the company name:
SELECT
`analytics.pageviews`.`timestamp` AS `timestamp`,
`analytics.pageviews`.`title` AS `title`,
`analytics.pageviews`.`path` AS `path`,
`Contacts`.`email` AS `email`,
`Companies`.`name` AS `name`
FROM `analytics.pageviews`
INNER JOIN `analytics.contacts` `Contacts` ON `analytics.pageviews`.`user_id` = `Contacts`.`website_user_id`
INNER JOIN `analytics.companies` `Companies` ON `Contacts`.`company_id` = `Companies`.`id`
ORDER BY `timestamp` DESC
With this query I would expect to see only pageviews where associated contact AND company are known (just another column for company name). The problem is, I get duplicate rows for every pageview (sometimes 5, sometimes 20 identical rows).
I want to avoid selecting DISTINCT timestamps because it can lead to excluding valid pageviews from different users but with identical timestamp.
How to approach this?
Your description sounds like you have duplicates in companies. This is easy to test for:
select c.id, count(*)
from `analytics.companies` c
group by c.id
having count(*) >= 2;
You can get the details using window functions:
select c.*
from (select c.*, count(*) over (partition by c.id) as cnt
from `analytics.companies` c
) c
where cnt >= 2
order by cnt desc, id;
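The effect is easy to reproduce on a toy dataset. The sketch below uses in-memory SQLite rather than BigQuery, with invented rows in which company id 1 appears three times. The last query, which joins against a de-duplicated list of companies, is only one possible remedy and assumes the duplicate rows are identical in every column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pageviews (timestamp TEXT, user_id TEXT, title TEXT, path TEXT);
    CREATE TABLE contacts (website_user_id TEXT, email TEXT, company_id INTEGER);
    CREATE TABLE companies (id INTEGER, name TEXT);
    INSERT INTO pageviews VALUES
        ('2024-01-01 10:00', 'u1', 'Home', '/'),
        ('2024-01-01 10:05', 'u1', 'Pricing', '/pricing');
    INSERT INTO contacts VALUES ('u1', 'a@example.com', 1);
    -- company 1 is stored three times: this is the source of the fan-out
    INSERT INTO companies VALUES (1, 'Acme'), (1, 'Acme'), (1, 'Acme'), (2, 'Globex');
""")

join_sql = """
    SELECT pv.timestamp, pv.title, pv.path, ct.email, co.name
    FROM pageviews pv
    INNER JOIN contacts ct ON pv.user_id = ct.website_user_id
    INNER JOIN {companies} co ON ct.company_id = co.id
    ORDER BY pv.timestamp DESC
"""

# 2 pageviews x 3 copies of the company row
joined = conn.execute(join_sql.format(companies="companies")).fetchall()

# The duplicate check from the answer above
dupes = conn.execute("""
    SELECT id, COUNT(*) FROM companies GROUP BY id HAVING COUNT(*) >= 2
""").fetchall()

# Joining against distinct company rows restores one row per pageview
deduped = conn.execute(
    join_sql.format(companies="(SELECT DISTINCT id, name FROM companies)")
).fetchall()

print(len(joined), dupes, len(deduped))
```

If the duplicates differ in some column (for example two names for one id), DISTINCT will not collapse them and the source table needs cleaning instead.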