How to Left Join SQL Subquery with Table - sql

Goal: Create a query that calculates the ratio of ids that have/don't have a particular attribute.
Table 1: events
Fields: event_id, event_name, user_id
Field event_id is unique key/index
Field event_name has 3 potential values, one of which is the one being inspected.
Field user_id is a foreign key from Table 2
Table 2: users
Fields: id (and a long list of other attributes that aren't pertinent)
To get the list of user_ids with the qualifying attribute, I created the following:
SELECT DISTINCT events.user_id AS viewing_ids
FROM events
WHERE event_name = 'view_user_profile'
As I would expect this provides the list of users that have the corresponding event_name associated with their user_id
The next part is where I'm getting mixed up. Yes, I could wrap the select in COUNT(DISTINCT ...) to count the ids that have the 'view_user_profile' event, but that only provides half the answer. What I need to do is join that list with the full list of ids from the users table and then determine, for each id, whether it appears in the list or not.
I'm thinking the initial SELECT needs to be
SELECT
(CASE WHEN viewers IS NULL THEN false
ELSE true END) AS has_viewed_profile
, COUNT(user_id) AS users
FROM
(SELECT DISTINCT events.user_id AS viewing_ids
FROM events
WHERE event_name = 'view_user_profile') viewers
LEFT JOIN
users
ON
??? = users.id
This is where I get lost, I don't have a column name for viewers...

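I think this is what you want (the * 1.0 forces decimal division; on databases that divide integers as integers, count / count would return 0 whenever the ratio is below 1):
select count(e.user_id) * 1.0 / count(*) as view_ratio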
from users u left join
(select distinct e.user_id
from events e
where e.event_name = 'view_user_profile'
) e
on e.user_id = u.id;
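As a rough sketch of how the join behaves, the snippet below runs the same idea against an in-memory SQLite database from Python. SQLite and the sample rows are assumptions made for illustration, not the asker's actual database; the alias v and the 0/1 flag are hypothetical names. It shows both the per-group counts the question asked for and the single ratio from the answer.

```python
import sqlite3

# In-memory SQLite database with made-up sample rows (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, event_name TEXT, user_id INTEGER);
    INSERT INTO users (id) VALUES (1), (2), (3), (4);
    INSERT INTO events (event_id, event_name, user_id) VALUES
        (1, 'view_user_profile', 1),
        (2, 'view_user_profile', 1),
        (3, 'other_event', 2),
        (4, 'view_user_profile', 3);
""")

# The derived table's column keeps its name (user_id), so the join
# condition is v.user_id = u.id. Users with no match get NULL there.
viewers = """
    (SELECT DISTINCT user_id
     FROM events
     WHERE event_name = 'view_user_profile') v
"""

# Count users per group: 0 = never viewed, 1 = viewed at least once.
counts = conn.execute(f"""
    SELECT CASE WHEN v.user_id IS NULL THEN 0 ELSE 1 END AS has_viewed_profile,
           COUNT(*) AS users
    FROM users u
    LEFT JOIN {viewers} ON v.user_id = u.id
    GROUP BY has_viewed_profile
    ORDER BY has_viewed_profile
""").fetchall()

# Single ratio: COUNT(v.user_id) skips NULLs, COUNT(*) counts every user.
ratio = conn.execute(f"""
    SELECT COUNT(v.user_id) * 1.0 / COUNT(*)
    FROM users u
    LEFT JOIN {viewers} ON v.user_id = u.id
""").fetchone()[0]

print(counts)  # [(0, 2), (1, 2)]
print(ratio)   # 0.5
```

Starting from users and left joining the viewers list is what keeps the non-viewers in the result; starting from the viewers list, as in the question's draft, would drop them.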

Related

SQL Server statement is not returning anything

I have two tables called User_History and List_Of_Event.
I'm trying to get all the results from the User_History table with the query below, but it returns nothing when the Event_ID column in User_History is blank.
How can I still get all the rows even if the Event_Id column in User_History is blank/empty?
Select Event_Name As 'Event', GIN, GID, UPN, OneDrive, SharePoint, Mailbox, Event_Date, Extra
From List_of_Events E
Inner Join User_History H
on E.Event_Id = H.Event_Id
List_Of_Event has Event_Id (int) and Event_Name (varchar)
User_History has Event_Id (int) and other varchar columns
Then you need to select from User_History and left join to the List_of_Events table, so that history rows without a matching Event_Id are kept:
Select Event_Name As 'Event'
, GIN, GID, UPN
, OneDrive, SharePoint
, Mailbox, Event_Date, Extra
From User_History H
left Join List_of_Events E
on E.Event_Id = H.Event_Id
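To see the difference between the two joins, here is a small sketch using Python's sqlite3 module. The in-memory SQLite database, the reduced column list, and the sample rows are assumptions for illustration; the original question is about SQL Server, where the join semantics shown here are the same.

```python
import sqlite3

# In-memory SQLite database with made-up rows; only two columns of
# User_History are modelled here to keep the example short.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE List_of_Events (Event_Id INTEGER, Event_Name TEXT);
    CREATE TABLE User_History (Event_Id INTEGER, UPN TEXT);
    INSERT INTO List_of_Events VALUES (1, 'Created');
    INSERT INTO User_History VALUES (1, 'a@example.com'), (NULL, 'b@example.com');
""")

# INNER JOIN: a NULL Event_Id never equals anything, so that row is dropped.
inner_rows = conn.execute("""
    SELECT E.Event_Name, H.UPN
    FROM List_of_Events E
    INNER JOIN User_History H ON E.Event_Id = H.Event_Id
    ORDER BY H.UPN
""").fetchall()

# LEFT JOIN from User_History: every history row is kept, and Event_Name
# is NULL (None in Python) where no event matches.
left_rows = conn.execute("""
    SELECT E.Event_Name, H.UPN
    FROM User_History H
    LEFT JOIN List_of_Events E ON E.Event_Id = H.Event_Id
    ORDER BY H.UPN
""").fetchall()

print(inner_rows)  # [('Created', 'a@example.com')]
print(left_rows)   # [('Created', 'a@example.com'), (None, 'b@example.com')]
```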

Join 2 tables on foreign key while using count() in SQL

So I have two tables: Please see the ER diagram here
I want to use SELECT to create one table with "name" from the USER table, "id" as the foreign key for the two tables, and the count of friend_id as the number of friends each user has.
Here is my code:
SELECT name, id, (SELECT count(friend_id) as number
FROM friend
GROUP BY user_id)
FROM user
ORDER BY number DESC
I'm wondering what's the problem with these lines. Thank you!
You can use a subquery to calculate the count.
SELECT name, id, COALESCE(f.Count, 0) AS friend_count
FROM user u
LEFT JOIN (
SELECT user_id, COUNT(DISTINCT friend_id) AS Count
FROM friend
GROUP BY user_id
) f ON f.user_id = u.id
ORDER BY friend_count DESC
I used a LEFT JOIN so that a user without any row in friend is still returned, with a friend count of 0 (thanks to COALESCE). I also added DISTINCT so that a friend that appears in duplicate rows is counted only once; this might not be necessary, especially if you have a UNIQUE INDEX set up on the columns (user_id, friend_id).
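A minimal sketch of that behaviour, run against an in-memory SQLite database from Python. SQLite, the integer ids, and the sample rows (including one deliberately duplicated friend row) are assumptions for illustration; the inner alias cnt is a hypothetical name.

```python
import sqlite3

# In-memory SQLite database with made-up rows. User 1 has a duplicated
# friend row (1, 3) and user 3 has no friends at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friend (user_id INTEGER, friend_id INTEGER);
    INSERT INTO user VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO friend VALUES (1, 2), (1, 3), (1, 3), (2, 1);
""")

rows = conn.execute("""
    SELECT u.name, u.id, COALESCE(f.cnt, 0) AS friend_count
    FROM user u
    LEFT JOIN (
        SELECT user_id, COUNT(DISTINCT friend_id) AS cnt
        FROM friend
        GROUP BY user_id
    ) f ON f.user_id = u.id
    ORDER BY friend_count DESC, u.id
""").fetchall()

# Ann's duplicate is counted once; Cy appears with 0 instead of vanishing.
print(rows)  # [('Ann', 1, 2), ('Bob', 2, 1), ('Cy', 3, 0)]
```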
Just add a WHERE clause that matches the current user's id and remove the GROUP BY, because the subquery then covers only one user_id. The alias also has to sit on the outer column so that ORDER BY can see it.
SELECT name, id, (SELECT count(friend_id)
FROM friend
WHERE user_id = user.id) AS number
FROM user
ORDER BY number DESC
I think this will be correct for your purpose:
CREATE TABLE #user(
id VARCHAR(22),
[name] VARCHAR(255),
)
CREATE TABLE #friend(
user_id VARCHAR(22),
friend_id VARCHAR(22)
)
-- COUNT without GROUP BY always returns exactly one row,
-- so users with no friends get 0 rather than NULL
SELECT name, id, (SELECT COUNT(friend_id)
FROM #friend f
WHERE f.user_id = u.id) as number
FROM #user u
ORDER BY number DESC
--Same query with join:
SELECT u.[name], u.id, COUNT(f.friend_id) number
FROM #user u
LEFT JOIN #friend f ON f.user_id = u.id
GROUP BY u.[name], u.id
ORDER BY number DESC

Postgres many to one relationship join multiple tables and select all rows, provided that at least one row matches some criteria

Suppose I have a schema something like
-- user is a reserved word in Postgres, so the table name must be quoted
create table if not exists "user" (
id serial primary key,
name text not null
);
create table if not exists post (
id serial primary key,
user_id integer not null references "user" (id),
score integer not null
);
I want to run a query that selects a row from the user table by ID, and all the rows that reference it from the post table, provided that at least one row in the post table has a score of greater than some number n (e.g. 50). I'm not exactly sure how to do this though.
You can use window functions. Let me assume that post has a user_id column so the tables can be tied together:
select u.*, p.id as post_id, p.score
from "user" u join
(select p.*, max(score) over (partition by user_id) as max_score
from post p
) p
on p.user_id = u.id
where p.max_score > 50;
If you just wanted all scores, then aggregation with filtering might be sufficient:
select u.*, array_agg(p.score order by p.score desc) as scores
from "user" u join
post p
on p.user_id = u.id
group by u.id
having max(p.score) > 50;
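Another way to express "all posts of a user, provided at least one post scores above n" is an EXISTS filter. The sketch below is an alternative formulation, not the answer's window-function query, and it runs on an in-memory SQLite database from Python rather than Postgres; the sample rows and the alias q are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database with made-up rows. Ann has one post above 50,
# Bob has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES "user" (id),
        score INTEGER NOT NULL
    );
    INSERT INTO "user" VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO post VALUES (1, 1, 10), (2, 1, 60), (3, 2, 20), (4, 2, 30);
""")

threshold = 50

# Return every post of a user as long as that user has at least one post
# whose score exceeds the threshold.
rows = conn.execute("""
    SELECT u.id, u.name, p.id AS post_id, p.score
    FROM "user" u
    JOIN post p ON p.user_id = u.id
    WHERE EXISTS (
        SELECT 1 FROM post q
        WHERE q.user_id = u.id AND q.score > ?
    )
    ORDER BY u.id, p.id
""", (threshold,)).fetchall()

# Both of Ann's posts come back, including the one scoring 10.
print(rows)  # [(1, 'Ann', 1, 10), (1, 'Ann', 2, 60)]
```

To restrict the result to a single user by id, an extra condition such as u.id = ? in the WHERE clause would do.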

INNER JOIN of pageviews, contacts and companies - duplicated entries

In short: 3 table inner join duplicates records
I have data in BigQuery in 3 tables:
Pageviews with columns:
timestamp
user_id
title
path
Contacts with columns:
website_user_id
email
company_id
Companies with columns:
id
name
I want to display all recorded pageviews and, if user and/or company is known, display this data next to pageview.
First, I join contact and pageviews data (SQL is generated by Metabase business intelligence tool):
SELECT
`analytics.pageviews`.`timestamp` AS `timestamp`,
`analytics.pageviews`.`title` AS `title`,
`analytics.pageviews`.`path` AS `path`,
`Contacts`.`email` AS `email`
FROM `analytics.pageviews`
INNER JOIN `analytics.contacts` `Contacts` ON `analytics.pageviews`.`user_id` = `Contacts`.`website_user_id`
ORDER BY `timestamp` DESC
It works as expected and I can see pageviews attributed to known contacts.
Next, I'd like to show pageviews of contacts with known company and which company is this:
SELECT
`analytics.pageviews`.`timestamp` AS `timestamp`,
`analytics.pageviews`.`title` AS `title`,
`analytics.pageviews`.`path` AS `path`,
`Contacts`.`email` AS `email`,
`Companies`.`name` AS `name`
FROM `analytics.pageviews`
INNER JOIN `analytics.contacts` `Contacts` ON `analytics.pageviews`.`user_id` = `Contacts`.`website_user_id`
INNER JOIN `analytics.companies` `Companies` ON `Contacts`.`company_id` = `Companies`.`id`
ORDER BY `timestamp` DESC
With this query I would expect to see only pageviews where associated contact AND company are known (just another column for company name). The problem is, I get duplicate rows for every pageview (sometimes 5, sometimes 20 identical rows).
I want to avoid selecting DISTINCT timestamps because it can lead to excluding valid pageviews from different users but with identical timestamp.
How to approach this?
Your description sounds like you have duplicates in companies. This is easy to test for:
select c.id, count(*)
from `analytics.companies` c
group by c.id
having count(*) >= 2;
You can get the details using window functions:
select c.*
from (select c.*, count(*) over (partition by c.id) as cnt
from `analytics.companies` c
) c
where cnt >= 2
order by cnt desc, id;
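The effect can be reproduced on a tiny dataset. The sketch below uses an in-memory SQLite database from Python instead of BigQuery, with made-up rows in which one company id is stored twice. Deduplicating with SELECT DISTINCT is only one possible workaround and assumes the duplicate rows are exact copies; cleaning the companies table itself would be the more durable fix.

```python
import sqlite3

# In-memory SQLite database with made-up rows: company 7 is stored twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pageviews (timestamp TEXT, user_id TEXT, path TEXT);
    CREATE TABLE contacts (website_user_id TEXT, email TEXT, company_id INTEGER);
    CREATE TABLE companies (id INTEGER, name TEXT);
    INSERT INTO pageviews VALUES ('2020-01-01 10:00', 'u1', '/home');
    INSERT INTO contacts VALUES ('u1', 'a@example.com', 7);
    INSERT INTO companies VALUES (7, 'Acme'), (7, 'Acme'), (8, 'Other');
""")

# The duplicate check from the answer: ids that occur more than once.
dupes = conn.execute("""
    SELECT id, COUNT(*) FROM companies GROUP BY id HAVING COUNT(*) >= 2
""").fetchall()

# One pageview joined to a duplicated company comes back twice.
joined = conn.execute("""
    SELECT p.timestamp, p.path, c.email, co.name
    FROM pageviews p
    JOIN contacts c ON p.user_id = c.website_user_id
    JOIN companies co ON c.company_id = co.id
""").fetchall()

# Joining against a deduplicated derived table restores one row per pageview.
deduped = conn.execute("""
    SELECT p.timestamp, p.path, c.email, co.name
    FROM pageviews p
    JOIN contacts c ON p.user_id = c.website_user_id
    JOIN (SELECT DISTINCT id, name FROM companies) co ON c.company_id = co.id
""").fetchall()

print(dupes)         # [(7, 2)]
print(len(joined))   # 2
print(len(deduped))  # 1
```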

Select by frequency

I have two tables, like that:
users(id, name)
phones(user_id, number)
I'd like to select the names of all users that appear in more than three rows of the phones table. How can I do that?
Join the tables and add a HAVING clause that filters the groups by the count of user_ids:
select name,
count(user_id)
from users u
join phones p
on u.id = p.user_id
group by name
having count(user_id) > 3
SQL Fiddle: http://sqlfiddle.com/#!2/c5516/2
select name from users
join phones on id = user_id
group by id, name
having count(number) > 3
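A quick way to check the HAVING filter is to run it on a few sample rows. The sketch below uses an in-memory SQLite database from Python with made-up data; grouping by both u.id and u.name is a choice made here so that two different users who share a name are not merged into one group.

```python
import sqlite3

# In-memory SQLite database with made-up rows: Ann has four phone rows,
# Bob has exactly three.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE phones (user_id INTEGER, number TEXT);
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO phones VALUES
        (1, '111'), (1, '112'), (1, '113'), (1, '114'),
        (2, '221'), (2, '222'), (2, '223');
""")

rows = conn.execute("""
    SELECT u.name
    FROM users u
    JOIN phones p ON p.user_id = u.id
    GROUP BY u.id, u.name
    HAVING COUNT(*) > 3
""").fetchall()

# Only Ann has more than three rows; Bob's three do not pass "> 3".
print(rows)  # [('Ann',)]
```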