SQL: Two queries in a single set of results?

I'm using Postgres 9.6. I have three tables, like this:
Table public.user
id integer
name character varying
email character varying
Table public.project
id integer
user_id integer
Table public.sale
id integer
user_id integer
user_id is a foreign key in both the project and sale tables.
Is there a way I can get a list back of all user IDs with the number of projects and number of sales attached to them, as a single query?
So I'd like final data that looks like this:
user_id,num_projects,num_sales
121,28,1
122,43,6
123,67,2
I know how to do just the number of projects:
SELECT "user".id, COUNT(*) AS num_visualisations
JOIN project ON project.user_id="user".id
GROUP BY "user".id
ORDER BY "user".id DESC
But I don't know how also to get the number of sales too, in a single query.

Use subqueries for the aggregation and a left join:
select u.*, p.num_projects, s.num_sales
from "user" u
left join (
select p.user_id, count(*) as num_projects
from project p
group by p.user_id
) p on p.user_id = u.id
left join (
select s.user_id, count(*) as num_sales
from sale s
group by s.user_id
) s on s.user_id = u.id;
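Note that users with no projects or no sales get NULL from the outer joins. If you would rather see 0, a minimal variation (a sketch, not part of the original answer) wraps the counts in COALESCE:
-- sketch: COALESCE turns the NULLs produced by the outer joins into zeros
select u.id as user_id,
       coalesce(p.num_projects, 0) as num_projects,
       coalesce(s.num_sales, 0) as num_sales
from "user" u
left join (
  select user_id, count(*) as num_projects
  from project
  group by user_id
) p on p.user_id = u.id
left join (
  select user_id, count(*) as num_sales
  from sale
  group by user_id
) s on s.user_id = u.id
order by u.id desc;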

Related

How to pull the count of occurrences from 2 SQL tables

I am using Python with an SQLite3 database I created. For now I am just using the command line to try to get the SQL statement correct.
I have 2 tables.
Table 1 - users
user_id, name, message_count
Table 2 - messages
id, date, message, user_id
When I set up table two, I added this statement to the creation of my messages table, but I have no clue what, if anything, it does:
FOREIGN KEY (user_id) REFERENCES users (user_id)
What I am trying to do is return a list containing the name and message count during 2020. I have used this statement to get the TOTAL number of posts in 2020, and it works:
SELECT COUNT(*) FROM messages WHERE substr(date,1,4)='2020';
But I am struggling to figure out whether I should join the tables, or if there is a way to pull just the info I need. The statement I want would look something like this:
SELECT name, COUNT(*) FROM users JOIN messages ON messages.user_id = users.user_id WHERE substr(date,1,4)='2020';
One option uses a correlated subquery:
select u.*,
(
select count(*)
from messages m
where m.user_id = u.user_id and m.date >= '2020-01-01' and m.date < '2021-01-01'
) as cnt_messages
from users u
This query would take advantage of an index on messages(user_id, date).
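For reference, such an index could be created roughly like this (a sketch; the index name is arbitrary):
-- composite index: equality lookup on user_id, range scan on date
CREATE INDEX IF NOT EXISTS idx_messages_user_date ON messages (user_id, date);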
You could also join and aggregate. If you want to allow users that have no messages, a left join is appropriate:
select u.name, count(m.user_id) as cnt_messages
from users u
left join messages m
on m.user_id = u.user_id and m.date >= '2020-01-01' and m.date < '2021-01-01'
group by u.user_id, u.name
Note that it is more efficient to filter the date column against literal dates than to apply a function to it (which precludes the use of an index).
You are missing a GROUP BY clause to group by user:
SELECT u.user_id, u.name, COUNT(*) AS counter
FROM users u JOIN messages m
ON m.user_id = u.user_id
WHERE substr(m.date,1,4)='2020'
GROUP BY u.user_id, u.name
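If users with no 2020 messages should still appear with a count of 0, the GROUP BY approach can be combined with a LEFT JOIN, keeping the date filter in the join condition (a sketch, not part of the original answers):
SELECT u.user_id, u.name, COUNT(m.user_id) AS counter  -- counts only matched messages
FROM users u
LEFT JOIN messages m
  ON m.user_id = u.user_id
 AND m.date >= '2020-01-01' AND m.date < '2021-01-01'
GROUP BY u.user_id, u.name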

LEFT JOIN discarding left rows in results?

Simplifying my issue, let's say I have two tables:
"Users" storing user_id and event_date from users who access each day.
"Purchases" storing user_id, event_date and product_id from users who make purchases each day.
I need to get, for all users, their respective product purchases, or a null value for product_id if a user didn't make a purchase. For that purpose I wrote this query:
with all_users as (
select user_id from `my_project.my_dataset.Users`
where event_date = "2019-12-01"
)
select user_id,product_id
from all_users
left join `my_project.my_dataset.Purchases`
using(user_id)
where event_date = "2019-12-01"
But this query returns only the user_ids of users who made purchases; in other words, there are rows in the LEFT from_item (all_users) that are being omitted from the result.
Is this working as expected? I read that LEFT JOIN always retains all rows of the left from_item.
EDIT 1:
Adding some screenshots:
This is the full query detailed before, but with real names (table "Users" is "user_metrics_daily" and table "Purchases" is "virtual_currency_daily"). As you can see, I added count(distinct user_pseudo_id) OVER () to count how many distinct users are in the result.
On the other hand, this is a query to get the number of users I expect to have in the result (8935 users, with null values in product_id for users who don't purchase). But I actually got 2724 distinct users (the number of users who made purchases).
EDIT 2: I found a query that produces my desired result, but I still don't understand what's wrong with my first query.
The problem is the WHERE clause. In your query, event_date is a column of my_project.my_dataset.Purchases (the all_users CTE only exposes user_id), so filtering on it after the LEFT JOIN discards the rows where the join produced NULLs and effectively turns the LEFT JOIN into an INNER JOIN. Move that condition into the ON clause (or pre-filter Purchases), and say explicitly which table each projected column should come from: user_id from all_users and product_id from my_project.my_dataset.Purchases.
with all_users as (
select user_id from `my_project.my_dataset.Users`
where event_date = "2019-12-01"
)
select
a.user_id,
p.product_id
from all_users as a
left join `my_project.my_dataset.Purchases` as p
on a.user_id = p.user_id and p.event_date = "2019-12-01"
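An equivalent formulation (a sketch, not from the original answer) filters Purchases in its own CTE before joining, so no outer WHERE clause is needed:
with all_users as (
  select user_id
  from `my_project.my_dataset.Users`
  where event_date = "2019-12-01"
),
day_purchases as (
  select user_id, product_id
  from `my_project.my_dataset.Purchases`
  where event_date = "2019-12-01"
)
select
  a.user_id,
  p.product_id  -- NULL for users with no purchase that day
from all_users as a
left join day_purchases as p on a.user_id = p.user_id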

join 2 foreign keys using a subquery

Help me solve this: I intend to join two tables on two different foreign keys that reference the same column. Table snapshots are provided below:
users table
transactions table
I want to return the top 5 transactions by amount (high to low), displaying transaction id, investor id, investor name, borrower id, borrower name, and amount.
The following runs properly but contains no investor name:
select top 5 t.id,
investor_id,
borrower_id,
username as BorrowerName,
amount
from transactions t join users u on t.borrower_id = u.id
order by t.amount desc;
The result table is missing the investor name.
If I instead use a subquery, it results in an error:
select top 5 t.id,
investor_id,
(select username from users join transactions on users.id =
transactions.investor_id) investorName,
borrower_id,
username BorrowerName,
amount
from transactions t join users u on t.borrower_id = u.id
order by t.amount desc;
For reference, here is the query that joins the users table twice, once per foreign key:
select top 5 t.id,
investor_id, ui.username as InvestorName,
borrower_id, ub.username as BorrowerName,
amount
from transactions t
join users ub on t.borrower_id = ub.id
join users ui on t.investor_id = ui.id
order by t.amount desc;
The subquery must be scalar, i.e. return a single value, but yours currently returns a result set.
select top 5 t.id,
investor_id,
(-- Correlated Scalar Subquery, returns a single value
select username
from users
WHERE users.id = t.investor_id) investorName,
borrower_id,
username BorrowerName,
amount
from transactions t join users u on t.borrower_id = u.id
order by t.amount desc;
Isn't this what you want? Two joins on the users table:
SELECT TOP 5
investor_id,
investors.username InvestorName,
borrower_id,
borrowers.username BorrowerName,
amount
FROM
transactions
INNER JOIN users investors ON (transactions.investor_id = investors.id)
INNER JOIN users borrowers ON (transactions.borrower_id = borrowers.id)
ORDER BY
amount desc;
I would recommend against using correlated subqueries in this case, since the database has to evaluate two extra lookups (in the worst case, two sequential scans in a nested loop) for each row.
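As a side note, the INNER JOINs above drop any transaction whose investor_id or borrower_id is NULL or has no matching user row; if those should be kept, a LEFT JOIN variant works (a sketch, assuming the same tables):
SELECT TOP 5
  t.id,
  t.investor_id,
  ui.username AS InvestorName,  -- NULL when there is no matching investor row
  t.borrower_id,
  ub.username AS BorrowerName,  -- NULL when there is no matching borrower row
  t.amount
FROM transactions t
LEFT JOIN users ui ON t.investor_id = ui.id
LEFT JOIN users ub ON t.borrower_id = ub.id
ORDER BY t.amount DESC;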

Using SQL Aggregate Functions With Multiple Joins

I am attempting to use multiple aggregate functions across multiple tables in a single SQL query (using Postgres).
My table is structured similar to the following:
CREATE TABLE "user" (user_id INT PRIMARY KEY, user_date_created TIMESTAMP NOT NULL);
CREATE TABLE item_sold (item_sold_id INT PRIMARY KEY, sold_user_id INT NOT NULL);
CREATE TABLE item_bought (item_bought_id INT PRIMARY KEY, bought_user_id INT NOT NULL);
I want to count the number of items bought and sold for each user. The solution I thought up does not work:
SELECT user_id, COUNT(item_sold_id), COUNT(item_bought_id)
FROM "user"
LEFT JOIN item_sold ON sold_user_id=user_id
LEFT JOIN item_bought ON bought_user_id=user_id
WHERE user_date_created > '2014-01-01'
GROUP BY user_id;
That seems to perform all the combinations of (item_sold_id, item_bought_id), e.g. if there are 4 sold and 2 bought, both COUNT()s are 8.
How can I properly query the table to obtain both counts?
The easy fix to your query is to use distinct:
SELECT user_id, COUNT(distinct item_sold_id), COUNT(distinct item_bought_id)
FROM "user"
LEFT JOIN item_sold ON sold_user_id=user_id
LEFT JOIN item_bought ON bought_user_id=user_id
WHERE user_date_created > '2014-01-01'
GROUP BY user_id;
However, the query is doing unnecessary work. If someone has 100 items bought and 200 items sold, then the join produces 20,000 intermediate rows. That is a lot.
The solution is to pre-aggregate the results or use a correlated subquery in the select. In this case, I prefer the correlated subquery solution (assuming the right indexes are available):
SELECT u.user_id,
(select count(*) from item_sold s where u.user_id = s.sold_user_id),
(select count(*) from item_bought b where u.user_id = b.bought_user_id)
FROM "user" u
WHERE u.user_date_created > '2014-01-01';
The right indexes are item_sold(sold_user_id) and item_bought(bought_user_id). I prefer this over pre-aggregation because of the filtering on the user table. This only does the calculations for users created this year -- that is harder to do with pre-aggregation.
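Those indexes could be created along these lines (a sketch; the index names are arbitrary):
CREATE INDEX idx_item_sold_user ON item_sold (sold_user_id);
CREATE INDEX idx_item_bought_user ON item_bought (bought_user_id);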
With a lateral join it is possible to pre-aggregate only the filtered users:
select user_id, total_item_sold, total_item_bought
from
"user" u
left join lateral (
select sold_user_id, count(*) as total_item_sold
from item_sold
where sold_user_id = u.user_id
group by sold_user_id
) item_sold on user_id = sold_user_id
left join lateral (
select bought_user_id, count(*) as total_item_bought
from item_bought
where bought_user_id = u.user_id
group by bought_user_id
) item_bought on user_id = bought_user_id
where u.user_date_created >= '2014-01-01'
Notice that you need >= in the filter, otherwise it is possible to miss the exact first moment of the year. Although that timestamp is unlikely with naturally entered data, it is common with an automated job.
Another way to solve this problem is to use two nested selects.
select user_id,
(select count(*) from item_sold where sold_user_id = user_id),
(select count(*) from item_bought where bought_user_id = user_id)
from "user"
where user_date_created > '2014-01-01'

Getting Counts Per User Across Tables in PostgreSQL

I'm new to PostgreSQL. I have a database that has three tables in it: Users, Orders, and Comments. The Orders and Comments tables look like this:
Orders        Comments
------        --------
ID            ID
UserID        UserID
Description   Details
CreatedOn     CreatedOn
I'm trying to get a list of all of my users and how many orders each user has made and how many comments each user has made. In other words, the result of the query should look like this:
UserID  Orders  Comments
------  ------  --------
1       5       7
2       2       9
3       0       0
...
Currently, I'm trying the following:
SELECT
UserID,
(SELECT COUNT(ID) FROM Orders WHERE UserID=ID) AS Orders,
(SELECT COUNT(ID) FROM Comments WHERE UserID=ID) AS Comments
FROM
Orders o,
Comments c
WHERE
o.UserID = c.UserID
Is this the right way to do this type of query? Or can someone provide a better approach from a performance standpoint?
select
id, name,
coalesce(orders, 0) as orders,
coalesce(comments, 0) as comments
from
users u
left join
(
select userid as id, count(*) as orders
from orders
group by userid
) o using (id)
left join
(
select userid as id, count(*) as comments
from comments
group by userid
) c using (id)
order by name
The usual way to do this is by using outer joins to the two other tables and then grouping by the id (and name):
select u.id,
u.name,
count(distinct o.id) as num_orders,
count(distinct c.id) as num_comments
from users u
left join orders o on o.userId = u.id
left join comments c on c.userId = u.id
group by u.id, u.name
order by u.name;
That might very well be faster than your approach. But Postgres' query optimizer is quite smart and I have seen situations where both solutions are essentially equal in performance.
You will need to test that on your data and also have a look at the execution plans in order to find out which one is more efficient.
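One way to do that comparison is to run each candidate query under EXPLAIN (ANALYZE) and compare the reported plans and timings (a sketch, using the join/aggregate query above):
EXPLAIN (ANALYZE, BUFFERS)  -- shows the chosen plan plus actual run times and row counts
select u.id,
       u.name,
       count(distinct o.id) as num_orders,
       count(distinct c.id) as num_comments
from users u
left join orders o on o.userid = u.id
left join comments c on c.userid = u.id
group by u.id, u.name
order by u.name;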