CTE Optimization - sql

I have a query involving CTEs. I just want to know whether the following query can be optimized in any way and, if it can, what the rationale behind the optimized version is:
here is the query:
WITH A AS
(
SELECT
user_ID
FROM user
WHERE user_Date IS not NULL
),
B AS
(
SELECT
P.user_ID,
P.Payment_Type,
SUM(P.Payment_Amount) AS Total_Payment
FROM Payment P
JOIN A ON A.user_ID = P.user_ID
)
SELECT
user_ID,
Total_Payment_Amount
FROM B
WHERE Payment_Type = 'CR';

Your query should be using GROUP BY, since it seems you want a sum aggregate for each user_ID (your CTE B uses SUM without one). From a performance point of view, you are also introducing intermediate CTEs that aren't really necessary: the query can be written as a single join between the Payment and user tables.
SELECT
P.user_ID,
SUM(P.Payment_Amount) AS Total_Payment
FROM Payment P
INNER JOIN user A
ON A.user_ID = P.user_ID
WHERE
A.user_Date IS NOT NULL AND P.Payment_Type = 'CR'
GROUP BY
P.user_ID;

For Oracle, SQL Server, and Postgres, the query optimizers should have no problem finding the optimal plan for this. You can usually find out what the database will do with EXPLAIN.
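As a quick sanity check, here is a sketch using Python's sqlite3 with made-up sample data (and with the missing GROUP BY added to the CTE version) showing that both forms return the same rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE user (user_ID INTEGER, user_Date TEXT);
CREATE TABLE Payment (user_ID INTEGER, Payment_Type TEXT, Payment_Amount REAL);
INSERT INTO user VALUES (1, '2014-05-01'), (2, NULL), (3, '2014-06-01');
INSERT INTO Payment VALUES (1, 'CR', 10), (1, 'CR', 15), (1, 'DB', 99),
                           (2, 'CR', 50), (3, 'DB', 5);
""")

# Original shape, with GROUP BY added so the SUM is per user and payment type.
cte = cur.execute("""
WITH A AS (SELECT user_ID FROM user WHERE user_Date IS NOT NULL),
B AS (SELECT P.user_ID, P.Payment_Type, SUM(P.Payment_Amount) AS Total_Payment
      FROM Payment P JOIN A ON A.user_ID = P.user_ID
      GROUP BY P.user_ID, P.Payment_Type)
SELECT user_ID, Total_Payment FROM B WHERE Payment_Type = 'CR';
""").fetchall()

# Flattened single-join version.
flat = cur.execute("""
SELECT P.user_ID, SUM(P.Payment_Amount) AS Total_Payment
FROM Payment P JOIN user A ON A.user_ID = P.user_ID
WHERE A.user_Date IS NOT NULL AND P.Payment_Type = 'CR'
GROUP BY P.user_ID;
""").fetchall()

print(cte, flat)  # both [(1, 25.0)]
```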

Try the following query:
select p.user_ID,
SUM(Payment_Amount) as Total_Payment_Amount
from Payment p join user u
on p.user_ID = u.user_ID
where Payment_Type = 'CR'
and u.user_Date IS NOT NULL
group by p.user_ID;
SQL Server 2014

Related

SQL: How to Order By And Limit Via a Join

As an example, let's say I have the following query:
select
u.id,
u.name,
(select s.status from user_status s where s.user_id = u.id order by s.created_at desc limit 1) as status
from
user u
where
u.active = true;
The above query works great. It returns the most recent user status for the selected user. However, I want to know how to get the same result using a join on the user_status table, instead of using a sub-query. Is something like this possible?
I'm using PostgreSQL.
Thank you for any help you can give!
select u.id, b.status
from user u join user_status b on u.id = b.user_id
where u.active = true
order by b.created_at desc limit 1;
I think this works, though I'm not sure what the "b" in your code refers to.
JOIN syntax does not directly offer order or limit as options; so strictly speaking you cannot achieve what you want directly as a join. I believe the easiest way to resolve your question is to use a joined subquery, like this:
select
u.id
, u.name
, s.status
from user u
left join (
select
user_id
, status
, row_number() over(partition by user_id
order by created_at desc) as rn
from user_status s
) s on u.id = s.user_id and s.rn = 1
where u.active = true;
Here the analytic function row_number() combined with the over() clause enables the subsequent join condition and s.rn=1 to take advantage of both ordering and limiting the joined rows via the calculation of the rn value.
N.B. a correlated subquery within the select clause (as used in the question's query) acts like a left join because it can return NULL. If that effect isn't needed or desired, you can change to an inner join.
It is possible to move that subquery into a CTE, but unless there are compelling reasons to do so I prefer using the more traditional form seen above.
An alternative approach (for Postgres 9.3 or later) is to use a lateral join, which is quite similar to the original subquery but, as it becomes part of the from clause, is likely to be more efficient than using that subquery in the select clause.
select
u.id
, u.name
, s.status
from user u
left join lateral (
select user_status.status
from user_status
where user_status.user_id = u.id
order by user_status.created_at desc
limit 1
) s ON true
where u.active = true;
I ended up doing the following, which is working great for me:
select
u.id,
u.name,
us.status
from
user u
left join (
select
distinct on (user_id)
*
from
user_status
order by
user_id,
created_at desc
) as us on u.id = us.user_id
where
u.active = true;
This is also more efficient than the query that I had in my question.
You can also achieve it by converting the subquery into a with clause and using it in the join.
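For example, with the ranking subquery moved into a CTE, here is a sketch using Python's sqlite3 (which supports window functions); the table and column names are taken from the question, the sample data is invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE user (id INTEGER, name TEXT, active INTEGER);
CREATE TABLE user_status (user_id INTEGER, status TEXT, created_at TEXT);
INSERT INTO user VALUES (1, 'alice', 1), (2, 'bob', 1), (3, 'carol', 0);
INSERT INTO user_status VALUES
  (1, 'new', '2020-01-01'), (1, 'active', '2020-02-01'),
  (2, 'new', '2020-01-05');
""")

rows = cur.execute("""
WITH latest AS (
  SELECT user_id, status,
         row_number() OVER (PARTITION BY user_id
                            ORDER BY created_at DESC) AS rn
  FROM user_status
)
SELECT u.id, u.name, s.status
FROM user u
LEFT JOIN latest s ON u.id = s.user_id AND s.rn = 1
WHERE u.active = 1  -- integer flag instead of Postgres boolean
ORDER BY u.id;
""").fetchall()
print(rows)  # [(1, 'alice', 'active'), (2, 'bob', 'new')]
```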

Is this SQL query with an EXISTS the most performant way of returning my result?

I'm trying to find the following statistic: How many users have made at least one order
(yeah, sounds like homework .. but this is a simplistic example of my real query).
Here's the made up query
SELECT COUNT(UserId)
FROM USERS a
WHERE EXISTS(
SELECT OrderId
FROM ORDERS b
WHERE a.UserId = b.UserId
)
I feel like I'm getting the correct answers, but this seems like overkill and inefficient.
Is there a more efficient way I can get this result?
If this was linq I feel like I want to use the Any() keyword....
It sounds like you could just use COUNT DISTINCT:
SELECT COUNT(DISTINCT UserId)
FROM ORDERS
This will return the number of distinct values of UserId that appear in the ORDERS table.
In response to sgeddes's comment, to ensure that UserId also appears in Users, simply do a JOIN:
SELECT COUNT(DISTINCT b.UserId)
FROM ORDERS b
JOIN USERS a
ON a.UserId = b.UserId
Select count(distinct u.userid)
From USERS u
Inner join ORDERS o
On o.userid = u.userid
Your query should be fine, but there are a few other ways to calculate the count:
SELECT COUNT(*)
FROM USERS a
WHERE UserId IN (
SELECT UserId
FROM ORDERS b
)
or
SELECT COUNT(DISTINCT UserID)
FROM USERS a
INNER JOIN ORDERS b ON a.UserID = b.UserID
The only way to know which is faster is to try each method and measure the performance.
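Before measuring speed, it's worth checking the variants agree. Here is a sketch using Python's sqlite3 with invented data; note the order for UserId 99, which has no USERS row, and which is why the join and IN/EXISTS forms count matching users rather than order rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE USERS (UserId INTEGER);
CREATE TABLE ORDERS (OrderId INTEGER, UserId INTEGER);
INSERT INTO USERS VALUES (1), (2), (3);
INSERT INTO ORDERS VALUES (10, 1), (11, 1), (12, 3), (13, 99);
""")

q_exists = """SELECT COUNT(UserId) FROM USERS a
              WHERE EXISTS (SELECT 1 FROM ORDERS b WHERE a.UserId = b.UserId)"""
q_in     = """SELECT COUNT(*) FROM USERS a
              WHERE UserId IN (SELECT UserId FROM ORDERS)"""
q_join   = """SELECT COUNT(DISTINCT a.UserId) FROM USERS a
              INNER JOIN ORDERS b ON a.UserId = b.UserId"""

counts = [cur.execute(q).fetchone()[0] for q in (q_exists, q_in, q_join)]
print(counts)  # [2, 2, 2] -- users 1 and 3 have orders
```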

Using SQL Aggregate Functions With Multiple Joins

I am attempting to use multiple aggregate functions across multiple tables in a single SQL query (using Postgres).
My table is structured similar to the following:
CREATE TABLE user (user_id INT PRIMARY KEY, user_date_created TIMESTAMP NOT NULL);
CREATE TABLE item_sold (item_sold_id INT PRIMARY KEY, sold_user_id INT NOT NULL);
CREATE TABLE item_bought (item_bought_id INT PRIMARY KEY, bought_user_id INT NOT NULL);
I want to count the number of items bought and sold for each user. The solution I thought up does not work:
SELECT user_id, COUNT(item_sold_id), COUNT(item_bought_id)
FROM user
LEFT JOIN item_sold ON sold_user_id=user_id
LEFT JOIN item_bought ON bought_user_id=user_id
WHERE user_date_created > '2014-01-01'
GROUP BY user_id;
That seems to produce all combinations of (item_sold_id, item_bought_id): e.g. if there are 4 sold and 2 bought, both COUNT()s return 8.
How can I properly query the table to obtain both counts?
The easy fix to your query is to use distinct:
SELECT user_id, COUNT(distinct item_sold_id), COUNT(distinct item_bought_id)
FROM user
LEFT JOIN item_sold ON sold_user_id=user_id
LEFT JOIN item_bought ON bought_user_id=user_id
WHERE user_date_created > '2014-01-01'
GROUP BY user_id;
However, the query is doing unnecessary work. If someone has 100 items bought and 200 items sold, then the join produces 20,000 intermediate rows. That is a lot.
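The blow-up is easy to reproduce. Here is a sketch using Python's sqlite3 with one user who has 4 sold and 2 bought items, as in the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_date_created TEXT NOT NULL);
CREATE TABLE item_sold (item_sold_id INTEGER PRIMARY KEY, sold_user_id INTEGER NOT NULL);
CREATE TABLE item_bought (item_bought_id INTEGER PRIMARY KEY, bought_user_id INTEGER NOT NULL);
INSERT INTO user VALUES (1, '2014-05-01');
INSERT INTO item_sold VALUES (1, 1), (2, 1), (3, 1), (4, 1);  -- 4 sold
INSERT INTO item_bought VALUES (1, 1), (2, 1);                -- 2 bought
""")

base = """
FROM user
LEFT JOIN item_sold ON sold_user_id = user_id
LEFT JOIN item_bought ON bought_user_id = user_id
GROUP BY user_id"""

naive = cur.execute(
    "SELECT user_id, COUNT(item_sold_id), COUNT(item_bought_id)" + base).fetchone()
fixed = cur.execute(
    "SELECT user_id, COUNT(DISTINCT item_sold_id), COUNT(DISTINCT item_bought_id)" + base).fetchone()
print(naive, fixed)  # (1, 8, 8) (1, 4, 2) -- 4 x 2 = 8 joined rows
```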
The solution is to pre-aggregate the results or use a correlated subquery in the select. In this case, I prefer the correlated subquery solution (assuming the right indexes are available):
SELECT u.user_id,
(select count(*) from item_sold s where u.user_id = s.sold_user_id),
(select count(*) from item_bought b where u.user_id = b.bought_user_id)
FROM user u
WHERE u.user_date_created > '2014-01-01';
The right indexes are item_sold(sold_user_id) and item_bought(bought_user_id). I prefer this over pre-aggregation because of the filtering on the user table. This only does the calculations for users created this year -- that is harder to do with pre-aggregation.
With a lateral join it is possible to pre-aggregate only the filtered users:
select user_id, total_item_sold, total_item_bought
from
"user" u
left join lateral (
select sold_user_id, count(*) as total_item_sold
from item_sold
where sold_user_id = u.user_id
group by sold_user_id
) item_sold on user_id = sold_user_id
left join lateral (
select bought_user_id, count(*) as total_item_bought
from item_bought
where bought_user_id = u.user_id
group by bought_user_id
) item_bought on user_id = bought_user_id
where u.user_date_created >= '2014-01-01'
Notice that you need >= in the filter otherwise it is possible to miss the exact first moment of the year. Although that timestamp is unlikely with naturally entered data, it is common with an automated job.
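The boundary case is easy to see with text timestamps. Here is a sketch using Python's sqlite3, with a user created at exactly midnight on 2014-01-01:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE user (user_id INTEGER, user_date_created TEXT);
INSERT INTO user VALUES (1, '2014-01-01 00:00:00'), (2, '2014-03-15 12:00:00');
""")

gt = cur.execute(
    "SELECT COUNT(*) FROM user WHERE user_date_created > '2014-01-01 00:00:00'"
).fetchone()[0]
ge = cur.execute(
    "SELECT COUNT(*) FROM user WHERE user_date_created >= '2014-01-01 00:00:00'"
).fetchone()[0]
print(gt, ge)  # 1 2 -- the strict > drops the user created exactly at midnight
```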
Another way to solve this problem is to use two nested selects.
select user_id,
(select count(*) from item_sold where sold_user_id = user_id),
(select count(*) from item_bought where bought_user_id = user_id)
from user
where user_date_created > '2014-01-01'

Getting Counts Per User Across Tables in POSTGRESQL

I'm new to postgresql. I have a database that has three tables in it: Users, Orders, Comments. The Orders and Comments tables look like this
Orders Comments
------ --------
ID ID
UserID UserID
Description Details
CreatedOn CreatedOn
I'm trying to get a list of all of my users and how many orders each user has made and how many comments each user has made. In other words, the result of the query should look like this:
UserID Orders Comments
------ ------ --------
1 5 7
2 2 9
3 0 0
...
Currently, I'm trying the following:
SELECT
UserID,
(SELECT COUNT(ID) FROM Orders WHERE UserID=ID) AS Orders,
(SELECT COUNT(ID) FROM Comments WHERE UserID=ID) AS Comments
FROM
Orders o,
Comments c
WHERE
o.UserID = c.UserID
Is this the right way to do this type of query? Or can someone provide a better approach from a performance standpoint?
select
id, name,
coalesce(orders, 0) as orders,
coalesce(comments, 0) as comments
from
users u
left join
(
select userid as id, count(*) as orders
from orders
group by userid
) o using (id)
left join
(
select userid as id, count(*) as comments
from comments
group by userid
) c using (id)
order by name
The usual way to do this is by using outer joins to the two other tables and then grouping by the id (and name):
select u.id,
u.name,
count(distinct o.id) as num_orders,
count(distinct c.id) as num_comments
from users u
left join orders o on o.userId = u.id
left join comments c on c.userId = u.id
group by u.id, u.name
order by u.name;
That might very well be faster than your approach. But Postgres' query optimizer is quite smart and I have seen situations where both solutions are essentially equal in performance.
You will need to test that on your data and also have a look at the execution plans in order to find out which one is more efficient.

Help with Complicated SELECT query

I have this SELECT query:
SELECT Auctions.ID, Users.Balance, Users.FreeBids,
COUNT(CASE WHEN Bids.Burned=0 AND Auctions.Closed=0 THEN 1 END) AS 'ActiveBids',
COUNT(CASE WHEN Bids.Burned=1 AND Auctions.Closed=0 THEN 1 END) AS 'BurnedBids'
FROM (Users INNER JOIN Bids ON Users.ID=Bids.BidderID)
INNER JOIN Auctions
ON Bids.AuctionID=Auctions.ID
WHERE Users.ID=#UserID
GROUP BY Users.Balance, Users.FreeBids, Auctions.ID
My problem is that it returns no rows if the UserID can't be found in the Bids table.
I know it's something that has to do with my
(Users INNER JOIN Bids ON Users.ID=Bids.BidderID)
But I don't know how to make it return a row even if the user is not in the Bids table.
You're doing an INNER JOIN, which only returns rows if there are results on both sides of the join. To get what you want, change the join like this:
Users LEFT JOIN Bids ON Users.ID=Bids.BidderID
You may also have to change your SELECT statement to handle Bids.Burned being NULL.
If you want to return rows even if there's no matching Auction, then you'll have to make some deeper changes to your query.
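To see the difference, here is a sketch using Python's sqlite3 with a made-up user who has no bids:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE Users (ID INTEGER, Balance REAL, FreeBids INTEGER);
CREATE TABLE Bids (BidderID INTEGER, AuctionID INTEGER, Burned INTEGER);
CREATE TABLE Auctions (ID INTEGER, Closed INTEGER);
INSERT INTO Users VALUES (7, 100.0, 3);  -- user 7 has never bid
""")

inner = cur.execute("""
SELECT Users.ID FROM Users
INNER JOIN Bids ON Users.ID = Bids.BidderID
INNER JOIN Auctions ON Bids.AuctionID = Auctions.ID
WHERE Users.ID = 7;""").fetchall()

left = cur.execute("""
SELECT Users.ID FROM Users
LEFT JOIN Bids ON Users.ID = Bids.BidderID
LEFT JOIN Auctions ON Bids.AuctionID = Auctions.ID
WHERE Users.ID = 7;""").fetchall()

print(inner, left)  # [] [(7,)] -- only the LEFT JOIN keeps the user
```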
My problem is that it returns no rows if the UserID can't be found in the Bids table.
Then INNER JOIN Bids/Auctions should probably be left outer joins. The way you've written it, you're filtering users so that only those in bids and auctions appear.
Left join is the simple answer, but if you're worried about performance I'd consider re-writing it a little bit. For one thing, the order of the columns in the group by matters to performance (although it often doesn't change the results). Generally, you want to group by a column that's indexed first.
Also, it's possible to re-write this query to only have one group by, which will probably speed things up.
Try this out:
with UserBids as (
select
a.ID
, b.BidderID
, ActiveBids = count(case when b.Burned = 0 then 1 end)
, BurnedBids = count(case when b.Burned = 1 then 1 end)
from Bids b
join Auctions a
on a.ID = b.AuctionID
where a.Closed = 0
group by b.BidderID, a.ID
)
select
b.ID
, u.Balance
, u.FreeBids
, b.ActiveBids
, b.BurnedBids
from Users u
left join UserBids b
on b.BidderID = u.ID
where u.ID = #UserID;
If you're not familiar with the with UserBids as ... syntax, it's called a CTE (common table expression): basically a way to make a one-time-use view, and a nice way to structure your queries.