Writing a subquery instead of using a spreadsheet - sql

I'm a pretty basic SQL user. I usually do some simple joins and then pull the data into Sheets to pivot or filter it until I get what I want, but I know I could do it faster entirely in SQL.
For this query, I only want to return rows where the c2.id count is greater than 0. I tried writing a subquery in the WHERE clause, but it feels like I need to group by task_id for this to be right... Can someone help me understand what I should do and why?
select t.inserted_at::date, count (distinct c2.id), t.id, t.conversation_id
from tasks t
left join users u on u.id = t.creator_id
left join "comments" c2 on t.id = c2.task_id
left join conversations c on c.id = t.conversation_id
where u.include_in_metrics = true
and c.type = 'PROJECT_FEED'
group by 1,3,4
order by t.inserted_at::date desc;

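Just add this after the GROUP BY and before the ORDER BY:
having count(distinct c2.id) > 0
WHERE filters individual rows before they are grouped, so it can't see the result of count(). HAVING filters the groups after the aggregates have been computed, which is exactly the "count greater than 0" condition you want.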
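A minimal sketch of the WHERE-versus-HAVING difference, using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for the asker's PostgreSQL one. The tasks and comments tables are cut down to the columns that matter and filled with made-up rows; the users and conversations joins are left out because they don't affect the point.

```python
import sqlite3

# In-memory SQLite database standing in for the real PostgreSQL one.
# Invented data: task 1 has two comments, task 2 has one, task 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tasks (id INTEGER PRIMARY KEY);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, task_id INTEGER);
    INSERT INTO tasks VALUES (1), (2), (3);
    INSERT INTO comments VALUES (10, 1), (11, 1), (12, 2);
""")

BASE = """
    SELECT t.id, COUNT(DISTINCT c2.id) AS comment_count
    FROM tasks t
    LEFT JOIN comments c2 ON t.id = c2.task_id
    GROUP BY t.id
"""

# Without HAVING, the LEFT JOIN keeps task 3 with a count of 0.
all_rows = conn.execute(BASE + " ORDER BY t.id").fetchall()

# HAVING runs after grouping, so it can filter on the aggregate.
with_comments = conn.execute(
    BASE + " HAVING COUNT(DISTINCT c2.id) > 0 ORDER BY t.id"
).fetchall()

# WHERE runs before grouping, so an aggregate there is rejected.
where_error = None
try:
    conn.execute(
        "SELECT t.id FROM tasks t LEFT JOIN comments c2 ON t.id = c2.task_id "
        "WHERE COUNT(DISTINCT c2.id) > 0 GROUP BY t.id"
    )
except sqlite3.Error as exc:
    where_error = str(exc)

print(all_rows)       # [(1, 2), (2, 1), (3, 0)]
print(with_comments)  # [(1, 2), (2, 1)]
print(where_error)
```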

Related

Sum fields of an Inner join

How can I add up two fields that come from an inner join?
I have this code:
select
SUM(ACT.NumberOfPlants ) AS NumberOfPlants,
SUM(ACT.NumOfJornales) AS NumberOfJornals
FROM dbo.AGRMastPlanPerformance MPR (NOLOCK)
INNER JOIN GENRegion GR ON (GR.intGENRegionKey = MPR.intGENRegionLink )
INNER JOIN AGRDetPlanPerformance DPR (NOLOCK) ON
(DPR.intAGRMastPlanPerformanceLink =
MPR.intAGRMastPlanPerformanceKey)
INNER JOIN vwGENPredios P (NOLOCK) ON ( DPR.intGENPredioLink =
P.intGENPredioKey )
INNER JOIN AGRSubActivity SA (NOLOCK) ON (SA.intAGRSubActivityKey =
DPR.intAGRSubActivityLink)
LEFT JOIN (SELECT RA.intGENPredioLink, AR.intAGRActividadLink,
AR.intAGRSubActividadLink, SUM(AR.decNoPlantas) AS
intPlantasTrabajads, SUM(AR.decNoPersonas) AS NumOfJornales,
SUM(AR.decNoPlants) AS NumberOfPlants
FROM AGRRecordActivity RA WITH (NOLOCK)
INNER JOIN AGRActividadRealizada AR WITH (NOLOCK) ON
(AR.intAGRRegistroActividadLink = RA.intAGRRegistroActividadKey AND
AR.bitActivo = 1)
INNER JOIN AGRSubActividad SA (NOLOCK) ON (SA.intAGRSubActividadKey
= AR.intAGRSubActividadLink AND SA.bitEnabled = 1)
WHERE RA.bitActive = 1 AND
AR.bitActive = 1 AND
RA.intAGRTractorsCrewsLink IN(2)
GROUP BY RA.intGENPredioLink,
AR.decNoPersons,
AR.decNoPlants,
AR.intAGRAActivityLink,
AR.intAGRSubActividadLink) ACT ON (ACT.intGENPredioLink IN(
DPR.intGENPredioLink) AND
ACT.intAGRAActivityLink IN( DPR.intAGRAActivityLink) AND
ACT.intAGRSubActivityLink IN( DPR.intAGRSubActivityLink))
WHERE
MPR.intAGRMastPlanPerformanceKey IN(4) AND
DPR.intAGRSubActivityLink IN( 1153)
GROUP BY
P.vchRegion,
ACT.NumberOfPlants,
ACT.NumOfJornales
ORDER BY ACT.NumberOfPlants DESC
However, it does not perform the complete sum. It returns the values of the columns and adds them up group by group, instead of summing the whole column.
For example, the query returns these results:
What I expect is the grand totals: for NumberOfPlants the sum would be 163,237, and for NumberOfJornals it would be 61.
How can I do this?
First of all, the (NOLOCK) hints are probably not accomplishing the benefit you hope for. NOLOCK is not an automatic "go faster" option, and if such an option existed you can be sure it would already be enabled. It can help in some situations, but the way it works allows the query to read uncommitted ("dirty") data, and the situations where it's most likely to improve anything are the same situations where the risk of reading bad data is highest.
With that out of the way: given how much code is in the question, you're better served by a general explanation and solution that you can adapt.
The issue here is GROUP BY. When you use GROUP BY in SQL, you're telling the database you want separate results per group for any aggregate functions like SUM() (and COUNT(), AVG(), MAX(), etc.).
So if you have this:
SELECT Sum(ColumnB) As SumB
FROM [Table]
GROUP BY ColumnA
You will get a separate row per ColumnA group, even though it's not in the SELECT list.
If you don't really care about that, you can do one of two things:
1. Remove the GROUP BY. If there are no grouped columns in the SELECT list, the GROUP BY clause is probably not accomplishing anything important.
2. Nest the query. If option 1 is somehow not possible (say, the original is actually a view), you could do this:
SELECT SUM(SumB)
FROM (
SELECT Sum(ColumnB) As SumB
FROM [Table]
GROUP BY ColumnA
) t
Note that in both cases the JOINs are irrelevant to the issue.
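The two options can be checked side by side with a small sketch, using Python's sqlite3 module and an in-memory database. The readings table and its rows are hypothetical stand-ins for the answer's [Table], ColumnA and ColumnB.

```python
import sqlite3

# Hypothetical table mirroring the answer's [Table], ColumnA and ColumnB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (ColumnA TEXT, ColumnB INTEGER);
    INSERT INTO readings VALUES ('x', 1), ('x', 2), ('y', 10), ('z', 100);
""")

# GROUP BY ColumnA: one SUM per group, even though ColumnA is not selected.
grouped = conn.execute(
    "SELECT SUM(ColumnB) AS SumB FROM readings GROUP BY ColumnA ORDER BY ColumnA"
).fetchall()

# Option 1: drop the GROUP BY and the whole table becomes a single group.
ungrouped = conn.execute("SELECT SUM(ColumnB) AS SumB FROM readings").fetchall()

# Option 2: keep the grouped query as-is and sum its results from outside.
nested = conn.execute("""
    SELECT SUM(SumB) FROM (
        SELECT SUM(ColumnB) AS SumB FROM readings GROUP BY ColumnA
    ) t
""").fetchall()

print(grouped)    # [(3,), (10,), (100,)]
print(ungrouped)  # [(113,)]
print(nested)     # [(113,)]
```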

SQL: How to Order By And Limit Via a Join

As an example, let's say I have the following query:
select
u.id,
u.name,
(select s.status from user_status s where s.user_id = u.id order by s.created_at desc limit 1) as status
from
user u
where
u.active = true;
The above query works great: it returns the most recent status for each selected user. However, I want to know how to get the same result using a join on the user_status table instead of a subquery. Is something like this possible?
I'm using PostgreSQL.
Thank you for any help you can give!
select u.id , b.status
from user u join user_status b on u.id= b.user_id
where u.active = true
order by b.created_at desc limit 1;
I think this works.
But in your code there is "b", and I didn't know what it is.
JOIN syntax does not directly offer order or limit as options; so strictly speaking you cannot achieve what you want directly as a join. I believe the easiest way to resolve your question is to use a joined subquery, like this:
select
u.id
, u.name
, s.status
from user u
left join (
select
user_id
, status
, row_number() over(partition by user_id
order by created_at desc) as rn
from user_status s
) s on u.id = s.user_id and s.rn = 1
where u.active = true;
Here the window (analytic) function row_number(), with its over() clause, numbers each user's status rows from newest to oldest; the join condition s.rn = 1 then keeps only the newest row per user, which gives you both the ordering and the limit.
NB: a correlated subquery in the select clause (as used in the question's query) acts like a left join, because it returns NULL when there is no matching row. If that effect isn't needed or desired, you can change to an inner join.
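To see that the joined row_number() subquery returns the same rows as the question's correlated subquery, including the NULL for a user with no status, here is a sketch using Python's sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions) as a stand-in for PostgreSQL; the table is called users rather than user, and all rows are invented. The lateral and DISTINCT ON variants are PostgreSQL-specific and not covered.

```python
import sqlite3

# SQLite standing in for PostgreSQL; "users" and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
    CREATE TABLE user_status (user_id INTEGER, status TEXT, created_at TEXT);
    INSERT INTO users VALUES (1, 'ann', 1), (2, 'bob', 1), (3, 'cy', 1);
    INSERT INTO user_status VALUES
        (1, 'old',  '2024-01-01'),
        (1, 'new',  '2024-02-01'),
        (2, 'only', '2024-01-15');
""")  # user 3 has no status rows at all

# The question's correlated subquery in the SELECT list.
correlated = conn.execute("""
    SELECT u.id, u.name,
           (SELECT s.status FROM user_status s
            WHERE s.user_id = u.id
            ORDER BY s.created_at DESC LIMIT 1) AS status
    FROM users u
    WHERE u.active = 1
    ORDER BY u.id
""").fetchall()

# The answer's approach: number each user's rows newest-first, join on rn = 1.
windowed = conn.execute("""
    SELECT u.id, u.name, s.status
    FROM users u
    LEFT JOIN (
        SELECT user_id, status,
               ROW_NUMBER() OVER (PARTITION BY user_id
                                  ORDER BY created_at DESC) AS rn
        FROM user_status
    ) s ON u.id = s.user_id AND s.rn = 1
    WHERE u.active = 1
    ORDER BY u.id
""").fetchall()

print(correlated)              # [(1, 'ann', 'new'), (2, 'bob', 'only'), (3, 'cy', None)]
print(windowed == correlated)  # True
```

The LEFT JOIN is what preserves user 3; switching it to an inner join would drop that row.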
It is possible to move that subquery into a CTE, but unless there are compelling reasons to do so I prefer using the more traditional form seen above.
An alternative approach (for Postgres 9.3 or later) is a lateral join, which is quite similar to the original subquery but, because it becomes part of the from clause, is likely to be more efficient than using that subquery in the select clause.
select
u.id
, u.name
, s.status
from user u
left join lateral (
select user_status.status
from user_status
where user_status.user_id = u.id
order by user_status.created_at desc
limit 1
) s ON true
where u.active = true;
I ended up doing the following, which is working great for me:
select
u.id,
u.name,
us.status
from
user u
left join (
select
distinct on (user_id)
*
from
user_status
order by
user_id,
created_at desc
) as us on u.id = us.user_id
where
u.active = true;
This is also more efficient than the query that I had in my question.
You can achieve it by converting the subquery into a WITH clause (a CTE) and using it in the join.

Aggregate query with subquery (SUM)

I have the following query:
SELECT UserId, (
0.099 *
(
CASE WHEN
(SELECT AcceleratedProfitPercentage FROM CustomGroups cg
INNER JOIN UserCustomGroups ucg ON ucg.CustomGroupId = cg.Id
WHERE Packs.UserCustomGroupId = ucg.Id)
IS NOT NULL THEN
((SELECT AcceleratedProfitPercentage FROM CustomGroups cg
INNER JOIN UserCustomGroups ucg ON ucg.CustomGroupId = cg.Id
WHERE Packs.UserCustomGroupId = ucg.Id)*1.0) / (100*1.0)
ELSE 1
END
)
)
As amount
FROM Packs WHERE Id IN (
SELECT ap.Id FROM Packs ap JOIN Users u ON ap.UserId = u.UserId
WHERE ap.MoneyToReturn > ap.MoneyReturned AND
u.Mass LIKE '1%');
This produces the correct output, but I have no idea how to aggregate it properly. I tried a standard GROUP BY, but I get the error "Column 'Packs.UserCustomGroupId' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause." Any ideas? Here is the output I currently get:
I want to aggregate it by UserId. Thanks in advance.
The option that involves the least query-rewriting is to drop your existing query into a CTE or temp table, like so:
; with CTE as (MyQueryHere)
Select UserID, sum(amount)
from CTE
Group by UserID
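A sketch of the CTE-wrapping idea, assuming a heavily simplified Packs table (the Rate column is hypothetical and stands in for the CASE expression) and SQLite via Python's sqlite3 module instead of SQL Server. The point is that the inner query is pasted in unchanged and only the outer query groups.

```python
import sqlite3

# Hypothetical, simplified Packs table; Rate stands in for the CASE expression.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Packs (Id INTEGER PRIMARY KEY, UserId INTEGER, Rate REAL);
    INSERT INTO Packs VALUES (1, 7, 1.0), (2, 7, 0.5), (3, 8, 1.0);
""")

# The existing per-row query, left exactly as it is.
per_pack_query = "SELECT UserId, 0.099 * Rate AS amount FROM Packs"

# Wrap it in a CTE and do the aggregation outside it.
totals = conn.execute(f"""
    WITH CTE AS ({per_pack_query})
    SELECT UserId, SUM(amount) AS total
    FROM CTE
    GROUP BY UserId
    ORDER BY UserId
""").fetchall()

print(totals)  # one row per UserId: about 0.1485 for user 7, 0.099 for user 8
```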
Wow, that is one crazy query you've got going on there.
Try this:
SELECT P.UserId,
0.099 * SUM(t.Amount) AS [Amount SUM]
FROM Packs P
JOIN Users U
ON P.UserID = U.UserID
LEFT OUTER JOIN UserCustomGroups UCG
ON P.UserCustomGroupID = UCG.ID
LEFT OUTER JOIN CustomGroups CG
ON UCG.CustomGroupID = CG.ID
CROSS APPLY
(
SELECT CASE WHEN CG.ID IS NULL
THEN 1
ELSE CG.AcceleratedProfitPercentage / 100.0
END AS [Amount]
) t
WHERE P.MoneyToReturn > P.MoneyReturned
AND U.Mass LIKE '1%'
GROUP BY P.UserId
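First, multiplying a number by 1 is pointless, but the *1.0 in your original post is not quite that: if AcceleratedProfitPercentage is an integer column, it forces decimal division instead of integer division. Dividing by 100.0 gets the same result more simply.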
Also, using CROSS APPLY eliminates the need to repeat your subquery. Granted, it is evaluated for every row returned, but I think it makes sense in this case. Using left outer joins instead of the repeated CASE / SELECT / IS NULL subqueries should also make your query more efficient and much more readable.
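The LEFT OUTER JOIN rewrite can be sketched in SQLite through Python's sqlite3 module. SQLite has no CROSS APPLY, so this version puts the CASE expression directly inside SUM(); the schema follows the question's names, but the rows are invented. It also divides by 100.0 rather than 100, because in both SQLite and SQL Server an integer divided by an integer is truncated.

```python
import sqlite3

# Schema follows the question's table names; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Mass TEXT);
    CREATE TABLE CustomGroups (Id INTEGER PRIMARY KEY,
                               AcceleratedProfitPercentage INTEGER);
    CREATE TABLE UserCustomGroups (Id INTEGER PRIMARY KEY, CustomGroupId INTEGER);
    CREATE TABLE Packs (Id INTEGER PRIMARY KEY, UserId INTEGER,
                        UserCustomGroupId INTEGER,
                        MoneyToReturn INTEGER, MoneyReturned INTEGER);
    INSERT INTO Users VALUES (1, '1a'), (2, '1b'), (3, '2c');
    INSERT INTO CustomGroups VALUES (1, 50);
    INSERT INTO UserCustomGroups VALUES (1, 1);
    INSERT INTO Packs VALUES
        (1, 1, 1,    100, 0),    -- user 1, custom group at 50%
        (2, 1, NULL, 100, 0),    -- user 1, no custom group: factor 1
        (3, 2, NULL, 100, 0),    -- user 2, no custom group: factor 1
        (4, 2, NULL, 100, 100),  -- fully returned, removed by WHERE
        (5, 3, NULL, 100, 0);    -- user 3 fails Mass LIKE '1%'
""")

# Integer / integer truncates; a decimal literal avoids that.
division = conn.execute("SELECT 50 / 100, 50 / 100.0").fetchone()

rows = conn.execute("""
    SELECT P.UserId,
           0.099 * SUM(CASE WHEN CG.Id IS NULL THEN 1
                            ELSE CG.AcceleratedProfitPercentage / 100.0
                       END) AS amount_sum
    FROM Packs P
    JOIN Users U ON P.UserId = U.UserId
    LEFT OUTER JOIN UserCustomGroups UCG ON P.UserCustomGroupId = UCG.Id
    LEFT OUTER JOIN CustomGroups CG ON UCG.CustomGroupId = CG.Id
    WHERE P.MoneyToReturn > P.MoneyReturned
      AND U.Mass LIKE '1%'
    GROUP BY P.UserId
    ORDER BY P.UserId
""").fetchall()

print(division)  # (0, 0.5)
print(rows)      # user 1: 0.099 * (0.5 + 1); user 2: 0.099 * 1
```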
Next, it appears that you are attempting to SUM percentages. I'm not sure what kind of data you're looking to return, but perhaps AVG would be more appropriate? I can't think of a practical reason to sum percentages.
Lastly, APH's answer will most certainly work (assuming your original query works), but given the obfuscation and inefficiency of your query, I would definitely rewrite it.
Please let me know if you have any questions.

Unable to Group on MSAccess SQL multiple search query

Please can you help me before I go out of my mind? I've spent a while on this now and have resorted to asking you helpful, wonderful people. I have this search query:
SELECT Groups.GroupID,
Groups.GroupName,
( SELECT Sum(SiteRates.SiteMonthlySalesValue)
FROM SiteRates
WHERE InvoiceSites.SiteID = SiteRates.SiteID
) AS SumOfSiteRates,
( SELECT Count(InvoiceSites.SiteID)
FROM InvoiceSites
WHERE SiteRates.SiteID = InvoiceSites.SiteID
) AS CountOfSites
FROM (InvoiceSites
INNER JOIN (Groups
INNER JOIN SitesAndGroups
ON Groups.GroupID = SitesAndGroups.GroupID
) ON InvoiceSites.SiteID = SitesAndGroups.SiteID)
INNER JOIN SiteRates
ON InvoiceSites.SiteID = SiteRates.SiteID
GROUP BY Groups.GroupID
with the following table relationships:
http://m-ls.co.uk/ExtFiles/SQL-Relationship.jpg
Without the GROUP BY I get the list of entries I want, but it breaks the results down by SiteID, whereas I want them grouped by GroupID. I know this is possible, but I lack the expertise to do it.
Any help would be massively appreciated.
I think all you need to do is add Groups.GroupName to the GROUP BY clause. However, I would opt for a slightly different approach and try to avoid the subqueries if possible. Since you have already joined all the required tables, you can just use normal aggregate functions:
SELECT Groups.GroupID,
Groups.GroupName,
SUM(SiteRates.SiteMonthlySalesValue) AS SumOfSiteRates,
COUNT(InvoiceSites.SiteID) AS CountOfSites
FROM (InvoiceSites
INNER JOIN (Groups
INNER JOIN SitesAndGroups
ON Groups.GroupID = SitesAndGroups.GroupID
) ON InvoiceSites.SiteID = SitesAndGroups.SiteID)
INNER JOIN SiteRates
ON InvoiceSites.SiteID = SiteRates.SiteID
GROUP BY Groups.GroupID, Groups.GroupName;
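Here is a sketch of the joined-aggregate version in SQLite via Python's sqlite3 module, with made-up rows. Access needs the nested parentheses around its joins, but SQLite doesn't, so the joins are written flat; Groups is double-quoted as a precaution, since GROUPS is also an SQLite keyword.

```python
import sqlite3

# Table names follow the question; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Groups" (GroupID INTEGER PRIMARY KEY, GroupName TEXT);
    CREATE TABLE SitesAndGroups (SiteID INTEGER, GroupID INTEGER);
    CREATE TABLE InvoiceSites (SiteID INTEGER PRIMARY KEY);
    CREATE TABLE SiteRates (SiteID INTEGER, SiteMonthlySalesValue REAL);
    INSERT INTO "Groups" VALUES (1, 'North'), (2, 'South');
    INSERT INTO SitesAndGroups VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO InvoiceSites VALUES (10), (11), (20);
    INSERT INTO SiteRates VALUES (10, 100.0), (11, 250.0), (20, 75.0);
""")

# Plain aggregates over the joined rows, grouped by every non-aggregated column.
per_group = conn.execute("""
    SELECT G.GroupID, G.GroupName,
           SUM(SR.SiteMonthlySalesValue) AS SumOfSiteRates,
           COUNT(INV.SiteID) AS CountOfSites
    FROM InvoiceSites INV
    INNER JOIN SitesAndGroups SG ON INV.SiteID = SG.SiteID
    INNER JOIN "Groups" G ON G.GroupID = SG.GroupID
    INNER JOIN SiteRates SR ON INV.SiteID = SR.SiteID
    GROUP BY G.GroupID, G.GroupName
    ORDER BY G.GroupID
""").fetchall()

print(per_group)  # [(1, 'North', 350.0, 2), (2, 'South', 75.0, 1)]
```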
I think what you are looking for is something like the following:
SELECT Groups.GroupID, Groups.GroupName, SumResults.SiteID, SumResults.SumOfSiteRates, SumResults.CountOfSites
FROM Groups INNER JOIN
(
SELECT SitesAndGroups.GroupID, SitesAndGroups.SiteID, Sum(SiteRates.SiteMonthlySalesValue) AS SumOfSiteRates, Count(InvoiceSites.SiteID) AS CountOfSites
FROM SitesAndGroups INNER JOIN (InvoiceSites INNER JOIN SiteRates ON InvoiceSites.SiteID = SiteRates.SiteID) ON SitesAndGroups.SiteID = InvoiceSites.SiteID
GROUP BY SitesAndGroups.GroupID, SitesAndGroups.SiteID
) AS SumResults ON Groups.GroupID = SumResults.GroupID
This query groups your information by SiteID within each group. The grouped query is used as a derived table in the FROM clause and joined to the Groups table to pull in the group information you want.

Help with Complicated SELECT query

I have this SELECT query:
SELECT Auctions.ID, Users.Balance, Users.FreeBids,
COUNT(CASE WHEN Bids.Burned=0 AND Auctions.Closed=0 THEN 1 END) AS 'ActiveBids',
COUNT(CASE WHEN Bids.Burned=1 AND Auctions.Closed=0 THEN 1 END) AS 'BurnedBids'
FROM (Users INNER JOIN Bids ON Users.ID=Bids.BidderID)
INNER JOIN Auctions
ON Bids.AuctionID=Auctions.ID
WHERE Users.ID=#UserID
GROUP BY Users.Balance, Users.FreeBids, Auctions.ID
My problem is that it returns no rows if the UserID can't be found in the Bids table.
I know it's something that has to do with my
(Users INNER JOIN Bids ON Users.ID=Bids.BidderID)
But I don't know how to make it return a row even if the user is not in the Bids table.
You're doing an INNER JOIN, which only returns rows when there are matches on both sides of the join. To get what you want, change the join in your FROM clause like this:
Users LEFT JOIN Bids ON Users.ID=Bids.BidderID
You may also have to change your SELECT statement to handle Bids.Burned being NULL.
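Also note that the following INNER JOIN to Auctions matches on Bids.AuctionID, which is NULL for a user with no bids, so that user would still be dropped. To return rows even when there's no matching bid or auction, that join needs to become a LEFT JOIN as well.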
My problem is that it returns no rows if the UserID can't be found in the Bids table.
Then the INNER JOINs to Bids and Auctions should probably be left outer joins. The way you've written it, you're filtering users so that only those with bids on auctions appear.
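A sketch comparing the join types, in SQLite through Python's sqlite3 module with invented rows: user 1 has bids, user 2 has none. The #UserID placeholder becomes a ? parameter, and the join keywords are spliced into the SQL only so the variants can be compared. It shows that a LEFT JOIN to Bids alone is not enough while the Auctions join stays INNER.

```python
import sqlite3

# Invented data: user 1 has three bids on open auction 100, user 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER PRIMARY KEY, Balance REAL, FreeBids INTEGER);
    CREATE TABLE Auctions (ID INTEGER PRIMARY KEY, Closed INTEGER);
    CREATE TABLE Bids (BidderID INTEGER, AuctionID INTEGER, Burned INTEGER);
    INSERT INTO Users VALUES (1, 10.0, 3), (2, 5.0, 0);
    INSERT INTO Auctions VALUES (100, 0);
    INSERT INTO Bids VALUES (1, 100, 0), (1, 100, 1), (1, 100, 0);
""")

def user_bids(bids_join, auctions_join, user_id):
    """Run the question's query with the given join types ('INNER' or 'LEFT')."""
    sql = f"""
        SELECT Auctions.ID, Users.Balance, Users.FreeBids,
               COUNT(CASE WHEN Bids.Burned = 0 AND Auctions.Closed = 0 THEN 1 END) AS ActiveBids,
               COUNT(CASE WHEN Bids.Burned = 1 AND Auctions.Closed = 0 THEN 1 END) AS BurnedBids
        FROM Users
        {bids_join} JOIN Bids ON Users.ID = Bids.BidderID
        {auctions_join} JOIN Auctions ON Bids.AuctionID = Auctions.ID
        WHERE Users.ID = ?
        GROUP BY Users.Balance, Users.FreeBids, Auctions.ID
    """
    return conn.execute(sql, (user_id,)).fetchall()

print(user_bids("INNER", "INNER", 2))  # [] (the original problem)
print(user_bids("LEFT", "INNER", 2))   # [] (still dropped by the Auctions join)
print(user_bids("LEFT", "LEFT", 2))    # [(None, 5.0, 0, 0, 0)]
print(user_bids("LEFT", "LEFT", 1))    # [(100, 10.0, 3, 2, 1)]
```

COUNT() over a CASE with no ELSE ignores the NULLs produced for the missing bid, so the user without bids gets counts of 0.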
A LEFT JOIN is the simple answer, but if you're worried about performance I'd consider rewriting the query a little. For one thing, the order of the columns in the GROUP BY can affect performance (it doesn't change which groups you get). Generally, you want to group by an indexed column first.
Also, it's possible to rewrite this query so that only Bids and Auctions are grouped, before joining to Users, which may speed things up.
Try this out:
with UserBids as (
select
a.ID
, b.BidderID
, ActiveBids = count(case when b.Burned = 0 then 1 end)
, BurnedBids = count(case when b.Burned = 1 then 1 end)
from Bids b
join Auctions a
on a.ID = b.AuctionID
where a.Closed = 0
group by b.BidderID, a.ID
)
select
b.ID
, u.Balance
, u.FreeBids
, b.ActiveBids
, b.BurnedBids
from Users u
left join UserBids b
on b.BidderID = u.ID
where u.ID = #UserID;
If you're not familiar with the with UserBids as (...) syntax, it's called a CTE (common table expression). It's basically a way to make a one-time-use view, and a nice way to structure your queries.
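The CTE approach can be tried out the same way. This is a sketch in SQLite via Python's sqlite3 module with invented rows, counting BurnedBids with Burned = 1 and grouping by a.ID (the Auctions key). The T-SQL alias = expression style is rewritten as expression AS alias, and #UserID becomes a ? parameter.

```python
import sqlite3

# Invented data: user 1 bids on open auction 100 and closed auction 101;
# user 2 has no bids.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER PRIMARY KEY, Balance REAL, FreeBids INTEGER);
    CREATE TABLE Auctions (ID INTEGER PRIMARY KEY, Closed INTEGER);
    CREATE TABLE Bids (BidderID INTEGER, AuctionID INTEGER, Burned INTEGER);
    INSERT INTO Users VALUES (1, 10.0, 3), (2, 5.0, 0);
    INSERT INTO Auctions VALUES (100, 0), (101, 1);
    INSERT INTO Bids VALUES (1, 100, 0), (1, 100, 1), (1, 100, 0), (1, 101, 0);
""")

USER_BIDS_SQL = """
    WITH UserBids AS (
        -- Aggregate bids per bidder and open auction, before touching Users.
        SELECT a.ID,
               b.BidderID,
               COUNT(CASE WHEN b.Burned = 0 THEN 1 END) AS ActiveBids,
               COUNT(CASE WHEN b.Burned = 1 THEN 1 END) AS BurnedBids
        FROM Bids b
        JOIN Auctions a ON a.ID = b.AuctionID
        WHERE a.Closed = 0
        GROUP BY b.BidderID, a.ID
    )
    -- LEFT JOIN keeps the user even when the CTE has no rows for them.
    SELECT b.ID, u.Balance, u.FreeBids, b.ActiveBids, b.BurnedBids
    FROM Users u
    LEFT JOIN UserBids b ON b.BidderID = u.ID
    WHERE u.ID = ?
"""

with_bids = conn.execute(USER_BIDS_SQL, (1,)).fetchall()
without_bids = conn.execute(USER_BIDS_SQL, (2,)).fetchall()

print(with_bids)     # [(100, 10.0, 3, 2, 1)]
print(without_bids)  # [(None, 5.0, 0, None, None)]
```

For a user with no bids, ActiveBids and BurnedBids come back as NULL rather than 0; wrapping them in COALESCE(b.ActiveBids, 0) in the outer SELECT would return zeros instead.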