order by count when you have to count only certain items - sql

I have this query:
SELECT t.*
FROM `tips` `t`
LEFT JOIN tip_usage
ON tip_usage.tip_id = t.id
GROUP BY t.id
ORDER BY COUNT(CASE
WHEN status = 'Active' THEN status
ELSE NULL
END) DESC
As you can see, I use a left join and count only the records that are Active.
Can I write this query differently and get rid of the CASE expression inside the COUNT?

If you want to return all tips, regardless of status, but sort by the number of Active records, then this is as pretty as you are going to get.
If you only want to return active tips, then you can add WHERE status = 'Active' and just order by COUNT(t.id) DESC.
One alternative is to add a NumActive int column to the tips table and keep it updated whenever a tip_usage record is added or modified for a given tip. This puts more overhead on the insert/delete/update operations for tip_usage, but makes this query much simpler:
select *
from tips
Order by tips.NumActive desc
Another alternative is:
Select tips.*
From tips
Order by (
Select count(tip_id)
From tip_usage as t
Where t.tip_id = tips.id and status = 'Active') DESC
Though this exchanges a CASE for a subquery, so it is just complex in a different way.
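To check that the CASE-in-COUNT ordering and the correlated-subquery ordering really agree, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the title column and the extra t.id tie-breaker are assumptions made for illustration; only the tips and tip_usage table names come from the question.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tips (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tip_usage (tip_id INTEGER, status TEXT);
    INSERT INTO tips VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO tip_usage VALUES
        (1, 'Active'), (1, 'Inactive'),
        (2, 'Active'), (2, 'Active'),
        (3, 'Inactive');
""")

# Variant 1: LEFT JOIN + GROUP BY, counting only Active rows via CASE.
case_order = [row[0] for row in conn.execute("""
    SELECT t.id
    FROM tips t
    LEFT JOIN tip_usage u ON u.tip_id = t.id
    GROUP BY t.id
    ORDER BY COUNT(CASE WHEN u.status = 'Active' THEN 1 END) DESC, t.id
""")]

# Variant 2: no join, the Active count comes from a correlated subquery.
subquery_order = [row[0] for row in conn.execute("""
    SELECT t.id
    FROM tips t
    ORDER BY (SELECT COUNT(*)
              FROM tip_usage u
              WHERE u.tip_id = t.id AND u.status = 'Active') DESC, t.id
""")]

print(case_order, subquery_order)
```

Both variants keep tips with no Active usage in the result, which is what separates them from simply adding WHERE status = 'Active'.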

Quick note, you cannot select t.* and group on t.id. So with that being said:
SELECT t.id,coalesce(tu.cntUsed,0) as cntUsed
FROM `tips` `t`
LEFT
JOIN (Select tip_id,count(*) as cntUsed
from tip_usage
WHERE status='Active'
group by tip_id
) tu
ON t.id = tu.tip_id
ORDER BY coalesce(tu.cntUsed,0)
Since you want to left-join and include the tips that have no usage, this at least sorts them all at the top with a value of zero, which is the most accurate statement of the reality of what is in the tables.

SELECT tips.*, COUNT(*) AS number
FROM tip_usage
LEFT JOIN tips ON tips.id = tip_id
WHERE STATUS = "Active"
GROUP BY tip_id
ORDER BY number DESC

How to use DISTINCT ON but ORDER BY another expression?

The model Subscription has_many SubscriptionCart.
A SubscriptionCart has a status and an authorized_at date.
I need to pick the cart with the oldest authorized_at date from all the carts associated to a Subscription, and then I have to order all the returned Subscription results by this subscription_carts.authorized_at column.
The query below works, but I can't figure out how to select DISTINCT ON subscription.id to avoid duplicates while ordering by subscription_carts.authorized_at.
raw sql query so far:
select distinct on (s.id) s.id as subscription_id, subscription_carts.authorized_at, s.*
from subscriptions s
join subscription_carts subscription_carts on subscription_carts.subscription_id = s.id
and subscription_carts.plan_id = s.plan_id
where subscription_carts.status = 'processed'
and s.status IN ('authorized','in_trial', 'paused')
order by s.id, subscription_carts.authorized_at
If I try to ORDER BY subscription_carts.authorized_at first, I get an error because the DISTINCT ON and ORDER BY expressions must be in the same order.
The solutions I've found seem too complicated for what I need and I've failed to implement them because I don't understand them fully.
Would it be better to GROUP BY subscription_id and then pick from that group instead of using DISTINCT ON? Any help appreciated.
This requirement is necessary to make DISTINCT ON work; to change the final order, you can add an outer query with another ORDER BY clause:
SELECT *
FROM (SELECT DISTINCT ON (s.id)
s.id as subscription_id, subscription_carts.authorized_at, s.*
FROM subscriptions s
JOIN ...
WHERE ...
ORDER BY s.id, subscription_carts.authorized_at
) AS subq
ORDER BY authorized_at;
You don't have to use DISTINCT ON. While it is occasionally useful, I personally find window-function-based approaches much clearer:
-- Optionally, list all columns explicitly, to remove the rn column again
SELECT *
FROM (
SELECT
s.id AS subscription_id,
c.authorized_at,
s.*,
ROW_NUMBER () OVER (PARTITION BY s.id ORDER BY c.authorized_at) rn
FROM subscriptions s
JOIN subscription_carts c
ON c.subscription_id = s.id
AND c.plan_id = s.plan_id
WHERE c.status = 'processed'
AND s.status IN ('authorized', 'in_trial', 'paused')
) t
WHERE rn = 1
ORDER BY authorized_at
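The window-function approach is not tied to PostgreSQL, so it can be tried out in SQLite as well. The following is a minimal sketch using Python's sqlite3 module, assuming an SQLite build with window functions (3.25 or later). The tables are trimmed down to the columns that matter here (plan_id is left out) and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE subscription_carts (
        subscription_id INTEGER, status TEXT, authorized_at TEXT);
    INSERT INTO subscriptions VALUES (1, 'authorized'), (2, 'paused');
    INSERT INTO subscription_carts VALUES
        (1, 'processed', '2020-03-01'),
        (1, 'processed', '2020-05-01'),
        (2, 'processed', '2020-01-15'),
        (2, 'processed', '2020-02-01');
""")

# Inner query: number each subscription's carts, oldest first.
# Outer query: keep the oldest cart per subscription, then sort the
# surviving rows by that cart's authorized_at.
rows = conn.execute("""
    SELECT subscription_id, authorized_at
    FROM (
        SELECT s.id AS subscription_id,
               c.authorized_at,
               ROW_NUMBER() OVER (PARTITION BY s.id
                                  ORDER BY c.authorized_at) AS rn
        FROM subscriptions s
        JOIN subscription_carts c ON c.subscription_id = s.id
        WHERE c.status = 'processed'
          AND s.status IN ('authorized', 'in_trial', 'paused')
    ) AS ranked
    WHERE rn = 1
    ORDER BY authorized_at
""").fetchall()

print(rows)
```

The ORDER BY inside OVER (...) decides which cart is picked per subscription, while the outer ORDER BY decides the final row order, so the two no longer have to agree.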

BigQuery SQL code to pull earliest contact

I have a copy of our Salesforce data in BigQuery, and I'm trying to join the contact table with the account table.
I want to return every account in the dataset, but only the contact that was created first for each account.
I've gone around in circles today googling and trying to cobble a query together, but all roads lead to no accounts, a single account, or loads of contacts per account (ignoring the earliest requirement).
Here's the latest query, which produces no results. I think I'm nearly there but still struggling. Any help would be most appreciated.
SELECT distinct
c.accountid as Acct_id
,a.id as a_Acct_ID
,c.id as Cont_ID
,a.id AS a_CONT_ID
,c.email
,c.createddate
FROM `sfdcaccounttable` a
INNER JOIN `sfdccontacttable` c
ON c.accountid = a.id
INNER JOIN
(SELECT a2.id, c2.accountid, c2.createddate AS MINCREATEDDATE
FROM `sfdccontacttable` c2
INNER JOIN `sfdcaccounttable` a2 ON a2.id = c2.accountid
GROUP BY 1,2,3
ORDER BY c2.createddate asc LIMIT 1) c3
ON c.id = c3.id
ORDER BY a.id asc
LIMIT 10
The solution shared above is very BigQuery-specific: it has some quirks you need to work around, like the memory error you got.
I once answered a similar question here with an approach that is more portable and easier to maintain.
Essentially, you need to create a smaller table (even better, a view) with each ID and its first transaction. It's similar to what you shared but slightly different, as you need to group ONLY in the topmost query.
It looks something like this:
select
# contact ids that are first time contacts
b.id as cont_id,
b.accountid
from `sfdccontacttable` as b inner join
( select accountid,
min(createddate) as first_tx_time
FROM `sfdccontacttable`
group by 1) as a on (a.accountid = b.accountid and b.createddate = a.first_tx_time)
group by 1, 2
You need to do it this way because otherwise you can end up with multiple IDs per account (if there are any other dimensions associated with it). This approach is also fairly future-proof: you can add more dimensions to the underlying tables without affecting the result, and you can use a WHERE clause in the inner query to define a "valid" contact, and so on. You can then save that as a view and simply reference it in any subquery or join operation.
Set up a view/subquery for client_first or client_last as:
SELECT * except(_rank) from (
select rank() over (partition by accountid order by createddate ASC) as _rank,
*
FROM `prj.dataset.sfdccontacttable`
) where _rank=1
Basically, it uses a window function to rank the rows and returns the first row per account: with ASC that's the first client entry, with DESC the last.
You can do the same for accounts as well, and then simply join the two, as there will be exactly one record for each entity.
UPDATE
You could also try using ARRAY_AGG, which has a smaller memory footprint.
#standardSQL
SELECT e.* FROM (
SELECT ARRAY_AGG(
t ORDER BY t.createddate ASC LIMIT 1
)[OFFSET(0)] e
FROM `dataset.sfdccontacttable` t
GROUP BY t.accountid
)
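For a quick local check of the "every account, earliest contact only" requirement, here is a minimal sketch in Python's sqlite3 module rather than BigQuery (it assumes SQLite 3.25+ for window functions). The table names accounts and contacts and all sample rows are made up. It uses ROW_NUMBER() instead of RANK(), a deliberate swap: RANK() returns every tied row when two contacts share the same createddate, while ROW_NUMBER() picks exactly one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id TEXT PRIMARY KEY);
    CREATE TABLE contacts (
        id TEXT PRIMARY KEY, accountid TEXT, createddate TEXT);
    INSERT INTO accounts VALUES ('A1'), ('A2'), ('A3');
    INSERT INTO contacts VALUES
        ('C1', 'A1', '2021-02-01'),
        ('C2', 'A1', '2021-01-01'),
        ('C3', 'A2', '2021-03-01');
""")

# Number each account's contacts by creation date, then LEFT JOIN only
# the first one so accounts without any contact still appear.
rows = conn.execute("""
    SELECT a.id, f.id, f.createddate
    FROM accounts a
    LEFT JOIN (
        SELECT id, accountid, createddate,
               ROW_NUMBER() OVER (PARTITION BY accountid
                                  ORDER BY createddate) AS rn
        FROM contacts
    ) AS f
      ON f.accountid = a.id AND f.rn = 1
    ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
```

Putting the rn = 1 condition in the ON clause rather than in a WHERE clause is what keeps account A3, which has no contacts, in the output.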

How can you add 2 joins in a subquery?

I am trying to get information from 3 tables in my database. I need 4 fields: 'kioskid', 'kioskhours', 'videotime', 'sessiontime'. To do this, I am trying a join on a subquery. This is what I have so far:
SELECT k.kioskid, k.hours, v.time, s.time
FROM `nsixty_kiosks` as k
LEFT JOIN (SELECT time
FROM `nsixty_videos`
ORDER BY videoid) as v
ON kioskid = k.kioskid LEFT JOIN
(SELECT kioskid, time
FROM `sessions`
ORDER BY pingid desc LIMIT 1) as s ON s.kioskid = k.kioskid
WHERE hours is NOT NULL
When I run this query, it works, but it shows every row instead of just the last row for each kiosk id, which is what the 'ORDER BY pingid desc LIMIT 1' line is meant to do.
Anybody have some ideas?
Instead of joining to s, you can use a correlated subquery:
SELECT k.kioskid,
k.hours,
v.time,
( SELECT time
FROM sessions
WHERE sessions.kioskid = k.kioskid
ORDER
BY pingid DESC
LIMIT 1
)
FROM nsixty_kiosks AS k
LEFT
JOIN ( SELECT time
FROM `nsixty_videos`
ORDER BY videoid
) AS v
ON kioskid = k.kioskid
WHERE hours IS NOT NULL
;
N.B. I didn't fix your LEFT JOIN (...) AS v, because I don't understand what it's trying to do, but it too is broken; the ON clause doesn't refer to any of its columns, and there's no point in having an ORDER BY in a subquery unless you also have a LIMIT or whatnot in there.
Well, your join on the 'v' subquery doesn't actually reference the 'v' subquery, nor does the 'v' subquery even contain a kioskid field to JOIN on, so that's undoubtedly part of the problem.
To go much further we'd need to see schema and sample data.
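In the absence of the real schema, here is a minimal sketch of the correlated-subquery idea in Python's sqlite3 module. The table and column definitions and the sample rows are assumptions, and the videos join is left out because its intent is unclear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kiosks (kioskid INTEGER PRIMARY KEY, hours INTEGER);
    CREATE TABLE sessions (
        pingid INTEGER PRIMARY KEY, kioskid INTEGER, time TEXT);
    INSERT INTO kiosks VALUES (1, 8), (2, 10), (3, NULL), (4, 6);
    INSERT INTO sessions VALUES
        (1, 1, '09:00'), (2, 1, '11:30'), (3, 2, '10:15');
""")

# The subquery runs once per kiosk row, so LIMIT 1 applies per kiosk
# instead of once for the whole sessions table.
rows = conn.execute("""
    SELECT k.kioskid,
           k.hours,
           (SELECT s.time
            FROM sessions s
            WHERE s.kioskid = k.kioskid
            ORDER BY s.pingid DESC
            LIMIT 1) AS last_session_time
    FROM kiosks k
    WHERE k.hours IS NOT NULL
    ORDER BY k.kioskid
""").fetchall()

for row in rows:
    print(row)
```

Kiosk 4 has hours but no sessions, so its last_session_time comes back as NULL (None in Python) rather than the row being dropped.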

Help with Complicated SELECT query

I have this SELECT query:
SELECT Auctions.ID, Users.Balance, Users.FreeBids,
COUNT(CASE WHEN Bids.Burned=0 AND Auctions.Closed=0 THEN 1 END) AS 'ActiveBids',
COUNT(CASE WHEN Bids.Burned=1 AND Auctions.Closed=0 THEN 1 END) AS 'BurnedBids'
FROM (Users INNER JOIN Bids ON Users.ID=Bids.BidderID)
INNER JOIN Auctions
ON Bids.AuctionID=Auctions.ID
WHERE Users.ID=#UserID
GROUP BY Users.Balance, Users.FreeBids, Auctions.ID
My problem is that it returns no rows if the UserID can't be found in the Bids table.
I know it has something to do with my
(Users INNER JOIN Bids ON Users.ID=Bids.BidderID)
But I don't know how to make it return a row even if the user is not in the Bids table.
You're doing an INNER JOIN, which only returns rows if there are results on both sides of the join. To get what you want, change the join in your FROM clause like this:
Users LEFT JOIN Bids ON Users.ID=Bids.BidderID
You may also have to change your SELECT statement to handle Bids.Burned being NULL.
If you want to return rows even if there's no matching Auction, then you'll have to make some deeper changes to your query.
My problem is that it returns no rows if the UserID can't be found in the Bids table.
Then INNER JOIN Bids/Auctions should probably be left outer joins. The way you've written it, you're filtering users so that only those in bids and auctions appear.
Left join is the simple answer, but if you're worried about performance I'd consider re-writing it a little bit. For one thing, the order of the columns in the group by matters to performance (although it often doesn't change the results). Generally, you want to group by a column that's indexed first.
Also, it's possible to re-write this query to only have one group by, which will probably speed things up.
Try this out:
with UserBids as (
select
a.ID
, b.BidderID
, ActiveBids = count(case when b.Burned = 0 then 1 end)
, BurnedBids = count(case when b.Burned = 1 then 1 end)
from Bids b
join Auctions a
on a.ID = b.AuctionID
where a.Closed = 0
group by b.BidderID, a.ID
)
select
b.ID
, u.Balance
, u.FreeBids
, b.ActiveBids
, b.BurnedBids
from Users u
left join UserBids b
on b.BidderID = u.ID
where u.ID = #UserID;
If you're not familiar with the with UserBids as..., it's called a CTE (common table expression), and is basically a way to make a one-time use view, and a nice way to structure your queries.
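To see the effect of the LEFT JOIN against the CTE, here is a minimal sketch in Python's sqlite3 module. The sample data is made up, the column alias syntax is the standard AS form rather than the SQL Server name = expression form, and the user id is passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER PRIMARY KEY, Balance INTEGER, FreeBids INTEGER);
    CREATE TABLE Auctions (ID INTEGER PRIMARY KEY, Closed INTEGER);
    CREATE TABLE Bids (BidderID INTEGER, AuctionID INTEGER, Burned INTEGER);
    INSERT INTO Users VALUES (1, 100, 5), (2, 50, 0);
    INSERT INTO Auctions VALUES (10, 0), (11, 1);
    INSERT INTO Bids VALUES (1, 10, 0), (1, 10, 0), (1, 10, 1), (1, 11, 0);
""")

# Aggregate bids on open auctions once, then LEFT JOIN the result to Users
# so a user with no bids still yields a row (with NULL bid columns).
query = """
    WITH UserBids AS (
        SELECT a.ID AS AuctionID,
               b.BidderID,
               COUNT(CASE WHEN b.Burned = 0 THEN 1 END) AS ActiveBids,
               COUNT(CASE WHEN b.Burned = 1 THEN 1 END) AS BurnedBids
        FROM Bids b
        JOIN Auctions a ON a.ID = b.AuctionID
        WHERE a.Closed = 0
        GROUP BY b.BidderID, a.ID
    )
    SELECT ub.AuctionID, u.Balance, u.FreeBids, ub.ActiveBids, ub.BurnedBids
    FROM Users u
    LEFT JOIN UserBids ub ON ub.BidderID = u.ID
    WHERE u.ID = ?
"""

with_bids = conn.execute(query, (1,)).fetchall()
without_bids = conn.execute(query, (2,)).fetchall()
print(with_bids, without_bids)
```

User 2 has never bid, yet still gets one row back; wrapping the two counts in COALESCE(..., 0) would turn the NULLs into zeros if that reads better in the application.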

SQL WHEREing on a different table's COUNT

So, I want to apply a WHERE condition to a field assigned by a COUNT() AS clause. My query currently looks like this:
SELECT new_tags.tag_id
, new_tags.tag_name
, new_tags.tag_description
, COUNT(DISTINCT new_tags_entries.entry_id) AS entry_count
FROM (new_tags)
JOIN new_tags_entries ON new_tags_entries.tag_id = new_tags.tag_id
WHERE `new_tags`.`tag_name` LIKE '%w'
AND `entry_count` < '1'
GROUP BY new_tags.tag_id ORDER BY tag_name ASC
The bit that's failing is the entry_count in the WHERE clause - it doesn't know what the entry_count column is. My table looks like this:
new_tags {
tag_id INT
tag_name VARCHAR
}
new_tags_entries {
tag_id INT
entry_id INT
}
I want to filter the results by the number of distinct entry_ids in new_tags_entries that pertain to the tag ID.
Make sense?
Thanks in advance.
To filter on aggregated values, use the HAVING clause:
SELECT
new_tags.tag_id, new_tags.tag_name,
new_tags.tag_description,
COUNT(DISTINCT new_tags_entries.entry_id) AS entry_count
FROM (new_tags)
JOIN new_tags_entries ON new_tags_entries.tag_id = new_tags.tag_id
WHERE `new_tags`.`tag_name` LIKE '%w'
GROUP BY new_tags.tag_id
HAVING COUNT(DISTINCT new_tags_entries.entry_id) < '1'
ORDER BY tag_name ASC
An inner join will never have a count of less than 1. Perhaps a left join and IS NULL would help. That, or using SUM() instead.
Although APC's answer is syntactically correct, if the problem you are trying to solve is indeed "Find me all new_tags that do not have any new_tags_entries", then the query with INNER JOIN, GROUP BY and HAVING will not yield the correct result. In fact, it will always yield the empty set.
As Ignacio Vazques Abrahams pointed out, a LEFT JOIN will work. And you don't even need the GROUP BY / HAVING:
SELECT new_tags.*
FROM new_tags
LEFT JOIN new_tags_entries
ON new_tags.tag_id = new_tags_entries.tag_id
WHERE new_tags_entries.tag_id IS NULL
(Of course, you can still add GROUP BY and HAVING if you want to know how many entries there are, rather than just finding the new_tags with zero new_tags_entries. But the LEFT JOIN from new_tags to new_tags_entries needs to be there, or else you'll lose the new_tags that have no corresponding rows in new_tags_entries.)
Another, more explicit way to solve the "get me all x for which there is no y" is a correlated NOT EXISTS solution:
SELECT new_tags.*
FROM new_tags
WHERE NOT EXISTS (
SELECT NULL
FROM new_tags_entries
WHERE new_tags_entries.tag_id = new_tags.tag_id
)
Although nice and explicit, this solution is typically shunned in MySQL because of its rather bad subquery performance.
SELECT
new_tags.tag_id, new_tags.tag_name,
new_tags.tag_description,
COUNT(DISTINCT new_tags_entries.entry_id) AS entry_count
FROM (new_tags)
LEFT JOIN new_tags_entries ON new_tags_entries.tag_id = new_tags.tag_id
WHERE `new_tags`.`tag_name` LIKE '%w'
GROUP BY new_tags.tag_id
HAVING `entry_count` < '1'
ORDER BY tag_name ASC
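The difference between the three approaches in this thread is easy to verify on a toy dataset. Below is a minimal sketch in Python's sqlite3 module. The sample tags and entries are made up, and the tag_name filter is dropped to keep the comparison focused on the join type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE new_tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT);
    CREATE TABLE new_tags_entries (tag_id INTEGER, entry_id INTEGER);
    INSERT INTO new_tags VALUES (1, 'sql'), (2, 'mysql'), (3, 'unused');
    INSERT INTO new_tags_entries VALUES (1, 100), (1, 101), (2, 100);
""")

# INNER JOIN: every surviving group has at least one entry,
# so HAVING ... < 1 can never match.
inner_join = conn.execute("""
    SELECT t.tag_id
    FROM new_tags t
    JOIN new_tags_entries e ON e.tag_id = t.tag_id
    GROUP BY t.tag_id
    HAVING COUNT(DISTINCT e.entry_id) < 1
""").fetchall()

# LEFT JOIN: tags without entries are kept with a NULL entry_id,
# which COUNT ignores, giving a count of 0.
left_join = conn.execute("""
    SELECT t.tag_id
    FROM new_tags t
    LEFT JOIN new_tags_entries e ON e.tag_id = t.tag_id
    GROUP BY t.tag_id
    HAVING COUNT(DISTINCT e.entry_id) < 1
""").fetchall()

# NOT EXISTS: the same "tags with no entries" question, without grouping.
not_exists = conn.execute("""
    SELECT t.tag_id
    FROM new_tags t
    WHERE NOT EXISTS (
        SELECT 1 FROM new_tags_entries e WHERE e.tag_id = t.tag_id
    )
""").fetchall()

print(inner_join, left_join, not_exists)
```

Only the LEFT JOIN and NOT EXISTS variants find the 'unused' tag; the INNER JOIN variant returns nothing, as the answers above predict.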