right way to alias count * in a subquery - sql

I have the query below:
select t.comment_count, count(*) as frequency
from
(select u.id, count(c.user_id) as comment_count
from users u
left join comments c
on u.id = c.user_id
and c.created_at between '2020-01-01' and '2020-01-31'
group by 1) t
group by 1
order by 1
When I try to write the count(*) as count(t.*), it gives an error. Can I not qualify it with the alias t of the derived table? I'm not sure what I am missing.
Thank you

Count(*) stands for the count of all rows returned by a query (per group, when there is a GROUP BY). So it makes no sense to qualify it with one of the tables involved. Consider counting the rows produced by a join, for example. If you need the number of rows of the specific table t, you can use count(distinct t.<unique column>).
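To make the point above concrete, here is a minimal sketch that runs the question's query shape against made-up data. It uses Python's built-in sqlite3 as a stand-in for whatever database the question targets, and the table contents are assumptions chosen only for illustration.

```python
import sqlite3

# Hypothetical sample data: three users, comments only from users 1 and 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
    INSERT INTO users (id) VALUES (1), (2), (3);
    INSERT INTO comments (user_id, created_at) VALUES
        (1, '2020-01-05'), (1, '2020-01-20'), (2, '2020-01-10');
""")

# Same shape as the question: comments per user, then how many users share each count.
# COUNT(*) in the outer query counts the rows of the derived table t in each group.
histogram = conn.execute("""
    SELECT t.comment_count, COUNT(*) AS frequency
    FROM (
        SELECT u.id, COUNT(c.user_id) AS comment_count
        FROM users u
        LEFT JOIN comments c
          ON u.id = c.user_id
         AND c.created_at BETWEEN '2020-01-01' AND '2020-01-31'
        GROUP BY u.id
    ) t
    GROUP BY t.comment_count
    ORDER BY t.comment_count
""").fetchall()

# On a join, COUNT(*) counts joined rows, COUNT(column) skips NULLs,
# and COUNT(DISTINCT key) counts the rows of one specific table.
join_counts = conn.execute("""
    SELECT COUNT(*), COUNT(c.user_id), COUNT(DISTINCT u.id)
    FROM users u
    LEFT JOIN comments c ON u.id = c.user_id
""").fetchone()

print(histogram)    # [(0, 1), (1, 1), (2, 1)]
print(join_counts)  # (4, 3, 3)
```

The join produces four rows (two for user 1, one for user 2, one NULL-padded row for user 3), which is why COUNT(*) differs from the per-table counts.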

Related

Get Max from a joined table

I wrote this script in SQL Server, and I want to get the food name with the maximum order count from this joined table. I can get the MAX value correctly, but when I add FoodName to the SELECT it gives me an error.
SELECT S.FoodName, MAX(S.OrderCount) FROM
(SELECT FoodName,
SUM(Number) AS OrderCount
FROM tblFactor
INNER JOIN tblDetail
ON tblFactor.Factor_ID = tblDetail.Factor_ID
WHERE FactorDate = '2020-10-30'
GROUP BY FoodName)S
Here is the error message:
Column 'S.FoodName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I also know I can use ORDER BY and TOP to get the food name and the max order count, but I want to use the approach in this script. Thank you for your answers.
If I follow you correctly, you can use ORDER BY and TOP (1) directly on the result of the join query:
SELECT TOP (1) f.FoodName, SUM(d.Number) AS OrderCount
FROM tblFactor f
INNER JOIN tblDetail d ON f.Factor_ID = d.Factor_ID
WHERE f.FactorDate = '2020-10-30'
GROUP BY f.FoodName
ORDER BY OrderCount DESC
Notes:
I added table aliases to the query and prefixed each column with the table it presumably comes from; you might need to review that, as I had to make assumptions
If you want to allow top ties, use TOP (1) WITH TIES instead
You have an aggregation function, MAX(), in the outer query alongside an unaggregated column. Hence, the database expects a GROUP BY.
Instead, use ORDER BY and TOP (1) (SQL Server does not support LIMIT):
SELECT TOP (1) FoodName, SUM(Number) AS OrderCount
FROM tblFactor f INNER JOIN
tblDetail d
ON f.Factor_ID = d.Factor_ID
WHERE FactorDate = '2020-10-30'
GROUP BY FoodName
ORDER BY OrderCount DESC;
Note: In a query that references multiple tables, you should qualify all column references. It is not clear where the columns come from, so I cannot do that for this query.
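As a rough sketch of the two approaches, the following runs against invented data in SQLite through Python's sqlite3. SQLite spells "first row" as LIMIT 1 where SQL Server uses TOP (1). The second query is an alternative not given in the answers above: it keeps the MAX() idea from the question by comparing each total against the maximum total. Which table each column lives in, and all the sample rows, are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layout: FactorDate on tblFactor, FoodName and Number on tblDetail.
    CREATE TABLE tblFactor (Factor_ID INTEGER PRIMARY KEY, FactorDate TEXT);
    CREATE TABLE tblDetail (Factor_ID INTEGER, FoodName TEXT, Number INTEGER);
    INSERT INTO tblFactor VALUES (1, '2020-10-30'), (2, '2020-10-30'), (3, '2020-10-29');
    INSERT INTO tblDetail VALUES
        (1, 'Pizza', 2), (1, 'Soup', 1),
        (2, 'Pizza', 3), (2, 'Salad', 4),
        (3, 'Soup', 10);
""")

# Approach 1: sort the grouped totals and keep the first row.
top_by_order = conn.execute("""
    SELECT d.FoodName, SUM(d.Number) AS OrderCount
    FROM tblFactor f
    INNER JOIN tblDetail d ON f.Factor_ID = d.Factor_ID
    WHERE f.FactorDate = '2020-10-30'
    GROUP BY d.FoodName
    ORDER BY OrderCount DESC
    LIMIT 1
""").fetchall()

# Approach 2: keep MAX(), but use it to filter the grouped totals
# instead of selecting it next to an unaggregated column.
top_by_max = conn.execute("""
    WITH S AS (
        SELECT d.FoodName, SUM(d.Number) AS OrderCount
        FROM tblFactor f
        INNER JOIN tblDetail d ON f.Factor_ID = d.Factor_ID
        WHERE f.FactorDate = '2020-10-30'
        GROUP BY d.FoodName
    )
    SELECT FoodName, OrderCount
    FROM S
    WHERE OrderCount = (SELECT MAX(OrderCount) FROM S)
""").fetchall()

print(top_by_order)  # [('Pizza', 5)]
print(top_by_max)    # [('Pizza', 5)]
```

The second form returns every food tied for the maximum, which is comparable to TOP (1) WITH TIES.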

COUNT(*) syntax when selecting from a LEFT JOIN - SQL

I have a LEFT JOIN query
SELECT
a.id_user,b.id_post, COUNT(a.*) as total_users
FROM
posts as b
LEFT JOIN .....
LEFT JOIN .....
WHERE ....
ORDER BY .....
GROUP BY
a._id_user
LIMIT 3,10
If I use COUNT(a.*) AS total_users to retrieve the number of users, I get an error. What would be the correct syntax?
The syntax should look more like:
SELECT a.?, COUNT(b.?) as total_users
FROM a LEFT JOIN
b
ON . . .
GROUP BY a.?
You need a GROUP BY that lists every unaggregated column:
select a.id_user, b.id_post, count(1)
from.....
group by a.id_user, b.id_post
You can either Count([some specific field]) or Count(*), but as Gordon Linoff correctly stated, you cannot typically use Count(alias.*)
The Count() function expects an expression ( a single value ). For each row where the expression is not NULL, that row will be counted. Count(*) is a special case returning the number of rows in the result set.
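The difference between Count(*) and Count(expression) shows up clearly with a LEFT JOIN. The sketch below uses SQLite via Python's sqlite3; the posts and post_users tables and their rows are hypothetical, since the question elides its joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: post 1 has two users attached, post 2 has none.
    CREATE TABLE posts (id_post INTEGER PRIMARY KEY);
    CREATE TABLE post_users (id_post INTEGER, id_user INTEGER);
    INSERT INTO posts VALUES (1), (2);
    INSERT INTO post_users VALUES (1, 10), (1, 11);
""")

rows = conn.execute("""
    SELECT b.id_post,
           COUNT(*)          AS joined_rows,  -- every row in the group, NULL-padded or not
           COUNT(a.id_user)  AS total_users   -- only rows where a.id_user IS NOT NULL
    FROM posts AS b
    LEFT JOIN post_users AS a ON a.id_post = b.id_post
    GROUP BY b.id_post
    ORDER BY b.id_post
""").fetchall()

print(rows)  # [(1, 2, 2), (2, 1, 0)]
```

Post 2 has no matching users, so the LEFT JOIN yields one NULL-padded row: COUNT(*) reports 1, while COUNT(a.id_user) reports 0, which is usually the number wanted.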
The following returns all posts and counts the users joined to those posts in some way (your question gives no information about the joins):
SELECT
a.id_user,
b.id_post,
COUNT(DISTINCT a.id_user) as total_users

How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?

I have tables A, B, C. Table A is linked to B, and table A is linked to C. I want to join the 3 tables and find the sum of B.cost and the sum of C.clicks. However, it is not giving me the expected value, and when I select everything without the group by, it is showing duplicate rows. I am expecting the row values from B to roll up into a single sum, and the row values from C to roll up into a single sum.
My query looks like
select A.*, sum(B.cost), sum(C.clicks) from A
join B
left join C
group by A.id
having sum(cost) > 10
I tried to group by B.a_id and C.another_field_in_a also, but that didn't work.
Here is a DB fiddle with all of the data and the full query:
http://sqlfiddle.com/#!9/768745/13
Notice how the sum fields are greater than the sum of the individual tables? I'm expecting the sums to be equal, containing only the rows of the table B and C once. I also tried adding distinct but that didn't help.
I'm using Postgres. (The fiddle is set to MySQL though.) Ultimately I will want to use a having clause to select the rows according to their sums. This query will be for millions of rows.
If I understand the logic correctly, the problem is the Cartesian product caused by the two joins. Your query is a bit hard to follow, but I think the intent is better handled with correlated subqueries:
select k.*,
(select sum(cost)
from ad_group_keyword_network n
where n.event_date >= '2015-12-27' and
n.ad_group_keyword_id = 1210802 and
k.id = n.ad_group_keyword_id
) as cost,
(select sum(clicks)
from keyword_click c
where (c.date is null or c.date >= '2015-12-27') and
k.keyword_id = c.keyword_id
) as clicks
from ad_group_keyword k
where k.status = 2 ;
Here is the corresponding SQL Fiddle.
EDIT:
The subselect should be faster than the group by on the unaggregated data. However, you need the right indexes: ad_group_keyword_network(ad_group_keyword_id, event_date, cost) and keyword_click(keyword_id, date, clicks).
I found this (MySQL joining tables group by sum issue) and created a query like this
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
group by A.id
having sum(cost) > 10
I don't know if it's efficient though. I don't know if it's more or less efficient than Gordon's. I ran both queries and this one seemed faster, 27s vs. 2m35s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10
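A small sketch of why the sums come out inflated and how pre-aggregating avoids it, using SQLite through Python's sqlite3. The tables A, B, C follow the question's naming, but the join conditions, the rows, and the aliases b_sum and c_sum are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, keyword_id INTEGER);
    CREATE TABLE B (a_id INTEGER, cost INTEGER);
    CREATE TABLE C (keyword_id INTEGER, clicks INTEGER);
    INSERT INTO A VALUES (1, 100);
    INSERT INTO B VALUES (1, 5), (1, 7);               -- true cost sum: 12
    INSERT INTO C VALUES (100, 3), (100, 4), (100, 1); -- true clicks sum: 8
""")

# Joining both child tables first pairs every B row with every C row
# (2 x 3 = 6 rows), so each sum is multiplied by the other table's row count.
naive = conn.execute("""
    SELECT A.id, SUM(B.cost), SUM(C.clicks)
    FROM A
    JOIN B ON B.a_id = A.id
    LEFT JOIN C ON C.keyword_id = A.keyword_id
    GROUP BY A.id
""").fetchall()

# Aggregating each child table before joining leaves at most one row per key,
# so nothing is duplicated and the filter can be a plain WHERE.
pre_aggregated = conn.execute("""
    SELECT A.id, b_sum.cost, c_sum.clicks
    FROM A
    JOIN (SELECT a_id, SUM(cost) AS cost
          FROM B GROUP BY a_id) AS b_sum ON A.id = b_sum.a_id
    LEFT JOIN (SELECT keyword_id, SUM(clicks) AS clicks
               FROM C GROUP BY keyword_id) AS c_sum ON A.keyword_id = c_sum.keyword_id
    WHERE b_sum.cost > 10
""").fetchall()

print(naive)           # [(1, 36, 16)]
print(pre_aggregated)  # [(1, 12, 8)]
```

Because each derived table already has one row per key, the outer GROUP BY and HAVING from the question's version are not needed in this form.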
Simply split the aggregate of the second table into a subquery as follows:
http://sqlfiddle.com/#!9/768745/27
select ad_group_keyword.*, SumCost, sum(keyword_click.clicks)
from ad_group_keyword
left join keyword_click on ad_group_keyword.keyword_id = keyword_click.keyword_id
left join (select ad_group_keyword.id, sum(cost) SumCost
from ad_group_keyword join ad_group_keyword_network on ad_group_keyword.id = ad_group_keyword_network.ad_group_keyword_id
where event_date >= '2015-12-27'
group by ad_group_keyword.id
having sum(cost) > 20
) Cost on Cost.id=ad_group_keyword.id
where
(keyword_click.date is null or keyword_click.date >= '2015-12-27')
and status = 2
group by ad_group_keyword.id

Self Join bringing too many records

I have this query to express a set of business rules.
To get the information I need, I tried joining the table on itself but that brings back many more records than are actually in the table. Below is the query I've tried. What am I doing wrong?
SELECT DISTINCT a.rep_id, a.rep_name, count(*) AS 'Single Practitioner'
FROM [SE_Violation_Detection] a inner join [SE_Violation_Detection] b
ON a.rep_id = b.rep_id and a.hcp_cid = b.hcp_cid
group by a.rep_id, a.rep_name
having count(*) >= 2
You can accomplish this with the having clause:
select a, b, count(*) c
from etc
group by a, b
having count(*) >= some number
I figured out a simpler way to get the information I need for one of the queries. The one above is still wrong.
--Rep violation for different HCP more than 5 times
select distinct rep_id,rep_name,count(distinct hcp_cid)
AS 'Multiple Practitioners'
from dbo.SE_Violation_Detection
group by rep_id,rep_name
having count(distinct hcp_cid)>4
order by count(distinct hcp_cid)
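The reason the self join returns more rows than the table holds can be sketched as follows (SQLite via Python's sqlite3, with invented rows). A group of n rows sharing the same rep_id and hcp_cid matches itself n times n times, so grouping the table directly is enough. The exact business rule is not stated in the question, so the HAVING thresholds here are only placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SE_Violation_Detection (rep_id INTEGER, rep_name TEXT, hcp_cid TEXT);
    INSERT INTO SE_Violation_Detection VALUES
        (1, 'Ann', 'H1'), (1, 'Ann', 'H1'), (1, 'Ann', 'H1'),
        (1, 'Ann', 'H2'),
        (2, 'Bob', 'H3');
""")

table_rows = conn.execute(
    "SELECT COUNT(*) FROM SE_Violation_Detection"
).fetchone()[0]

# Each row pairs with every row that has the same rep_id and hcp_cid,
# including itself: 3*3 + 1*1 + 1*1 = 11 rows from a 5-row table.
self_join_rows = conn.execute("""
    SELECT COUNT(*)
    FROM SE_Violation_Detection a
    INNER JOIN SE_Violation_Detection b
        ON a.rep_id = b.rep_id AND a.hcp_cid = b.hcp_cid
""").fetchone()[0]

# No self join needed: group by rep and practitioner, then filter the groups.
repeat_same_hcp = conn.execute("""
    SELECT rep_id, rep_name, hcp_cid, COUNT(*) AS violations
    FROM SE_Violation_Detection
    GROUP BY rep_id, rep_name, hcp_cid
    HAVING COUNT(*) >= 2
""").fetchall()

# Distinct practitioners per rep, as in the second query above.
distinct_hcps = conn.execute("""
    SELECT rep_id, rep_name, COUNT(DISTINCT hcp_cid) AS practitioners
    FROM SE_Violation_Detection
    GROUP BY rep_id, rep_name
    ORDER BY rep_id
""").fetchall()

print(table_rows, self_join_rows)  # 5 11
print(repeat_same_hcp)             # [(1, 'Ann', 'H1', 3)]
print(distinct_hcps)               # [(1, 'Ann', 2), (2, 'Bob', 1)]
```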

Problem With DISTINCT!

Here is my query:
SELECT
DISTINCT `c`.`user_id`,
`c`.`created_at`,
`c`.`body`,
(SELECT COUNT(*) FROM profiles_comments c2 WHERE c2.user_id = c.user_id AND c2.profile_id = 1) AS `comments_count`,
`u`.`username`,
`u`.`avatar_path`
FROM `profiles_comments` AS `c` INNER JOIN `users` AS `u` ON u.id = c.user_id
WHERE (c.profile_id = 1) ORDER BY `u`.`id` DESC;
It works. The problem though is with the DISTINCT word. As I understand it, it should select only one row per c.user_id.
But what I get is even 4-5 rows with the same c.user_id column. Where is the problem?
Actually, DISTINCT does not limit itself to one column. Basically, when you say:
SELECT DISTINCT a, b
What you're saying is, "give me the distinct combinations of a and b", just like a multi-column UNIQUE index.
DISTINCT will ensure that the combination of ALL values in your select clause is unique, not just user_id. If you want to limit the results to individual user_ids, you should group by user_id.
Perhaps what you want is:
SELECT
`c`.`user_id`,
`u`.`username`,
`u`.`avatar_path`,
(SELECT COUNT(*) FROM profiles_comments c2 WHERE c2.user_id = c.user_id AND c2.profile_id = 1) AS `comments_count`
FROM `profiles_comments` AS `c` INNER JOIN `users` AS `u` ON u.id = c.user_id
WHERE (c.profile_id = 1)
GROUP BY `c`.`user_id`,
`u`.`username`,
`u`.`avatar_path`
ORDER BY `u`.`id` DESC;
DISTINCT works at a row level, not just a column level.
If you want one row per value of a single column, you have to group by that column and aggregate the rest of the columns returned (MIN, MAX, SUM, AVG, etc.):
SELECT Name, MIN(ID)
FROM MyTable
GROUP BY Name
DISTINCT will try to return only unique rows; it will not return only one row per user id in your example.
http://dev.mysql.com/doc/refman/5.0/en/distinct-optimization.html
You misunderstand. The DISTINCT modifier applies to the entire row — it states that no two identical ROWS will be returned in the result set.
Looking at your SQL, what value of the several available do you expect to see returned in the created_at column (for instance)? It would be impossible to predict the results of the query as written.
Also, you're using profiles_comments twice in your SELECT. It appears that you're trying to obtain a count of how many times each user has commented. If so, what you want to do is use an AGGREGATE query, grouped on user_id and including only those columns that uniquely identify a user along with a COUNT of the comments:
SELECT user_id, COUNT(*) FROM profiles_comments WHERE profile_id = 1 GROUP BY user_id
You can add the join to users to get the user name if you want but, logically, your result set cannot include other columns from profiles_comments and still produce only a single row per user_id unless those columns are also aggregated in some way:
SELECT user_id, MIN(created_at) AS Earliest, MAX(created_at) AS Latest, COUNT(*) FROM profiles_comments WHERE profile_id = 1 GROUP BY user_id
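The row-level behaviour of DISTINCT described in the answers above can be checked with a small sketch. It uses SQLite via Python's sqlite3 rather than MySQL, and the sample comments are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profiles_comments (
        user_id INTEGER, profile_id INTEGER, created_at TEXT, body TEXT
    );
    -- Hypothetical data: user 1 commented twice on profile 1, user 2 once.
    INSERT INTO profiles_comments VALUES
        (1, 1, '2021-01-01', 'first'),
        (1, 1, '2021-01-02', 'second'),
        (2, 1, '2021-01-03', 'hello');
""")

# DISTINCT compares whole rows: the two rows for user 1 differ in created_at,
# so both survive.
distinct_rows = conn.execute("""
    SELECT DISTINCT user_id, created_at
    FROM profiles_comments
    WHERE profile_id = 1
    ORDER BY user_id, created_at
""").fetchall()

# GROUP BY collapses to one row per user; every other column must be aggregated.
grouped_rows = conn.execute("""
    SELECT user_id, MIN(created_at) AS earliest, MAX(created_at) AS latest, COUNT(*)
    FROM profiles_comments
    WHERE profile_id = 1
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

print(distinct_rows)
# [(1, '2021-01-01'), (1, '2021-01-02'), (2, '2021-01-03')]
print(grouped_rows)
# [(1, '2021-01-01', '2021-01-02', 2), (2, '2021-01-03', '2021-01-03', 1)]
```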