How do I select data together with a count?
In this case, I want to show each user along with the number of transactions he has.
This is the query I wrote:
SELECT "transaction"."user_id",
COUNT(transaction.id) trans_count
FROM transaction
inner join "users" on "users"."id" = "transaction"."user_id"
GROUP BY user_id
The code above successfully selects user_id and trans_count, but when I also try to select users.name,
this error message appears:
Error in query: ERROR: column "users.name" must appear in the GROUP BY
clause or be used in an aggregate function LINE 3: "users"."name"
Is it true that I cannot select other columns when I count rows, or is there a better way?
Thank You.
You can include users.name in the group by:
SELECT "transaction"."user_id",
"users"."name",
COUNT(transaction.id) trans_count
FROM transaction
inner join "users" on "users"."id" = "transaction"."user_id"
GROUP BY "transaction"."user_id", "users"."name"
Otherwise, when the DBMS tries to combine (group) multiple rows into a single row, it doesn't know which name value it should pick, which is why it throws the error.
In this case, user_id and users.name have a one-to-one mapping, so you can simply include name in the group by clause.
Otherwise you'd have to tell the DBMS how to select one value from the multiple records that are in each group, e.g.:
min(users.name) or max(users.name)
SELECT "transaction"."user_id",
min("users"."name") user_name,
COUNT(transaction.id) trans_count
FROM transaction
inner join "users" on "users"."id" = "transaction"."user_id"
GROUP BY "transaction"."user_id"
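To see that the two approaches agree when user_id and name map one-to-one, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up, and SQLite is only a stand-in: unlike Postgres, it does not raise the GROUP BY error, so this shows the results of the two fixes, not the error itself.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users ("id" INTEGER PRIMARY KEY, "name" TEXT);
    CREATE TABLE "transaction" ("id" INTEGER PRIMARY KEY, "user_id" INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO "transaction" VALUES (10, 1), (11, 1), (12, 2);
""")

# Variant 1: put the extra column into GROUP BY.
by_group = conn.execute("""
    SELECT t.user_id, u.name, COUNT(t.id) AS trans_count
    FROM "transaction" t
    INNER JOIN users u ON u.id = t.user_id
    GROUP BY t.user_id, u.name
    ORDER BY t.user_id
""").fetchall()

# Variant 2: wrap the extra column in an aggregate.
by_min = conn.execute("""
    SELECT t.user_id, MIN(u.name) AS user_name, COUNT(t.id) AS trans_count
    FROM "transaction" t
    INNER JOIN users u ON u.id = t.user_id
    GROUP BY t.user_id
    ORDER BY t.user_id
""").fetchall()

print(by_group)
print(by_min)
```

Both queries return the same rows here because each user_id has exactly one name.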
Whenever you use GROUP BY,
you must either group by every column you select or wrap it in an aggregate function.
Group by (same as #rohitvats)
GROUP BY "transaction"."user_id", "users"."name"
---- OR ----
Aggregate Functions MAX(), MIN()
SELECT "transaction"."user_id",
MAX("users"."name") as name,
COUNT(transaction.id) trans_count
FROM transaction
inner join "users" on "users"."id" = "transaction"."user_id"
GROUP BY "transaction"."user_id"
Your code would work if you aggregated by the users.id:
SELECT u.id, u.user_name, COUNT(*) as trans_count
FROM users u JOIN
transaction t
ON t.user_id = u.id
GROUP BY u.id;
(I removed the double quotes because they clutter the logic and are not necessary to explain what is going on.)
Why? Presumably, users.id is unique (or equivalently the primary key). Postgres supports aggregating by a unique key in a table and also including unaggregated columns in the SELECT. This is an implementation of "functional dependent" aggregations from the SQL standard.
When you use transaction.user_id, Postgres does not recognize the functional dependence (even though you might think that the ON clause would imply it). So, your code doesn't work.
The alternative is to add user_name to the GROUP BY as well. However, your version almost works if you use the column from the correct table.
I have a table Users and a table Tasks. Tasks are ordered by importance and are assigned to a user's task list. Tasks have a status: ready or not ready. Now, I want to list all users with their most important task that is also ready.
The interesting requirement is that the tasks for each user first need to be filtered and sorted, and then the most important one should be selected. This is what I came up with:
SELECT Users.name,
(SELECT *
FROM (SELECT Tasks.description
FROM Tasks
WHERE Tasks.taskListCode = Users.taskListCode AND Tasks.isReady
ORDER BY Tasks.importance DESC)
WHERE rownum = 1
) AS nextTask
FROM Users
However, this results in the error
ORA-00904: "Users"."taskListCode": invalid identifier
I think the reason is that Oracle does not support correlated subqueries nested more than one level deep. However, I need two levels so that I can do the WHERE rownum = 1.
I also tried it without a correlating subquery:
SELECT Users.name, Task.description
FROM Users
LEFT JOIN Tasks nextTask ON
nextTask.taskListCode = Users.taskListCode AND
nextTask.importance = MAX(
SELECT tasks.importance
FROM tasks
WHERE tasks.isReady
GROUP BY tasks.id
)
This results in the error
ORA-00934: group function is not allowed here
How would I solve the problem?
One work-around for this uses keep:
SELECT u.name,
(SELECT MAX(t.description) KEEP (DENSE_RANK FIRST ORDER BY T.importance DESC)
FROM Tasks t
WHERE t.taskListCode = u.taskListCode AND t.isReady
) as nextTask
FROM Users u;
Please try with analytic function:
with tp as (select t.*, row_number() over (partition by taskListCode order by importance desc) r
from tasks t
where isReady = 1 /*or 'Y' or what is positive value here*/)
select u.name, tp.description
from users u left outer join tp on (u.taskListCode = tp.taskListCode and tp.r = 1);
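A small runnable sketch of the row_number() approach, using Python's sqlite3 as a stand-in for Oracle (this assumes SQLite 3.25 or newer for window functions, and isReady stored as 0/1; the sample rows are invented). Note where the r = 1 test sits: in the join condition it keeps users who have no ready task, whereas a WHERE filter on the outer-joined side would drop them.

```python
import sqlite3

# Hypothetical sample data; names follow the question, isReady is assumed to be 0/1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT, taskListCode TEXT);
    CREATE TABLE tasks (description TEXT, taskListCode TEXT,
                        importance INTEGER, isReady INTEGER);
    INSERT INTO users VALUES ('ann', 'A'), ('ben', 'B'), ('cat', 'C');
    INSERT INTO tasks VALUES
        ('write report', 'A', 5, 1),
        ('fix bug',      'A', 9, 0),
        ('call client',  'A', 7, 1),
        ('plan sprint',  'B', 3, 0);
""")

top_ready = conn.execute("""
    WITH tp AS (
        -- rank only the ready tasks inside each task list
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY taskListCode
                                  ORDER BY importance DESC) AS r
        FROM tasks t
        WHERE isReady = 1
    )
    SELECT u.name, tp.description
    FROM users u
    LEFT JOIN tp ON u.taskListCode = tp.taskListCode AND tp.r = 1
    ORDER BY u.name
""").fetchall()

print(top_ready)
```

ann gets her most important ready task; ben (only a non-ready task) and cat (no tasks) still appear, with NULL.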
Here is a solution that uses aggregation rather than analytic functions. You may want to run this against the analytic functions solution to see which is faster; in many cases aggregate queries are (slightly) faster, but it depends on your data, on index usage, etc.
This solution is similar to what Gordon tried to do. I don't know why he wrote it using a correlated subquery instead of a straight join (and don't know if it will work - I've never seen the FIRST/LAST function used with correlated subqueries like that).
It may not work exactly right if the importance column can contain NULL; in that case you will need to add nulls first after t.importance and before the closing parenthesis. Note: the max(t.description) is needed, because there may be ties by "importance" (two tasks with the same, highest importance for a given user). In that case, one task must be chosen. If the ordering by importance is strict (no ties), then the MAX() does nothing as it selects the MAX over a set of exactly one value, but the compiler doesn't know that beforehand so it does need the MAX().
select u.name,
max(t.description) keep (dense_rank last order by t.importance) as descr
from users u left outer join tasks t
on u.tasklistcode = t.tasklistcode and t.isready = 'Y'
group by u.name
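The tie-breaking rule described above (highest importance first, then the largest description among the ties) can be illustrated outside Oracle. The sketch below is not KEEP (DENSE_RANK ...); it is a rough SQLite analogue using a single-level correlated subquery with ORDER BY and LIMIT 1, run through Python's sqlite3. Sample rows and the 0/1 isReady encoding are assumptions.

```python
import sqlite3

# Hypothetical sample data with a deliberate tie on importance for list 'A'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT, taskListCode TEXT);
    CREATE TABLE tasks (description TEXT, taskListCode TEXT,
                        importance INTEGER, isReady INTEGER);
    INSERT INTO users VALUES ('ann', 'A'), ('ben', 'B');
    INSERT INTO tasks VALUES
        ('archive logs', 'A', 8, 1),
        ('ship release', 'A', 8, 1),
        ('tidy desk',    'A', 2, 1),
        ('plan sprint',  'B', 3, 0);
""")

next_tasks = conn.execute("""
    SELECT u.name,
           (SELECT t.description
            FROM tasks t
            WHERE t.taskListCode = u.taskListCode AND t.isReady = 1
            -- highest importance first; among ties, the largest description
            ORDER BY t.importance DESC, t.description DESC
            LIMIT 1) AS nextTask
    FROM users u
    ORDER BY u.name
""").fetchall()

print(next_tasks)
```

With two tasks tied at importance 8, the one with the larger description wins, which mirrors what MAX(description) does among the top-ranked rows.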
I have 2 tables in my database, one tblNews and another tblNewsComments.
I want to select the 10 news records that have the most comments.
I used this query, but it gives an error:
SELECT tblNews.id,
tblNews.newsTitle,
tblNews.createdate,
tblNews.viewcount,
COUNT(tblNewsComments.id) AS comcounts
FROM tblNews
INNER JOIN tblNewsComments ON tblNews.id = tblNewsComments.newsID
GROUP BY tblNews.id
Try to replace
GROUP BY tblNews.id
With
GROUP BY tblNews.id,
tblNews.newsTitle,
tblNews.createdate,
tblNews.viewcount
All the expressions in the SELECT list should be in the GROUP BY or inside an aggregate function.
I've always found this to be an annoyance in SQL. There's nothing logically wrong with your query; you're grouping by news item and selecting various attributes of the news item, and then selecting the count of comments linked to the news item. That makes sense.
The error arises because the SQL engine isn't smart enough to realize that all the columns in tblNews are at the same data context, and that grouping by tblNews.id effectively guarantees that there will only be one newsTitle, createdate, and viewcount for each group. It should be able to realize that, I think, and carry out the query. But it doesn't do that; the only column it considers to be unique in the group data context is the exact column that you grouped by, id.
One solution, as Multisync just posted, is to group by ALL the columns you want to include in the select clause. I don't think this is the best solution, however, as you shouldn't have to specify all those columns in the group by clause, and that would force you to keep adding to that list whenever you want to add a new tblNews column to the select clause.
The solution I've always used is to wrap the column in an ineffectual aggregate function in the select clause; I always use max():
select
tblNews.id,
max(tblNews.newsTitle),
max(tblNews.createdate),
max(tblNews.viewcount),
count(tblNewsComments.id) comcounts
from
tblNews
inner join tblNewsComments on tblNews.id=tblNewsComments.newsID
group by
tblNews.id
;
Or with subquery:
SELECT n.id,
n.newsTitle,
n.createdate,
n.viewcount,
(SELECT COUNT(*) FROM tblNewsComments c WHERE n.id = c.newsID) AS comcounts
FROM tblNews n
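One practical difference between the subquery form and the INNER JOIN form is what happens to news items with no comments. A minimal sketch with Python's sqlite3 and invented rows (the ORDER BY ... LIMIT 10 is my addition, to match the "top 10" goal in the question):

```python
import sqlite3

# Hypothetical sample data; news item 3 has no comments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblNews (id INTEGER PRIMARY KEY, newsTitle TEXT,
                          createdate TEXT, viewcount INTEGER);
    CREATE TABLE tblNewsComments (id INTEGER PRIMARY KEY, newsID INTEGER);
    INSERT INTO tblNews VALUES
        (1, 'Launch', '2014-01-01', 100),
        (2, 'Update', '2014-01-02', 50),
        (3, 'Quiet',  '2014-01-03', 5);
    INSERT INTO tblNewsComments VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

# Correlated subquery: every news row is kept, uncommented ones count 0.
with_subquery = conn.execute("""
    SELECT n.id, n.newsTitle,
           (SELECT COUNT(*) FROM tblNewsComments c WHERE c.newsID = n.id) AS comcounts
    FROM tblNews n
    ORDER BY comcounts DESC, n.id
    LIMIT 10
""").fetchall()

# INNER JOIN + GROUP BY: news without comments disappears from the result.
with_join = conn.execute("""
    SELECT n.id, MAX(n.newsTitle), COUNT(c.id) AS comcounts
    FROM tblNews n
    INNER JOIN tblNewsComments c ON n.id = c.newsID
    GROUP BY n.id
    ORDER BY comcounts DESC, n.id
    LIMIT 10
""").fetchall()

print(with_subquery)
print(with_join)
```

For a "most commented" list the missing zero-comment rows rarely matter, but it is worth knowing the two forms are not strictly equivalent.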
You have to select only the column you group by, plus the aggregate. Other columns will not work because they are neither in the GROUP BY nor inside an aggregate function.
SELECT tblNews.id, COUNT(tblNewsComments.newsID) AS comcounts
FROM tblNews
INNER JOIN tblNewsComments ON tblNews.id = tblNewsComments.newsID
GROUP BY tblNews.id
This is the problematic part of my query:
SELECT
(SELECT id FROM users WHERE name = 'John') as competitor_id,
(SELECT MIN(duration)
FROM
(SELECT duration FROM attempts
WHERE userid=competitor_id ORDER BY created DESC LIMIT 1,1
) x
) as best_time
On execution, it throws this error:
#1054 - Unknown column 'competitor_id' in 'where clause'
It looks like the derived table 'x' can't see the parent query's alias competitor_id. Is there any way to create some kind of global alias that will be usable by all derived tables?
I know I can just use the competitor_id query as a subquery directly in a WHERE clause and avoid using an alias at all, but my real query is much bigger and I need to use competitor_id in more subqueries and derived tables, so it would be inefficient if I used the same subquery several times.
You may not need to use derived tables within the select statement. Wouldn't the following accomplish the same thing?
SELECT
users.id as competitor_id,
MIN(duration) as best_time
FROM users
inner join attempts on users.id = attempts.userid
WHERE name = 'John'
group by users.id
The error is caused because an identifier introduced in a select output-clause cannot be referenced from anywhere else in that clause - basically, with SQL, identifiers/columns are pushed out and not down (or across).
But, even if it were possible, it's not good to write a query this way anyway. Use a JOIN between the users and attempts (on user id), then filter based on the name. The SQL query planner will then take the high-level relational algebra and write an efficient plan for it :) Note that there is no need for either a manual ordering or limit here as the aggregate (MIN) over a group handles that.
SELECT u.id, u.name, MIN(a.duration) as duration
FROM users u
-- match up each attempt per user
JOIN attempts a
ON a.userid = u.id
-- only show users with this name
WHERE u.name = 'John'
-- group so we get the min duration *per user*
-- (name is included so it can be in the output clause)
GROUP BY u.id, u.name
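For anyone who wants to try the JOIN-then-aggregate shape quickly, here is a small sketch using Python's sqlite3 in place of MySQL. The rows are invented and duration is assumed to be an integer number of seconds.

```python
import sqlite3

# Hypothetical sample data; column names follow the question (attempts.userid).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attempts (userid INTEGER, duration INTEGER, created TEXT);
    INSERT INTO users VALUES (1, 'John'), (2, 'Mary');
    INSERT INTO attempts VALUES
        (1, 125, '2013-01-01'),
        (1,  98, '2013-01-02'),
        (1, 110, '2013-01-03'),
        (2,  70, '2013-01-01');
""")

best_per_user = conn.execute("""
    SELECT u.id, u.name, MIN(a.duration) AS duration
    FROM users u
    JOIN attempts a ON a.userid = u.id   -- match up each attempt per user
    WHERE u.name = 'John'                -- only users with this name
    GROUP BY u.id, u.name                -- one row per user
""").fetchall()

print(best_per_user)
```

No alias from the select list is needed anywhere: the join condition carries the user id into the attempts side.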
Something about your query seems rather strange. The innermost subquery is selecting one row and then you are taking the min(duration). The min is unnecessary, because there is only one row. You can phrase the query as:
SELECT u.id as competitor_id, a.duration as best_time
from users u left outer join
attempts a
on u.id = a.userid
where u.name = 'John'
order by a.created desc
limit 1, 1;
This seems to be what your query is attempting to do. However, this might not be your intention. It is probably giving the most recent time. (If you are using MySQL, then limit 1, 1 is actually taking the second most recent record). To get the smallest duration (presumably the "best"), you would do:
SELECT u.id as competitor_id, min(a.duration) as best_time
from users u left outer join
attempts a
on u.id = a.userid
where u.name = 'John'
Adding a group by u.id would make this return one row per user named 'John', and keeps the query valid on databases that reject a non-aggregated u.id next to min().
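The difference between "most recent", "second most recent" and "best" is easy to check on a toy table. This sketch uses Python's sqlite3, which accepts the same LIMIT offset, count form as MySQL; the rows are invented and duration is assumed to be an integer.

```python
import sqlite3

# Hypothetical attempts for a single user, chosen so the three answers differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attempts (userid INTEGER, duration INTEGER, created TEXT);
    INSERT INTO attempts VALUES
        (1,  98, '2013-01-01'),
        (1, 125, '2013-01-02'),
        (1, 110, '2013-01-03');
""")

# LIMIT 1, 1 means: skip 1 row, then return 1 row (the second most recent).
second_latest = conn.execute("""
    SELECT duration FROM attempts
    WHERE userid = 1 ORDER BY created DESC LIMIT 1, 1
""").fetchall()

# LIMIT 1 returns the most recent attempt.
latest = conn.execute("""
    SELECT duration FROM attempts
    WHERE userid = 1 ORDER BY created DESC LIMIT 1
""").fetchall()

# MIN() ignores the ordering entirely and returns the smallest duration.
best = conn.execute("""
    SELECT MIN(duration) FROM attempts WHERE userid = 1
""").fetchall()

print(second_latest, latest, best)
```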
Each user HAS MANY photos and HAS MANY comments. I would like to order users by SUM(number_of_photos, number_of_comments)
Can you suggest an SQL query?
GROUP BY with JOINs works more efficiently than dependent subqueries (in all relational DBs I know):
Select Users.* From Users
Left Join Photos On (Photos.user_id = Users.id)
Left Join Comments On (Comments.user_id = Users.id)
Group By Users.id
Order By (Count(Distinct Photos.id) + Count(Distinct Comments.id))
with some assumptions on the tables (e.g. an id primary key in each of them).
Select * From Users U
Order By (Select Count(*) From Photos
Where userId = U.UserId) +
(Select Count(*) From Comments
Where userId = U.UserId)
EDIT: Although every query that uses subqueries can also be written using joins, which of the two will be faster is not a simple question, and it is irrelevant unless the system is experiencing performance problems.
1) Both constructions must be translated by the query optimizer into a query plan which includes some type of correlated join, be it a nested loop join, hash-join, merge join, or whatever. And it's entirely possible, (even likely), that they will both result in the same query plan.
NOTE: This is because the entire SQL Statement is translated into a single query plan. The subqueries do NOT get their own, individual query plans as though they were being executed in isolation.
What query plan and what type of joins are used will depend on the data structure and the data in each specific situation. The only way to tell which is faster is to try both, in controlled environments, and measure the performance... but,
2) Unless the system is experiencing an issue with performance (unacceptably poor performance), clarity is more important. And for problems like the one described above (where none of the data attributes in the "other" tables are required in the output of the SQL statement), a subquery is much clearer in describing the function and purpose of the SQL than a join with Group Bys would be.
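As a quick check of the subquery form, here is a sketch with Python's sqlite3 and made-up rows. It assumes photos.user_id and comments.user_id reference users.id, and it adds DESC so the busiest users come first (the question does not say which direction is wanted).

```python
import sqlite3

# Hypothetical sample data: ann has 2 photos + 3 comments, ben 1 + 0, cat 0 + 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben'), (3, 'cat');
    INSERT INTO photos VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO comments VALUES (1, 1), (2, 1), (3, 1), (4, 3), (5, 3);
""")

ranked = conn.execute("""
    SELECT u.name,
           (SELECT COUNT(*) FROM photos p   WHERE p.user_id = u.id)
         + (SELECT COUNT(*) FROM comments c WHERE c.user_id = u.id) AS total
    FROM users u
    ORDER BY total DESC
""").fetchall()

print(ranked)
```

Each count is computed against its own table, so the two one-to-many relations never multiply each other.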
I think that the accepted solutions would be problematic from a performance standpoint, assuming you have many users, photos, and comments. Your query runs two separate select statements for every row in the user table.
What you want to do is synthesize a query using ActiveRecord that looks like this:
SELECT u.*, COUNT(DISTINCT c.id) + COUNT(DISTINCT p.id) AS total_count
FROM users u LEFT JOIN photos p ON u.id = p.user_id
LEFT JOIN comments c ON u.id = c.user_id
GROUP BY u.id
ORDER BY total_count DESC
The join will be much, much more efficient. Using left joins ensures that even if a user has no comments or photos they will still be included in the results.
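One thing worth checking when two one-to-many tables are joined in the same query: every photo row pairs up with every comment row of the same user, so a plain COUNT can overcount, while COUNT(DISTINCT ...) does not. A minimal sketch with Python's sqlite3 and invented rows:

```python
import sqlite3

# Hypothetical sample data: ann has 2 photos + 3 comments, ben 1 + 0, cat 0 + 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben'), (3, 'cat');
    INSERT INTO photos VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO comments VALUES (1, 1), (2, 1), (3, 1), (4, 3), (5, 3);
""")

totals = conn.execute("""
    SELECT u.name,
           COUNT(p.id) + COUNT(c.id)                   AS naive_total,
           COUNT(DISTINCT p.id) + COUNT(DISTINCT c.id) AS distinct_total
    FROM users u
    LEFT JOIN photos p   ON p.user_id = u.id
    LEFT JOIN comments c ON c.user_id = u.id
    GROUP BY u.id, u.name
    ORDER BY u.id
""").fetchall()

print(totals)
```

For ann the join produces 2 x 3 = 6 rows, so the naive total is 12 while the real total is 5. Users with rows in only one of the two tables are unaffected.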
If I were to assume that you already store a count of comments and a count of photos on the user row (user.number_of_photos, user.number_of_comments; as seen above), it would be simple:
Select user_id from user order by (number_of_photos + number_of_comments) DESC
In Ruby On Rails:
User.find(:all, :order => '((SELECT COUNT(*) FROM photos WHERE user_id=users.id) + (SELECT COUNT(*) FROM classifications WHERE user_id=users.id)) DESC')
I am attempting to get the information from one table (games) and count the entries in another table (tickets) that correspond to each entry in the first. I want each entry in the first table to be returned even if there aren't any entries in the second. My query is as follows:
SELECT g.*, count(*)
FROM games g, tickets t
WHERE (t.game_number = g.game_number
OR NOT EXISTS (SELECT * FROM tickets t2 WHERE t2.game_number=g.game_number))
GROUP BY t.game_number;
What am I doing wrong?
You need to do a left-join:
SELECT g.Game_Number, g.PutColumnsHere, count(t.Game_Number)
FROM games g
LEFT JOIN tickets t ON g.Game_Number = t.Game_Number
GROUP BY g.Game_Number, g.PutColumnsHere
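Here is a small sketch of the left-join version run against toy data with Python's sqlite3. The title column is a hypothetical stand-in for PutColumnsHere, and the rows are invented.

```python
import sqlite3

# Hypothetical sample data; game 2 has no tickets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (game_number INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, game_number INTEGER);
    INSERT INTO games VALUES (1, 'Opener'), (2, 'Derby'), (3, 'Final');
    INSERT INTO tickets VALUES (1, 1), (2, 1), (3, 3);
""")

ticket_counts = conn.execute("""
    SELECT g.game_number, g.title, COUNT(t.game_number) AS ticket_count
    FROM games g
    LEFT JOIN tickets t ON g.game_number = t.game_number
    GROUP BY g.game_number, g.title
    ORDER BY g.game_number
""").fetchall()

print(ticket_counts)
```

Counting a column from the tickets side (rather than COUNT(*)) is what makes the game without tickets come out as 0.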
Alternatively, I think this is a little clearer with a correlated subquery:
SELECT g.Game_Number, G.PutColumnsHere,
(SELECT COUNT(*) FROM Tickets T WHERE t.Game_Number = g.Game_Number) Tickets_Count
FROM Games g
Just make sure you check the query plan to confirm that the optimizer interprets this well.
You need to learn more about how to use joins in SQL:
SELECT g.*, count(t.game_number)
FROM games g
LEFT OUTER JOIN tickets t
USING (game_number)
GROUP BY g.game_number;
Note that unlike some database brands, MySQL permits you to list many columns in the select-list even if you only GROUP BY their primary key. As long as the columns in your select-list are functionally dependent on the GROUP BY column, the result is unambiguous.
Other brands of database (Microsoft, Firebird, etc.) give you an error if you list any columns in the select-list without including them in GROUP BY or in an aggregate function.
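Two details are easy to verify on toy data: the comma join from the question drops games without tickets, and with a LEFT JOIN, COUNT(*) counts the NULL-padded row as 1 whereas counting a tickets column gives 0. The sketch uses Python's sqlite3; the rows and the ticket_id column are assumptions.

```python
import sqlite3

# Hypothetical sample data; game 2 has no tickets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (game_number INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, game_number INTEGER);
    INSERT INTO games VALUES (1, 'Opener'), (2, 'Derby'), (3, 'Final');
    INSERT INTO tickets VALUES (1, 1), (2, 1), (3, 3);
""")

# Comma join filtered by WHERE behaves like an inner join: game 2 is missing.
comma_join = conn.execute("""
    SELECT g.game_number, COUNT(*)
    FROM games g, tickets t
    WHERE t.game_number = g.game_number
    GROUP BY g.game_number
    ORDER BY g.game_number
""").fetchall()

# LEFT JOIN keeps game 2; compare COUNT(*) with COUNT(t.ticket_id).
left_join = conn.execute("""
    SELECT g.game_number, COUNT(*) AS star_count, COUNT(t.ticket_id) AS ticket_count
    FROM games g
    LEFT OUTER JOIN tickets t USING (game_number)
    GROUP BY g.game_number
    ORDER BY g.game_number
""").fetchall()

print(comma_join)
print(left_join)
```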
"FROM games g, tickets t" is the problem line. This performs an inner join. Any where clause can't add on to this. I think you want a LEFT OUTER JOIN.