SQL - unknown column in derived table - sql

This is the problematic part of my query:
SELECT
(SELECT id FROM users WHERE name = 'John') as competitor_id,
(SELECT MIN(duration)
FROM
(SELECT duration FROM attempts
WHERE userid=competitor_id ORDER BY created DESC LIMIT 1,1
) x
) as best_time
On execution, it throws this error:
#1054 - Unknown column 'competitor_id' in 'where clause'
It looks like the derived table 'x' can't see the parent query's alias competitor_id. Is there any way to create some kind of global alias that is usable by all derived tables?
I know I can use the competitor_id query as a subquery directly in the WHERE clause and avoid the alias altogether, but my real query is much bigger and I need competitor_id in several subqueries and derived tables, so repeating the same subquery each time would be inefficient.

You may not need derived tables in the SELECT list at all. Wouldn't the following accomplish the same thing?
SELECT
users.id as competitor_id,
MIN(duration) as best_time
FROM users
inner join attempts on users.id = attempts.userid
WHERE name = 'John'
group by users.id

The error occurs because an identifier introduced in a select output clause cannot be referenced from anywhere else in that clause. Basically, in SQL, identifiers/columns are pushed out, not down (or across).
But, even if it were possible, it's not good to write a query this way anyway. Use a JOIN between the users and attempts (on user id), then filter based on the name. The SQL query planner will then take the high-level relational algebra and write an efficient plan for it :) Note that there is no need for either a manual ordering or limit here as the aggregate (MIN) over a group handles that.
SELECT u.id, u.name, MIN(a.duration) as duration
FROM users u
-- match up each attempt per user
JOIN attempts a
ON a.userid = u.id
-- only show users with this name
WHERE u.name = 'John'
-- group so we get the min duration *per user*
-- (name is included so it can be in the output clause)
GROUP BY u.id, u.name
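To try the JOIN plus GROUP BY form on some data, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and SQLite stands in for MySQL here, so treat it as a sanity check rather than a statement about how MySQL plans the query.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attempts (userid INTEGER, duration INTEGER, created TEXT);
    INSERT INTO users VALUES (1, 'John'), (2, 'Mary');
    INSERT INTO attempts VALUES
        (1, 55, '2015-01-01'),
        (1, 42, '2015-01-02'),
        (1, 60, '2015-01-03'),
        (2, 30, '2015-01-01');
""")

# Join first, filter by name, then aggregate per user.
best_times = conn.execute("""
    SELECT u.id, u.name, MIN(a.duration) AS duration
    FROM users u
    JOIN attempts a ON a.userid = u.id
    WHERE u.name = 'John'
    GROUP BY u.id, u.name
""").fetchall()

print(best_times)
```

No alias has to be shared between subqueries in this form, because the user id is available to every part of the query through the join.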

Something about your query seems rather strange. The innermost subquery is selecting one row and then you are taking the min(duration). The min is unnecessary, because there is only one row. You can phrase the query as:
SELECT u.id as competitor_id, a.duration as best_time
from users u left outer join
attempts a
on u.id = a.userid
where u.name = 'John'
order by a.created desc
limit 1, 1;
This seems to be what your query is attempting to do. However, this might not be your intention. It is probably giving the most recent time. (If you are using MySQL, then limit 1, 1 is actually taking the second most recent record). To get the smallest duration (presumably the "best"), you would do:
SELECT u.id as competitor_id, min(a.duration) as best_time
from users u left outer join
attempts a
on u.id = a.userid
where u.name = 'John'
Adding a group by u.id would ensure that this returns exactly one row.
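The difference between the two readings can be checked on a few rows. This is a small sketch using Python's sqlite3 module with made-up data; SQLite accepts the same `LIMIT offset, count` form as MySQL, which is the only reason it is used here.

```python
import sqlite3

# Hypothetical attempts for a single user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attempts (userid INTEGER, duration INTEGER, created TEXT);
    INSERT INTO attempts VALUES
        (1, 42, '2015-01-01'),
        (1, 55, '2015-01-02'),
        (1, 60, '2015-01-03');
""")

# LIMIT 1, 1 means "skip one row, then return one row",
# so this picks the second most recent attempt.
second_most_recent = conn.execute("""
    SELECT duration FROM attempts
    WHERE userid = 1
    ORDER BY created DESC
    LIMIT 1, 1
""").fetchone()[0]

# MIN ignores the ordering and returns the smallest duration.
smallest = conn.execute(
    "SELECT MIN(duration) FROM attempts WHERE userid = 1"
).fetchone()[0]

print(second_most_recent, smallest)
```

With this data the two queries disagree, which is the point: the original subquery and a plain MIN answer different questions.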

Related

Select Data with Count in Postgresql

I want to ask how do I select data together with count data.
In this case, I want the user to appear and the number of transactions that he has.
Like this code that I made.
SELECT "transaction"."user_id",
COUNT(transaction.id) trans_count
FROM transaction
inner join "users" on "users"."id" = "transaction"."user_id"
GROUP BY user_id
The code above successfully selects user_id and trans_count, but when I am trying to show users.name
this error message appears.
Error in query: ERROR: column "users.name" must appear in the GROUP BY
clause or be used in an aggregate function LINE 3: "users"."name"
Is it true that I cannot select other columns when I count rows, or is there a better way?
Thank You.
You can include users.name in the GROUP BY:
SELECT "transaction"."user_id",
"users"."name",
COUNT(transaction.id) trans_count
FROM transaction
inner join "users" on "users"."id" = "transaction"."user_id"
GROUP BY "transaction"."user_id", "users"."name"
Otherwise, when the DBMS tries to combine (group) multiple rows into a single row, it doesn't know which name value should it pick, which is why it throws the error.
In this case, user_id and users.name have a one-to-one mapping, so you can simply include name in the GROUP BY clause.
Otherwise you'd have to tell the DBMS how to select one value from the multiple records in each group, e.g.
min(users.name) or max(users.name):
SELECT "transaction"."user_id",
min("users"."name") user_name,
COUNT(transaction.id) trans_count
FROM transaction
inner join "users" on "users"."id" = "transaction"."user_id"
GROUP BY "transaction"."user_id"
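Both variants can be compared side by side. The sketch below uses Python's sqlite3 module with made-up rows. Note the assumption: SQLite does not enforce the PostgreSQL rule that raised the error, so this only shows that the two rewrites return the same rows, not that the error goes away.

```python
import sqlite3

# Hypothetical data; "transaction" is quoted because it is an SQL keyword.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "transaction" (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO "transaction" VALUES (10, 1), (11, 1), (12, 2);
""")

# Variant 1: put the name in the GROUP BY.
by_group = conn.execute("""
    SELECT t.user_id, u.name, COUNT(t.id) AS trans_count
    FROM "transaction" t
    JOIN users u ON u.id = t.user_id
    GROUP BY t.user_id, u.name
    ORDER BY t.user_id
""").fetchall()

# Variant 2: wrap the name in an aggregate.
by_aggregate = conn.execute("""
    SELECT t.user_id, MIN(u.name) AS user_name, COUNT(t.id) AS trans_count
    FROM "transaction" t
    JOIN users u ON u.id = t.user_id
    GROUP BY t.user_id
    ORDER BY t.user_id
""").fetchall()

print(by_group)
print(by_aggregate)
```

Because each user_id maps to exactly one name, MIN has only one value to choose from per group and both queries agree.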
Whenever you use GROUP BY, you must either group by every column you select or wrap it in an aggregate function.
Group by (same as @rohitvats):
GROUP BY "transaction"."user_id", "users"."name"
---- OR ----
Aggregate functions such as MAX() or MIN():
SELECT "transaction"."user_id",
MAX("users"."name") as name,
COUNT(transaction.id) trans_count
FROM transaction
inner join "users" on "users"."id" = "transaction"."user_id"
GROUP BY "transaction"."user_id"
Your code would work if you aggregated by users.id:
SELECT u.id, u.name, COUNT(*) as trans_count
FROM users u JOIN
transaction t
ON t.user_id = u.id
GROUP BY u.id;
(I removed the double quotes because they clutter the logic and are not necessary to explain what is going on.)
Why? Presumably, users.id is unique (or equivalently the primary key). Postgres supports aggregating by a unique key in a table while also including unaggregated columns from that table in the SELECT. This is an implementation of "functionally dependent" aggregations from the SQL standard.
When you use transaction.user_id, Postgres does not recognize the functional dependence (even though you might think that the ON clause would imply it). So, your code doesn't work.
The alternative is to add the name column to the GROUP BY as well. However, your version almost works if you use the column from the correct table.

Subquery that accesses main table fields combined with LIMIT clause in Oracle SQL

I got a table Users and a table Tasks. Tasks are ordered by importance and are assigned to a user's task list. Tasks have a status: ready or not ready. Now, I want to list all users with their most important task that is also ready.
The interesting requirement is that the tasks for each user first need to be filtered and sorted, and only then should the most important one be selected. This is what I came up with:
SELECT Users.name,
(SELECT *
FROM (SELECT Tasks.description
FROM Tasks
WHERE Tasks.taskListCode = Users.taskListCode AND Tasks.isReady
ORDER BY Tasks.importance DESC)
WHERE rownum = 1
) AS nextTask
FROM Users
However, this results in the error
ORA-00904: "Users"."taskListCode": invalid identifier
I think the reason is that Oracle does not support correlated subqueries that are nested more than one level deep. However, I need two levels so that I can do the WHERE rownum = 1.
I also tried it without a correlating subquery:
SELECT Users.name, Task.description
FROM Users
LEFT JOIN Tasks nextTask ON
nextTask.taskListCode = Users.taskListCode AND
nextTask.importance = MAX(
SELECT tasks.importance
FROM tasks
WHERE tasks.isReady
GROUP BY tasks.id
)
This results in the error
ORA-00934: group function is not allowed here
How would I solve the problem?
One work-around for this uses keep:
SELECT u.name,
(SELECT MAX(t.description) KEEP (DENSE_RANK FIRST ORDER BY T.importance DESC)
FROM Tasks t
WHERE t.taskListCode = u.taskListCode AND t.isReady
) as nextTask
FROM Users u;
Try an analytic function. The r = 1 test goes in the join condition so that users without a ready task are still returned by the outer join:
with tp as (select t.*, row_number() over (partition by taskListCode order by importance desc) r
from tasks t
where isReady = 1 /*or 'Y' or what is positive value here*/)
select u.name, tp.description
from users u left outer join tp on (u.taskListCode = tp.taskListCode and tp.r = 1);
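The ROW_NUMBER approach can be tried outside Oracle as well. Below is a sketch with Python's sqlite3 module (window functions need SQLite 3.25 or newer); the rows are invented and isReady is assumed to be stored as 0/1. It keeps the rank test inside the join condition so that a user with no ready task still appears, with a NULL description.

```python
import sqlite3

# Hypothetical data: ann has two ready tasks, bob has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT, taskListCode TEXT);
    CREATE TABLE tasks (description TEXT, taskListCode TEXT,
                        importance INTEGER, isReady INTEGER);
    INSERT INTO users VALUES ('ann', 'A'), ('bob', 'B');
    INSERT INTO tasks VALUES
        ('write report', 'A', 5, 1),
        ('fix bug',      'A', 9, 0),
        ('file taxes',   'A', 3, 1),
        ('plan trip',    'B', 7, 0);
""")

# Filter to ready tasks first, rank them per task list, then join on rank 1.
next_tasks = conn.execute("""
    WITH tp AS (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY taskListCode
                                  ORDER BY importance DESC) AS r
        FROM tasks t
        WHERE isReady = 1
    )
    SELECT u.name, tp.description
    FROM users u
    LEFT JOIN tp ON u.taskListCode = tp.taskListCode AND tp.r = 1
    ORDER BY u.name
""").fetchall()

print(next_tasks)
```

The most important task overall for ann ('fix bug') is skipped because it is not ready, which matches the "filter, then sort, then pick one" requirement from the question.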
Here is a solution that uses aggregation rather than analytic functions. You may want to run this against the analytic functions solution to see which is faster; in many cases aggregate queries are (slightly) faster, but it depends on your data, on index usage, etc.
This solution is similar to what Gordon tried to do. I don't know why he wrote it using a correlated subquery instead of a straight join (and don't know if it will work - I've never seen the FIRST/LAST function used with correlated subqueries like that).
It may not work exactly right if there may be NULL in the importance column - then you will need to add nulls first after t.importance and before ). Note: the max(t.description) is needed, because there may be ties by "importance" (two tasks with the same, highest importance for a given user). In that case, one task must be chosen. If the ordering by importance is strict (no ties), then the MAX() does nothing as it selects the MAX over a set of exactly one value, but the compiler doesn't know that beforehand so it does need the MAX().
select u.name,
max(t.description) keep (dense_rank last order by t.importance) as descr
from users u left outer join tasks t
on u.tasklistcode = t.tasklistcode and t.isready = 'Y'
group by u.name

DB2 alias in WHERE clause

I have a couple DB2 tables, one for users and one for newsletters and I want to select using an alias in the WHERE clause.
SELECT a.*, b.tech_id as user FROM users a
JOIN newsletter b ON b.tech_id = a.newsletter_id
WHERE timestamp(user) < current_timestamp
This is radically simplified so I can see what's going on, but I am getting an error that makes me think that the user alias isn't getting passed correctly:
ERROR: An invalid datetime format was detected; that is, an
invalid string representation or value was specified.
The user.tech_id is a string built from the datetime when the record was created, so it looks something like 20150210175040951186000000. I've verified that I can execute a timestamp(tech_id) successfully-- so it can't be the format of the field causing the problem.
Any ideas?
More information:
There's multiple newsletters per user. I need to get the most recent newsletter (by the tech_id) and check if that was created in the past week. So the more complex version would be something like:
SELECT a.*, b.tech_id as user FROM users a
JOIN newsletter b ON b.tech_id = a.newsletter_id
WHERE timestamp(max(user)) < current_timestamp
Is there a way to JOIN only on the most recent record?
The order of execution is different to the order of writing. The FROM & WHERE clauses are executed before the SELECT clause hence the alias does not exist when you are trying to use it.
You would have to nest part of the query so that the alias is defined before the WHERE clause is evaluated. In many cases it is easier not to use the alias at all.
Try:
WHERE timestamp(b.tech_id) < current_timestamp
The generic "order of execution" of SQL clauses is
FROM
JOINs (as part of the from clause)
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
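The nesting idea follows directly from that order: if the alias is defined in an inner SELECT, the outer WHERE runs after it exists. Here is a minimal sketch using Python's sqlite3 module. The rows are invented, a plain string comparison stands in for timestamp(), and the cutoff value is arbitrary. SQLite is only a stand-in for DB2, and it happens to be lenient about aliases in WHERE, so this shows the portable nested form rather than reproducing the DB2 error.

```python
import sqlite3

# Hypothetical, heavily simplified versions of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, newsletter_id TEXT);
    CREATE TABLE newsletter (tech_id TEXT);
    INSERT INTO users VALUES (1, '20150210'), (2, '20991231');
    INSERT INTO newsletter VALUES ('20150210'), ('20991231');
""")

# The inner query defines the alias techuser; the outer WHERE can then use it.
rows = conn.execute("""
    SELECT q.id, q.techuser
    FROM (
        SELECT a.id, b.tech_id AS techuser
        FROM users a
        JOIN newsletter b ON b.tech_id = a.newsletter_id
    ) q
    WHERE q.techuser < '20200101'
    ORDER BY q.id
""").fetchall()

print(rows)
```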
Is there a way to JOIN only on the most recent record?
A useful technique for this is using ROW_NUMBER() assuming your DB2 supports it, and would look something like this:
SELECT
a.*
, b.tech_id AS techuser
FROM users a
JOIN (
SELECT
n.*
, ROW_NUMBER() OVER (ORDER BY timestamp(n.tech_id) DESC) AS rn
FROM newsletter n
) b
ON b.tech_id = a.newsletter_id
AND b.rn = 1
This would give you just one row from newsletter, and the DESCending order makes it the most recent one, assuming timestamp(tech_id) works as described.
To get the most recent newsletter, consider ordering the joined rows in descending order and then selecting the top record (in DB2 you would use FETCH FIRST 1 ROW ONLY):
SELECT a.*, b.tech_id as user
FROM users a
INNER JOIN newsletter b ON b.tech_id = a.newsletter_id
ORDER BY b.tech_id DESC
FETCH FIRST 1 ROW ONLY;
Alternatively, you can use a subquery in the WHERE clause that finds the maximum tech_id:
SELECT a.*, b.tech_id as user
FROM users a
INNER JOIN newsletter b ON b.tech_id = a.newsletter_id
WHERE b.tech_id IN (
SELECT Max(n.tech_id) as maxUser
FROM users u
INNER JOIN newsletter n ON n.tech_id = u.newsletter_id)
I left out the condition of timestamp(user) < current_timestamp as data stored in database will always be less than current time (i.e., now).

Select Users with more Items

Each user HAS MANY photos and HAS MANY comments. I would like to order users by SUM(number_of_photos, number_of_comments)
Can you suggest an SQL query for this?
GROUP BY with JOINs works more efficiently than dependent subqueries (in all relational DBs I know):
Select Users.* From Users
Left Join Photos On (Photos.user_id = Users.id)
Left Join Comments On (Comments.user_id = Users.id)
Group By Users.id
Order By (Count(Distinct Photos.id) + Count(Distinct Comments.id))
with some assumptions on the tables (e.g. an id primary key in each of them).
Select * From Users U
Order By (Select Count(*) From Photos
Where userId = U.UserId) +
(Select Count(*) From Comments
Where userId = U.UserId)
EDIT: Although every query using subqueries can also be written using joins, which of the two will be faster is not a simple question, and it is irrelevant unless the system is experiencing performance problems.
1) Both constructions must be translated by the query optimizer into a query plan which includes some type of correlated join, be it a nested loop join, hash-join, merge join, or whatever. And it's entirely possible, (even likely), that they will both result in the same query plan.
NOTE: This is because the entire SQL Statement is translated into a single query plan. The subqueries do NOT get their own, individual query plans as though they were being executed in isolation.
What query plan and what type of joins are used will depend on the data structure and the data in each specific situation. The only way to tell which is faster is to try both, in controlled environments, and measure the performance... but,
2) Unless the system is experiencing an issue with performance (unacceptably poor performance), clarity is more important. And for problems like the one described above (where none of the data attributes in the "other" tables are required in the output of the SQL statement), a subquery is much clearer in describing the function and purpose of the SQL than a join with GROUP BYs would be.
I think that the accepted solutions would be problematic from a performance standpoint, assuming you have many users, photos, and comments. Your query runs two separate select statements for every row in the user table.
What you want to do is synthesize a query using ActiveRecord that looks like this:
SELECT u.*, COUNT(DISTINCT c.id) + COUNT(DISTINCT p.id) AS total_count
FROM users u LEFT JOIN photos p ON u.id = p.user_id
LEFT JOIN comments c ON u.id = c.user_id
GROUP BY u.id
ORDER BY total_count DESC
The join will be much, much more efficient. Using left joins ensures that even if a user has no comments or photos they will still be included in the results. COUNT(DISTINCT ...) is needed because joining both child tables pairs every photo with every comment of the same user.
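A caveat worth checking with any count over two LEFT JOINs from the same parent: each photo row is paired with each comment row of the same user, so a plain COUNT(col) counts the multiplied rows, while COUNT(DISTINCT col) does not. The sketch below uses Python's sqlite3 module with invented rows to show the difference; it says nothing about which form is faster.

```python
import sqlite3

# Hypothetical data: ann has 2 photos and 3 comments, bob has 1 photo, cat has nothing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO photos VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO comments VALUES (1, 1), (2, 1), (3, 1);
""")

query = """
    SELECT u.name, {photos} + {comments} AS total_count
    FROM users u
    LEFT JOIN photos p ON u.id = p.user_id
    LEFT JOIN comments c ON u.id = c.user_id
    GROUP BY u.id, u.name
    ORDER BY total_count DESC
"""

# Plain COUNT sees 2 x 3 = 6 joined rows for ann, once per column.
plain = conn.execute(
    query.format(photos="COUNT(p.id)", comments="COUNT(c.id)")
).fetchall()

# COUNT(DISTINCT ...) counts each photo and each comment once.
distinct = conn.execute(
    query.format(photos="COUNT(DISTINCT p.id)",
                 comments="COUNT(DISTINCT c.id)")
).fetchall()

print(plain)
print(distinct)
```

The correlated-subquery form counts each child table separately, so it does not run into this multiplication in the first place.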
If I were to assume that you had a count of comments and a count of photos (user.number_of_photos, user.number_of_comments; as seen above), it would be simple (not stupid):
Select user_id from user order by number_of_photos DESC, number_of_comments DESC
In Ruby On Rails:
User.find(:all, :order => '((SELECT COUNT(*) FROM photos WHERE user_id=users.id) + (SELECT COUNT(*) FROM classifications WHERE user_id=users.id)) DESC')

SQL GROUP BY/COUNT even if no results

I am attempting to get the information from one table (games) and count the entries in another table (tickets) that correspond to each entry in the first. I want each entry in the first table to be returned even if there aren't any entries in the second. My query is as follows:
SELECT g.*, count(*)
FROM games g, tickets t
WHERE (t.game_number = g.game_number
OR NOT EXISTS (SELECT * FROM tickets t2 WHERE t2.game_number=g.game_number))
GROUP BY t.game_number;
What am I doing wrong?
You need to do a left-join:
SELECT g.Game_Number, g.PutColumnsHere, count(t.Game_Number)
FROM games g
LEFT JOIN tickets t ON g.Game_Number = t.Game_Number
GROUP BY g.Game_Number, g.PutColumnsHere
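Counting a column from the tickets side (rather than `*`) matters here: for a game with no tickets the LEFT JOIN produces one row whose ticket columns are NULL, and COUNT(column) skips NULLs while COUNT(*) does not. A minimal sketch with Python's sqlite3 module and invented rows:

```python
import sqlite3

# Hypothetical data: game 1 has two tickets, game 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (game_number INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, game_number INTEGER);
    INSERT INTO games VALUES (1, 'A'), (2, 'B');
    INSERT INTO tickets VALUES (100, 1), (101, 1);
""")

# COUNT(t.game_number) ignores the NULL row produced for game 2.
count_column = conn.execute("""
    SELECT g.game_number, COUNT(t.game_number)
    FROM games g
    LEFT JOIN tickets t ON g.game_number = t.game_number
    GROUP BY g.game_number
    ORDER BY g.game_number
""").fetchall()

# COUNT(*) counts that NULL-extended row, so game 2 shows 1 instead of 0.
count_star = conn.execute("""
    SELECT g.game_number, COUNT(*)
    FROM games g
    LEFT JOIN tickets t ON g.game_number = t.game_number
    GROUP BY g.game_number
    ORDER BY g.game_number
""").fetchall()

print(count_column)
print(count_star)
```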
Alternatively, I think this is a little clearer with a correlated subquery:
SELECT g.Game_Number, G.PutColumnsHere,
(SELECT COUNT(*) FROM Tickets T WHERE t.Game_Number = g.Game_Number) Tickets_Count
FROM Games g
Just make sure you check the query plan to confirm that the optimizer interprets this well.
You need to learn more about how to use joins in SQL:
SELECT g.*, count(t.game_number)
FROM games g
LEFT OUTER JOIN tickets t
USING (game_number)
GROUP BY g.game_number;
Note that unlike some database brands, MySQL permits you to list many columns in the select-list even if you only GROUP BY their primary key. As long as the columns in your select-list are functionally dependent on the GROUP BY column, the result is unambiguous.
Other brands of database (Microsoft, Firebird, etc.) give you an error if you list any columns in the select-list without including them in GROUP BY or in an aggregate function.
"FROM games g, tickets t" is the problem line. Combined with the join condition in the WHERE clause, it behaves as an inner join, and no additional WHERE condition can turn that into an outer join. I think you want a LEFT OUTER JOIN.