Selecting ambiguous column from subquery with postgres join inside - sql

I have the following query:
select x.id0
from (
select *
from sessions
inner join clicked_products on sessions.id0 = clicked_products.session_id0
) x;
Since id0 is in both sessions and clicked_products, I get the expected error:
column reference "id0" is ambiguous
However, to fix this problem in the past I simply needed to specify a table. In this situation, I tried:
select sessions.id0
from (
select *
from sessions
inner join clicked_products on sessions.id0 = clicked_products.session_id0
) x;
However, this results in the following error:
missing FROM-clause entry for table "sessions"
How do I return just the id0 column from the above query?
Note: I realize I can trivially solve the problem by getting rid of the subquery all together:
select sessions.id0
from sessions
inner join clicked_products on sessions.id0 = clicked_products.session_id0;
However, I need to do further aggregations and so do need to keep the subquery syntax.

The only way you can do that is by using aliases for the columns returned from the subquery so that the names are no longer ambiguous.
Qualifying the column with the table name does not work, because sessions is not visible at that point (only x is).
True, this way you cannot use SELECT *, but you shouldn't do that anyway. For a reason why, your query is a wonderful example:
Imagine that you have a query like yours that works, and then somebody adds a new column with the same name as a column in the other table. Then your query suddenly and mysteriously breaks.
Avoid SELECT *. It is ok for ad-hoc queries, but not in code.

select x.id from
(select sessions.id0 as id, clicked_products.* from sessions
inner join
clicked_products on
sessions.id0 = clicked_products.session_id0 ) x;
However, you have to specify other columns from the table sessions since you cannot use SELECT *

I assume:
select x.id from (select sessions.id0 id
from sessions
inner join clicked_products
on sessions.id0 = clicked_products.session_id0 ) x;
should work.
Other option is to use Common Table Expression which are more readable and easier to test.
But still need alias or selecting unique column names.
In general selecting everything with * is not a good idea -- reading all columns is waste of IO.

Related

Why do I get a duplicate column name error only when I SELECT FROM (SELECT)

I imagine this is a really basic oversight on my part but I have an SQL query which works fine. But I when I SELECT from that result (SELECT FROM (SELECT))
I get a 'duplicate column' error. There are duplicate column names, for sure, in two tables where I compare them but they do not cause a problem in the initial result. For example:
SELECT _dia_tagsrel.tag_id,_dia_tagsrel.article_id, _dia_tags.tag_id, _dia_tags.tag
FROM _dia_tagsrel
JOIN _dia_tags
ON _dia_tagsrel.tag_id = _dia_tags.tag_id
Works fine but when I try to select from it, I get the error:
SELECT DISTINCT tag FROM
(SELECT _dia_tagsrel.tag_id,_dia_tagsrel.article_id, _dia_tags.tag_id, _dia_tags.tag
FROM _dia_tagsrel
JOIN _dia_tags
ON _dia_tagsrel.tag_id = _dia_tags.tag_id) a
Regardless of the DISTINCT. Ok, I can change the column names to be unique but the question really is - why do i get the error when I SELECT FROM (SELECT) and not in the initial query?
Thanks
Solution:
SELECT DISTINCT tag_id, tag FROM (SELECT _dia_tagsrel.tag_id, _dia_tagsrel.article_id, _dia_tags.tag
FROM _dia_tagsrel
JOIN _dia_tags
ON _dia_tagsrel.tag_id = _dia_tags.tag_id) a
I only needed to SELECT one of the duplicate columns, even though I was comparing the both of them. Provided by answer below.
In you are second query i.e., the sub query, you are selecting tag_id twice. Though it is from two different tables, it works out whey you are selecting the data. But when you select the columns with same name twice, it provides you duplicate error. Below is the way you have selected the column which is incorrect
_dia_tagsrel.tag_id,_dia_tagsrel.article_id, _dia_tags.tag_id, _dia_tags.tag
While using sub queries, merge, in or exists clause, avoid using the same column names multiple times.
Simple join works out no need of having subquery,
SELECT _dia_tagsrel.tag_id,_dia_tagsrel.article_id, _dia_tags.tag_id, _dia_tags.tag
FROM _dia_tagsrel
JOIN _dia_tags
ON _dia_tagsrel.tag_id = _dia_tags.tag_id
Your first query returns four columns:
tag_id
article_id
tag_id
tag
Duplicate column names are allowed in a result set, but are not allowed in a table -- or derived table, view, CTE, or most subqueries (an exception are EXISTS subqueries).
I hope you can see the duplicate. There is no need to select tag_id twice, because the JOIN requires that the values are the same. So just select three columns:
SELECT tr.tag_id, tr.article_id, t.tag
FROM _dia_tagsrel tr JOIN
_dia_tags t
ON tr.tag_id = t.tag_id
Your subquery has two tag_ids, so how database engine decide which one you want to use.
So, either use one (join requires tag_ids to be same) or re-name it :
If _dia_tag has unique tags then you can use EXISTS instead of INNER JOIN:
SELECT t.tag
FROM _dia_tags t
WHERE EXISTS (SELECT 1 FROM _dia_tagsrel tr WHERE tr.tag_id = t.tag_id);

Using ON clause to JOIN tables with same column name

I wanted to ask about the condition of an ON clause while joining tables:
SELECT c_User.ID
FROM c_User
WHERE EXISTS (
SELECT *
FROM c_Group
JOIN c_Member ON (c_Group.Group_Name LIKE 'mcp%')
WHERE
c_Group.Name = c_Member.Parent_Name
AND c_Member.Child_Name = c_User.Lower_User_Name
)
I know that tables c_Member and c_Group have one column with the same name, Directory_ID. What I expected was c_Member and c_Group to join on that column using something like:
c_Group JOIN c_Member ON (c_Group.Directory_ID = c_Member.Directory_ID)
WHERE c_Group.Group_Name like 'mcp%'
How is this condition able to match the rows?
c_Member ON (c_Group.Group_Name LIKE 'mcp%')
Is this is a shorter way of referring to two tables joining on a column with the same name, while applying the LIKE condition?
If so, then can such a style work for a table with multiple column names that are the same?
This is your correlated subquery:
SELECT *
FROM c_Group
JOIN c_Member ON (c_Group.Group_Name LIKE 'mcp%')
WHERE
c_Group.Name = c_Member.Parent_Name
AND c_Member.Child_Name = c_User.Lower_User_Name
This subquery works, but the way it is spelled makes it quite unclear:
The join condition (c_Group.Group_Name LIKE 'mcp%') is not actually not related to the table being joined (c_Member) ; what it actually does is apply filter on table c_Group that makes it filter on (there is no magic such as shorter way of referring to two tables joining on a column with the same name, while applying the LIKE condition). It would make more sense to move it to the WHERE clause (this would still be functionaly equivalent).
On the other hand, the WHERE clause contains conditions that relate to the tables being joined (for example: c_Group.Name = c_Member.Parent_Name). A more sensible option would be to put them in the ON clause of the JOIN.
Other remarks:
when using NOT EXISTS, you usually would prefer SELECT 1 instead of SELECT *, (most RDBMS will optimize this under the hood for you, but this makes the intent clearer).
table aliases can be used to make the query more readable
I would suggest the following syntax for the query (which is basically syntaxically equivalent to the original, but a lot clearer):
SELECT u.ID
FROM c_User u
WHERE EXISTS (
SELECT 1
FROM c_Group g
JOIN c_Member m ON g.Name = m.Parent_Name AND m.Child_Name = u.Lower_User_Name
WHERE g.Group_Name LIKE 'mcp%'
)

ORA-00904 Invalid Identifier for a query involving inner join

I have a very simple query that I am trying to execute:
select *
from submissions
inner join (
select *
from hackers
inner join challenges
on hackers.hacker_id = challenges.hacker_id
) dtable
on submissions.challenge_id = dtable.challenge_id
and submissions.hacker_id = dtable.hacker_id;
Oracle rejects it with:
ORA-00904: "DTABLE"."HACKER_ID": invalid identifier.
I have kept the alias dtable visible by keeping it outside the brackets. Why is Oracle rejecting my query?
SELECT * in your sub-query is a problem.
You have the same column name in both tables that are being joined. This means that you're trying to create an inline-view called dtable where at least two columns have the same name (in this case both tables have a hacker_id column and your use of * is essentially saying "use them both".). You can't do that.
You're going to need SELECT hackers.a, hackers.b, challenges.x, challenges.y, etc, etc in your sub-query. By being explicit in this way you can ensure that no two columns have the same name.
An alternative could be SELECT hackers.*, challenges.a AS c_a, challenges.b AS c_b, etc.
Either way you are being explicit about which fields to pick up, what their positions and names are, etc. The end result being that you can then avoid columns having the same name as other columns.
You do not need subqueries for this. Your query is not actually "simple". The simple form looks more like this:
select . . .
from submissions s join
hackers h
on s.hacker_id = h.hacker_id join
challenges c
on s.challenge_id = c.challenge_id;
Note that I removed the condition between challenge and hackers on hacker_id. That extra join condition doesn't really make sense to me (although it might make sense if you provided sample data).
As others have said: The sub-select selects two different columns hacker_id from two different tables. This confuses Oracle.
But there is no need for a sub-select
select * from
submissions
inner join challenges
on submissions.challenge_id = challenges.challenge_id
inner join hackers
on submissions.hacker_id = hackers.hacker_id;

SQL subquery multiple times error

I am making a subquery but I am getting a strange error
The column 'RealEstateID' was specified multiple times for 'NotSold'.
here is my code
SELECT *
FROM
(SELECT *
FROM RealEstatesInfo AS REI
LEFT JOIN Purchases AS P
ON P.RealEstateID=REI.RealEstateID
WHERE DateBought IS NULL) AS NotSold
INNER JOIN OwnerEstate AS OE
ON OE.RealEstateID=NotSold.RealEstateID
It's on SQL server by the way.
That's because there will be 2 realestiteids in your subquery. You need to change it to explicitly list the columns from both table and only include 1 realestateid. It doesn't matter which as you use it for your join.
If you're very Lazy you can select rei.* and only name the p cols apart from realestateid.
Btw select * is probably never a good idea in sub queries or derived tables or ctes.

SQL Server extra text after table name in SELECT

I found a SQL statement to the effect of:
SELECT * FROM Users x
My question is: what is x? I have never seen this before.
Thanks.
x is an Alias for the table Users.
Using Table Aliases
The readability of a SELECT statement can be improved by giving a
table an alias, also known as a correlation name or range variable. A
table alias can be assigned either with or without the AS keyword:
SELECT * FROM Users x
SELECT * FROM Users AS x
It's an alias. The AS keyword is optional and has been left out, but it is the same as:
SELECT * FROM Users AS x
This means you can (in some implementations of SQL, SQL Server being one of them, must) use x in the rest of the query to refer back to the table Users specified here. For example:
SELECT x.MyColumn
FROM Users x
WHERE x.AnotherColumn = 42
There are three general use cases for aliases:
Readability. For long table names or when the name will be used many times, it can improve readability. For example, imagine the following without the alias:
SELECT x.SomeColumn, x.SomeOtherColumn, x.AThirdColumn
FROM [my crAzy Table Name with spaces in it] x
WHERE x.AnotherColumn = 42
Disambiguation. Often used for self-joins, note the use of the same table twice. You must use an alias to differentiate the two instances of the Users table:
SELECT x.SomeColumn, COUNT(y.SomeColumn)
FROM Users x
INNER JOIN Users y ON x.SomeOtherColumn < y.SomeOtherColumn
GROUP BY x.SomeColumn
Sub-queries in a FROM or JOIN clause (also called derived tables) must have a name. This is done by specifying an alias:
SELECT x.SomeColumn
FROM
(
SELECT SomeColumn
FROM Users
) x
It's just an alias for Users, which can be used in the query.
Imagine :
you want to retrieve datas from 2 tables, with an Id column in both
if you wanna retrieve these Ids, you have to prefix the column name to avoid confusion.
With alias:
select t1.Id, t2.Id
from mytableWithAReallyComplicatedName t1
inner join mySecondtableWithAReallyComplicatedName t2 on t1.Id = t2.Id
Without alias
select mytableWithAReallyComplicatedName.Id, mySecondtableWithAReallyComplicatedName.Id
from mytableWithAReallyComplicatedName
inner join mySecondtableWithAReallyComplicatedName on mytableWithAReallyComplicatedName.Id = mySecondtableWithAReallyComplicatedName .Id
It the table names are long, the query might be fast less practical to read, with the second version.