SQL Subquery w/LEFT JOIN causing Invalid Object Error - sql

I have a query with the following structure:
EDIT Original structure of the query wasn't quite representative.
SELECT A
,B
,C
,D
FROM ( SELECT id,A
FROM myTable
WHERE conditions
GROUP BY id,A) MainQuery
LEFT JOIN (SELECT id, B, C
FROM myView
WHERE id IN
(
SELECT DISTINCT id
FROM MainQuery
)
) sub1
ON sub1.B = MainQuery.A
LEFT JOIN (SELECT MainQuery.id, D
FROM myOtherView
WHERE sub1.id IN
(
SELECT DISTINCT id
FROM MainQuery
)
) sub2
ON sub2.D = sub1.C
When I run the query, I get the error message Invalid object name 'MainQuery'. When I comment out the LEFT JOINs and the fields they feed in the SELECT statement, the query runs just fine. I've also tried AS MainQuery, but I get the same result.
I suspect it has something to do with scope. Where I'm trying to SELECT DISTINCT id FROM MainQuery, is MainQuery out of scope for the WHERE subquery within sub1?
For context, I've been tasked with rewriting a query that used temp tables into a query that can be used in a report deployed on SSRS 2000. My MainQuery, sub1, and sub2 were temp tables in the original query. Those temp tables used subqueries within them, which I've preserved in my translation. But the original query had the advantage of creating each temp table separately, and then joining the results. Temp tables and subqueries are new to me, so I'm not sure how to adapt between the two, or if that's even the right approach.

The SQL for your MainQuery is invalid. Run it by itself and see:
SELECT A, id
FROM myTable
WHERE conditions
GROUP BY A
You can't select A and id, but only group by A. Either you need to also group by id, or wrap id in an aggregate function like min, or max.
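With that addressed, your other issue is the filter subqueries inside the joined derived tables. A derived table alias such as MainQuery can be used to qualify columns, but it cannot be selected from in another subquery, which is why you get the invalid object error. Since you LEFT JOIN from MainQuery anyway, you can match on id in the join condition instead of filtering with IN. See below, where I join sub1 to MainQuery on both A and id.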
SELECT A
,B
,C
,D
FROM ( SELECT A, id
FROM myTable
WHERE conditions
GROUP BY A,id) MainQuery
LEFT JOIN nutherTable sub1
on MainQuery.A = sub1.B
and MainQuery.id = sub1.id
LEFT JOIN (SELECT D ...) sub2
ON sub1.C = sub2.D
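The scope problem from the question can be reproduced in a small sandbox. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; the exact error text differs between engines), with made-up rows and a plain table standing in for myView. It suggests that a derived table alias is not visible as a table inside a sibling subquery, and that joining on id directly gives the filtered result without the IN subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (id INTEGER, A TEXT);
    CREATE TABLE myView  (id INTEGER, B TEXT, C TEXT);  -- plain table standing in for the view
    INSERT INTO myTable VALUES (1, 'x'), (1, 'x'), (2, 'y');
    INSERT INTO myView  VALUES (1, 'x', 'c1'), (3, 'y', 'c3');
""")


def try_query(sql):
    """Run a query and return its rows, or the error message as a string."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


# Same shape as the question: the subquery inside sub1 selects FROM MainQuery.
broken_result = try_query("""
    SELECT MainQuery.A, sub1.C
    FROM (SELECT id, A FROM myTable GROUP BY id, A) MainQuery
    LEFT JOIN (SELECT id, B, C
               FROM myView
               WHERE id IN (SELECT DISTINCT id FROM MainQuery)) sub1
           ON sub1.B = MainQuery.A
""")

# Same shape as the answer: match on id in the join condition instead.
fixed_result = try_query("""
    SELECT MainQuery.A, sub1.C
    FROM (SELECT id, A FROM myTable GROUP BY id, A) MainQuery
    LEFT JOIN myView sub1
           ON MainQuery.A = sub1.B
          AND MainQuery.id = sub1.id
    ORDER BY MainQuery.A
""")

print(broken_result)
print(fixed_result)
```

The row in myView with id 3 never matches, which is the effect the IN filter was meant to have.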

Related

How to create a select clause using a subquery

I have the following sql statement:
WITH
subquery AS (
select distinct id from a_table where some_field in (1,2)
)
select id from another_table where id in subquery;
Edit
JOIN is not an option (this is just a reduced example of a bigger query)
But that obviously does not work. The id field exists in both tables (with a different name, but values are the same: numeric ids). Basically what I want to do is filter by the result of the subquery, like a kind of intersection.
Any idea how to write that query in a correct way?
You need a subquery for the second operand of IN that SELECTs from the CTE.
... IN (SELECT id FROM subquery) ...
But I would recommend rewriting it as a JOIN.
Are you able to join on id and then filter in the WHERE clause?
select a.id
from another_table a
inner join a_table b on a.id = b.id
where b.some_field in (1,2)
Since you only want the id from another_table you can use exists
with s as (
select id
from a_table
where some_field in (1,2)
)
select id
from another_table t
where exists ( select * from s where s.id=t.id )
But the CTE is really redundant since all you are doing is
select id
from another_table t
where exists (
select * from a_table a where a.id=t.id and a.some_field in (1,2)
)
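As a quick check of the two approaches above, here is a minimal sketch that runs the IN (SELECT ... FROM cte) form and the EXISTS form against the same made-up data. It uses Python's sqlite3 module purely as a convenient sandbox; the sample rows are assumptions, and the table and column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a_table (id INTEGER, some_field INTEGER);
    CREATE TABLE another_table (id INTEGER);
    INSERT INTO a_table VALUES (1, 1), (2, 2), (3, 5), (1, 2);
    INSERT INTO another_table VALUES (1), (2), (3), (4);
""")

# The CTE must be selected from; "id IN subquery" on its own is not valid.
in_cte_rows = conn.execute("""
    WITH subquery AS (
        SELECT DISTINCT id FROM a_table WHERE some_field IN (1, 2)
    )
    SELECT id
    FROM another_table
    WHERE id IN (SELECT id FROM subquery)
    ORDER BY id
""").fetchall()

# The EXISTS form without any CTE.
exists_rows = conn.execute("""
    SELECT id
    FROM another_table t
    WHERE EXISTS (SELECT * FROM a_table a
                  WHERE a.id = t.id AND a.some_field IN (1, 2))
    ORDER BY id
""").fetchall()

print(in_cte_rows)
print(exists_rows)
```

Neither form duplicates rows of another_table when a_table holds the same id more than once, which a plain inner join would do.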

Can I select several tables in the same WITH query?

I have a long query with a with structure. At the end of it, I'd like to output two tables. Is this possible?
(The tables and queries are in snowflake SQL by the way.)
The code looks like this:
with table_a as (
select id,
product_a
from x.x ),
table_b as (
select id,
product_b
from x.y ),
table_c as (
..... many more alias tables and subqueries here .....
)
select * from table_g where z = 3 ;
But for the very last row, I'd like to query table_g twice, once with z = 3 and once with another condition, so I get two tables as the result. Is there a way of doing that (ending with two queries rather than just one) or do I have to re-run the whole code for each table I want as output?
One query = one result set. That's just the way that RDBMSs work.
A CTE (WITH statement) is just syntactic sugar for a subquery.
For instance, a query similar to yours:
with table_a as (
select id,
product_a
from x.x ),
table_b as (
select id,
product_b
from x.y ),
table_c as (
select id,
product_c
from x.z )
select *
from table_a
inner join table_b on table_a.id = table_b.id
inner join table_c on table_b.id = table_c.id;
Is 100% identical to:
select *
from
(select id, product_a from x.x) table_a
inner join (select id, product_b from x.y) table_b
on table_a.id = table_b.id
inner join (select id, product_c from x.z) table_c
on table_b.id = table_c.id
The CTE version doesn't give you any extra features that aren't available in the non-cte version (with the exception of a recursive cte) and the execution path will be 100% the same (EDIT: Please see Simon's answer and comment below where he notes that Snowflake may materialize the derived table defined by the CTE so that it only has to perform that step once should the CTE be referenced multiple times in the main query). As such there is still no way to get a second result set from the single query.
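To illustrate the claim that the CTE and derived table forms return the same rows, here is a small sketch using Python's sqlite3 module as a sandbox. The tables xx, xy and xz are hypothetical stand-ins for x.x, x.y and x.z, and the rows are made up. It only compares results; it says nothing about how Snowflake plans either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xx (id INTEGER, product_a TEXT);
    CREATE TABLE xy (id INTEGER, product_b TEXT);
    CREATE TABLE xz (id INTEGER, product_c TEXT);
    INSERT INTO xx VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO xy VALUES (1, 'b1'), (2, 'b2');
    INSERT INTO xz VALUES (2, 'c2'), (3, 'c3');
""")

# Version written with CTEs.
cte_rows = conn.execute("""
    WITH table_a AS (SELECT id, product_a FROM xx),
         table_b AS (SELECT id, product_b FROM xy),
         table_c AS (SELECT id, product_c FROM xz)
    SELECT table_a.id, product_a, product_b, product_c
    FROM table_a
    INNER JOIN table_b ON table_a.id = table_b.id
    INNER JOIN table_c ON table_b.id = table_c.id
""").fetchall()

# Same query with each CTE inlined as a derived table.
derived_rows = conn.execute("""
    SELECT table_a.id, product_a, product_b, product_c
    FROM (SELECT id, product_a FROM xx) table_a
    INNER JOIN (SELECT id, product_b FROM xy) table_b
            ON table_a.id = table_b.id
    INNER JOIN (SELECT id, product_c FROM xz) table_c
            ON table_b.id = table_c.id
""").fetchall()

print(cte_rows)
print(derived_rows)
```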
While the two forms are syntactically equivalent, they don't necessarily get the same execution plan.
The first case is when one of the stages in the CTE is expensive and is reused by other CTEs or joined to many times. Under Snowflake, when the query is written with a CTE, I have witnessed the "expensive" part running only a single time, which can be good. For example:
WITH expensive_select AS (
SELECT a.a, b.b, c.c
FROM table_a AS a
JOIN table_b AS b
JOIN table_c AS c
WHERE complex_filters
), do_some_thing_with_results AS (
SELECT a, stuff
FROM expensive_select
WHERE filters_1
), do_some_agregation AS (
SELECT a, SUM(b) as sum_b
FROM expensive_select
WHERE filters_2
GROUP BY a
)
SELECT a.a
,a.b
,b.stuff
,c.sum_b
FROM expensive_select AS a
LEFT JOIN do_some_thing_with_results AS b ON a.a = b.a
LEFT JOIN do_some_agregation AS c ON a.a = c.a;
This was originally unrolled, and the expensive part was some views where the date range filter applied at the top level was not getting pushed down (due to window functions), which resulted in full table scans, multiple times. By pushing them into the CTE, the cost was paid once. (In our case, putting date range filters in the CTE made Snowflake notice the filters and push them down into the view. Things can change, though: a few weeks later the original code ran as well as the modified version, so they "fixed" something.)
In other cases, like this one, the different paths that used the CTE used smaller subsets of the results, so using the CTE reduced the remote IO and improved performance, although there were then more stalls in the execution plan.
I also use CTEs like this to make the code easier to read, by giving the CTE a meaningful name and then aliasing it to something short for use. Really love that.

PostgreSQL: Error in left join

I am trying to join my master table to some sub-tables in PostgreSQL in a single select query. I am getting a syntax error and I have the feeling I am making a terrible mistake or doing something which is not allowed. The code:
Select
id,
length,
other_stuff
from my_table tbl1
Left join
(
Select
id,
height
from my_table2 tbl2) tbl2 using (id)
left join
-- I get syntax error here
(
With a as (select id from some_table),
b as (Select value from other_table)
Select id, value from a, b) tbl3 using (id)
order by tbl1.id
Can we use a WITH clause inside the subqueries of a left join, and is there a better way to do this?
UPDATE1
Well, I would like to add some more details. I have three select queries like this (having unique ID) and I want to join them based on ID.
Query1:
With a as (Select id, my_other records... from postgres_table1),
b as (select id, my_records... from postgres_table2),
c as (select id, my_record.. from postgres_table3, b)
Select
id,
my_records
from a left join c on some_condtion_with_a
order by 1
Second query:
Select
id, my_records
from
(
multiple_sub_queries_by_getting_records_from_c
)
Third Query:
With d as (select id, records.. from b),
e as (select id, records.. from d),
f as (select id, records.. from e)
select
id,
records..
from f
I tried to join them using left joins. The first two queries were joined successfully, but when joining the third query I got the syntax error. Maybe I am complicating things, which is why I asked whether there is a better way to do it.
You are overcomplicating things. There is no need to use a derived table to outer join my_table2. And there is no need for a CTE plus a derived table to join the tbl3 alias:
Select id,
length,
other_stuff
from my_table tbl1
Left join my_table2 tbl2 using (id)
left join (
select st.id, ot.value
from some_table st
cross join other_table ot
) tbl3 using (id)
order by tbl1.id;
This assumes that the cross join you create with Select id, value from a, b is intended.
Not tested, but I think you need this. try:
with a as (select id from some_table),
b as (Select value from other_table)
Select
id,
length,
other_stuff
from my_table tbl1
Left join
(
Select
id,
height
from my_table2 tbl2
)
tbl2 using (id)
left join
(
Select id, value from a, b
)
tbl3 using (id)
order by tbl1.id
I've only ever seen/used WITH in the following format:
WITH
temptablename(columns) as (query),
temptablename2(columns) as (query),
...
temptablenameX(columns) as (query)
SELECT ...
i.e. they come first
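A minimal sketch of that "WITH comes first" layout, applied to the shape of the question's query. It runs on Python's sqlite3 module rather than PostgreSQL (an assumption made only so the example is self-contained), and the sample rows are invented. The CTEs a and b are declared once at the top and then referenced from inside the derived table that is left joined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER, length INTEGER);
    CREATE TABLE some_table (id INTEGER);
    CREATE TABLE other_table (value TEXT);
    INSERT INTO my_table VALUES (1, 10), (2, 20);
    INSERT INTO some_table VALUES (1);
    INSERT INTO other_table VALUES ('v');
""")

rows = conn.execute("""
    WITH a AS (SELECT id FROM some_table),
         b AS (SELECT value FROM other_table)
    SELECT tbl1.id, tbl1.length, tbl3.value
    FROM my_table tbl1
    LEFT JOIN (
        -- the CTEs declared at the top are visible here
        SELECT a.id, b.value FROM a CROSS JOIN b
    ) tbl3 ON tbl3.id = tbl1.id
    ORDER BY tbl1.id
""").fetchall()

print(rows)
```

The row with id 2 has no match in a, so the left join fills value with NULL for it.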
You'll probably find it easier to write queries if you use indentation to describe nesting levels. I like to put my SELECT, FROM, WHERE, GROUP BY and ORDER BY at one indent level, and then tablename, INNER JOIN, ON etc. more indented:
SELECT
  column
FROM
  table
  INNER JOIN
  (
    SELECT subcolumn FROM subtable WHERE subclause
  ) myalias
  ON
    table.id = myalias.whatever
WHERE
  blah
Organising your indents every time you nest down a layer really helps. By indenting everything that is "a table or a block of data like a table (i.e. a subquery)" by the same amount, you can easily see the notional order in which the DB should retrieve the data.
Move your WITHs to the top of the statement. You will still use the alias names in place in the nested subquery, of course.
Looking at your query, there isn't much point in your subqueries. You don't do any grouping or particularly complex processing of the data; you just select an ID and another column and then join it in. Your query will be simpler if you don't do this:
SELECT
  column
FROM
  table
  INNER JOIN
  (
    SELECT subcolumn FROM subtable WHERE subclause
  ) myalias
  ON
    table.id = myalias.whatever
WHERE
  blah
Instead, do this:
SELECT
  column
FROM
  table
  INNER JOIN
  subtable
  ON
    table.id = subtable.id
WHERE
  blah
Regarding your updated requirements, the same pattern applies.
Look for the "--" comments I added:
With a as (Select id, my_other records... from postgres_table1),
b as (select id, my_records... from postgres_table2),
c as (select id, my_record.. from postgres_table3, b),
d as (select id, records.. from b),
e as (select id, records.. from d),
f as (select id, records.. from e)
SELECT * FROM
(
--your first
Select
id,
my_records
from a left join c on some_condtion_with_a
) Q1
LEFT OUTER JOIN
(
--your second
Select
id, my_records
from
(
multiple_sub_queries_by_getting_records_from_c
)
) Q2
ON Q1.XXXX = Q2.XXXX --fill this in !!!!!!!!!!!!!!!!!!!
LEFT OUTER JOIN
(
--your third
select
id,
records..
from f
) Q3
ON QX.XXXXX = Q3.XXXX --fill this in !!!!!!!!!!!!!!!!!!!
It'll work, but it might not be the prettiest or most necessary SQL arrangement. As both I and HWNN have said, you can rewrite a lot of these queries where you're just doing some simple selecting in your WITH. But it's likely that they're simple enough that the database optimizer can also see this and rewrite the query for you when it runs it.
Just remember to code clearly, and lay your indentation out nicely to stop it turning into a massive, unmaintainable, undebuggable spaghetti mess.

SQL with SubSelect referring to same Table as Initial load is slow

I load multiple tables together with a simple SELECT DISTINCT:
SELECT DISTINCT Table1.Val1, Table2.Val2, Table1.Val3, ...
FROM Table1, Table2, Table3
WHERE Table2.KEYVAL = Table1.KEYVAL
AND Table3.KEYVAL = Table2.KEYVAL
-- plus some further WHERE conditions on Table1, Table2 and Table3
I also have some subselects against a fourth table in my main SELECT statement, which work fine and run fast.
Now I want to calculate the SUM of a value in Table3 (let's call it weight).
The Table3 result has multiple rows (sub-IDs) that share the same key.
How can I accomplish that?
If I put a subselect in my main select:
(SELECT Sum(weight)
FROM Table3
WHERE Table3.KEYVAL = Table1.KEYVAL
) as Weight
it gets painfully slow.
How can I (or how does SQL) distinguish between Table3.weight in the main select and Table3.weight in the subselect if I want to use a LEFT JOIN for that weight?
SUM(weight) in the main select does not work directly, I think because the rows are not distinct?
I think you have to use a GROUP BY clause in your subquery.
For instance:
Select a.*, b.*
from mytable as a
join (select keyval, sum(weight) as total_weight from my_second_table group by keyval) as b
on a.keyval = b.keyval
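The sketch below compares the correlated subselect from the question with a join against a pre-aggregated derived table, using Python's sqlite3 module and made-up rows (both are assumptions; Table1 and Table3 are reduced to the columns that matter here). It only checks that the two forms return the same totals and makes no claim about their relative speed on any particular database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (KEYVAL INTEGER, Val1 TEXT);
    CREATE TABLE Table3 (KEYVAL INTEGER, sub_id INTEGER, weight REAL);
    INSERT INTO Table1 VALUES (1, 'one'), (2, 'two');
    INSERT INTO Table3 VALUES (1, 1, 2.5), (1, 2, 1.5), (2, 1, 4.0);
""")

# Correlated subselect: evaluated per row of Table1.
correlated = conn.execute("""
    SELECT t1.KEYVAL, t1.Val1,
           (SELECT SUM(weight)
            FROM Table3 t3
            WHERE t3.KEYVAL = t1.KEYVAL) AS total_weight
    FROM Table1 t1
    ORDER BY t1.KEYVAL
""").fetchall()

# Derived table: Table3 is grouped once, then joined on the key.
# The alias w keeps this copy of Table3 separate from any other use of it.
pre_aggregated = conn.execute("""
    SELECT t1.KEYVAL, t1.Val1, w.total_weight
    FROM Table1 t1
    LEFT JOIN (SELECT KEYVAL, SUM(weight) AS total_weight
               FROM Table3
               GROUP BY KEYVAL) w
           ON w.KEYVAL = t1.KEYVAL
    ORDER BY t1.KEYVAL
""").fetchall()

print(correlated)
print(pre_aggregated)
```

Because the derived table has exactly one row per KEYVAL, joining it does not multiply the rows of the main select.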

entry cannot be referenced in this part of the query (subquery) Error

I'm getting the following error on my query:
There is an entry for table "table1", but it cannot be referenced from this part of the query.
This is my query:
SELECT id
FROM property_import_image_results table1
LEFT JOIN (
SELECT created_at
FROM property_import_image_results
WHERE external_url = table1.external_url
ORDER BY created_at DESC NULLS LAST
LIMIT 1
) as table2 ON (pimr.created_at = table2.created_at)
WHERE table2.created_at is NULL
You need a lateral join to be able to reference the outer table in the sub-select for the join.
You are also referencing an alias pimr in the join condition, which isn't available anywhere in the query. So you need to change that to table1 in the join condition.
You should also give the table in the inner query an alias to avoid confusion:
SELECT id
FROM property_import_image_results table1
LEFT JOIN LATERAL (
SELECT p2.created_at
FROM property_import_image_results p2
WHERE p2.external_url = table1.external_url
ORDER BY p2.created_at DESC NULLS LAST
LIMIT 1
) as table2 ON (table1.created_at = table2.created_at)
WHERE table2.created_at is NULL
Edit
This kind of query can also be solved using window functions:
select id
from (
select id, created_at,
max(created_at) over (partition by external_url) as max_created
FROM property_import_image_results
) t
where created_at <> max_created;
This might be faster than aggregating and joining as you do, but it's hard to tell; the lateral joins are quite efficient as well. The window function version has the advantage that you can add any column you like to the result, because no grouping is required.
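The window function approach can be tried out with a small sketch like the one below. It uses Python's sqlite3 module (window functions need SQLite 3.25 or later) instead of PostgreSQL, and the three sample rows are invented. Since SQLite has no LATERAL, the comparison query uses a correlated scalar subquery as a stand-in for the lateral join, which is an assumption about equivalent intent rather than the answer's exact query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property_import_image_results (
        id INTEGER, external_url TEXT, created_at TEXT
    );
    INSERT INTO property_import_image_results VALUES
        (1, 'u1', '2020-01-01'),
        (2, 'u1', '2020-02-01'),
        (3, 'u2', '2020-01-15');
""")

# Window function: compare each row with the newest created_at for its URL.
# created_at must be selected in the inner query to be usable outside it.
window_rows = conn.execute("""
    SELECT id
    FROM (
        SELECT id, created_at,
               MAX(created_at) OVER (PARTITION BY external_url) AS max_created
        FROM property_import_image_results
    ) t
    WHERE created_at <> max_created
    ORDER BY id
""").fetchall()

# Correlated subquery standing in for the lateral join.
correlated_rows = conn.execute("""
    SELECT p.id
    FROM property_import_image_results p
    WHERE p.created_at <> (SELECT MAX(p2.created_at)
                           FROM property_import_image_results p2
                           WHERE p2.external_url = p.external_url)
    ORDER BY p.id
""").fetchall()

print(window_rows)
print(correlated_rows)
```

Both queries return only the rows that are not the most recent one for their external_url.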