Why can a select of an invalid field in a subquery run in BigQuery? - sql

For the following SQL:
CREATE or replace TABLE
temp.t1 ( a STRING)
;
insert into temp.t1 values ('val_a');
CREATE or replace TABLE
temp.t2 (b STRING)
;
insert into temp.t2 values ('val_b');
create or replace table `temp.a1` as
select distinct b
from temp.t2
;
select distinct a
from `temp.t1`
where a in (select distinct a from `temp.a1`)
;
There is no column a in temp.a1, so I expected an error here. However, the output of BigQuery is
Row a
1 val_a
Why does this happen?
On the other hand, running select distinct a from temp.a1; on its own fails with the error Unrecognized name: a.

Your query is:
select distinct a
from `temp.t1`
where a in (select distinct a from `temp.a1`);
You expect this to be interpreted as:
select distinct t1.a
from `temp.t1` t1
where t1.a in (select distinct a1.a from `temp.a1` a1);
And hence generate an error. However, the rules of SQL interpret this as:
select distinct t1.a
from `temp.t1` t1
where t1.a in (select distinct t1.a from `temp.a1` a1);
The scoping rules say that if a is not found in the tables of the subquery, the name is looked up in the outer query.
That is how SQL is defined.
The solution? Always qualify column references. Qualify means to include the table alias in the reference.
Also note that select distinct is meaningless in the subquery for an in, because in only checks membership, so duplicates in the subquery result cannot change the outcome. You should get rid of the distinct in the subquery.
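The scoping rule can be tried out locally. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery (an assumption made purely for illustration; the table names drop the temp. prefix, and error messages differ between engines). It runs the unqualified query, which silently binds a to the outer table, and then the fully qualified query, which is rejected.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (illustrative assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a TEXT);
    INSERT INTO t1 VALUES ('val_a');
    CREATE TABLE a1 (b TEXT);
    INSERT INTO a1 VALUES ('val_b');
""")

# Unqualified: a1 has no column `a`, so `a` resolves to t1.a from the outer query.
unqualified_rows = conn.execute(
    "SELECT DISTINCT a FROM t1 WHERE a IN (SELECT a FROM a1)"
).fetchall()

# Qualified: a1.a does not exist, so the engine refuses to prepare the statement.
try:
    conn.execute(
        "SELECT DISTINCT t1.a FROM t1 WHERE t1.a IN (SELECT a1.a FROM a1)"
    )
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(unqualified_rows)
print(qualified_error)
```

With qualified references the mistake surfaces as an error instead of a query that quietly matches every row.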

Related

Is it possible to call a user defined function inside SQL select statement

I am troubleshooting an issue with user defined function in SQL SELECT statement.
I am aware of the following syntax to access a UDF as part of SELECT query.
SELECT dbo.udf_function(param1) AS 'Output'
THE PROBLEM
But I need to use the above query as part of another SELECT statement, something like the one below. I already know this is not possible as written, because SQL reports an error: the subquery tries to return multiple columns (i.e. *).
SELECT
T1.Id, T1.Name, T1.Address,
(SELECT * FROM dbo.udf_function(param1)) AS 'Output'
FROM
table_1 T1
This SQL is not working.
Is there any suggestion to handle above scenario?
There are two scenarios:
(1) dbo.udf_function(param1) returns a scalar and produces one value for each record in the table:
SELECT
T1.Id
, T1.Name
, T1.Address
, dbo.udf_function(param1)
FROM table_1 T1
(2) dbo.udf_function(param1) returns a table and produces rows for each record in the table:
SELECT
T1.Id
, T1.Name
, T1.Address
, F.udf_field_1
, F.udf_field_2
FROM table_1 T1
CROSS APPLY dbo.udf_function(param1) F
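Scenario (1) can be sketched outside SQL Server as well. The example below is an illustration only: it uses Python's sqlite3 module, where a scalar function is registered with create_function rather than created under a dbo schema, and the function body (upper-casing the name) and the sample rows are made up for the demonstration. SQLite has no CROSS APPLY, so scenario (2) is not covered here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def udf_function(name):
    # Hypothetical scalar logic: return the input in upper case.
    return name.upper()


# Register the Python function as a one-argument scalar SQL function.
conn.create_function("udf_function", 1, udf_function)

conn.executescript("""
    CREATE TABLE table_1 (Id INTEGER, Name TEXT, Address TEXT);
    INSERT INTO table_1 VALUES (1, 'ann', '1 Main St');
    INSERT INTO table_1 VALUES (2, 'bob', '2 Oak Ave');
""")

# The scalar function is called directly in the select list, once per row.
rows = conn.execute(
    "SELECT T1.Id, T1.Name, T1.Address, udf_function(T1.Name) AS output_value "
    "FROM table_1 T1 ORDER BY T1.Id"
).fetchall()

print(rows)
```

The point is that a scalar function needs no subquery around it: it sits in the select list like any other expression.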

Aggregate column from CTE cannot be used in WHERE clause in the query in PostgreSQL

My query follows this structure:
WITH CTE AS (
SELECT t1.x, COUNT(t1.y) AS count
FROM table1 t1
GROUP BY t1.x
)
SELECT CTE.x, CTE.count AS newCount, t2.count AS oldCount
FROM table2 t2 JOIN CTE ON t2.x = CTE.x
WHERE t2.count != CTE.count;
I get the following error: [42803] ERROR: aggregate functions are not allowed in WHERE
It looks like the CTE.count is the aggregate that triggers this error. Aren't CTEs supposed to be calculated before the main query? How to rewrite the query to avoid this?
PostgreSQL 13.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), 64-bit
The t2.count is being interpreted as the aggregate COUNT() function, because your t2 table does not have a column called count.
Make sure that your table actually has a count column, or compute its aggregate count in another CTE before joining and then compare the results. Also avoid using the alias "count", as in the following:
WITH CTE1 AS (
SELECT t1.x, COUNT(t1.y) AS total
FROM table1 t1
GROUP BY t1.x
),
CTE2 AS (
SELECT t2.x, COUNT(t2.y) AS total
FROM table2 t2
GROUP BY t2.x
)
SELECT
CTE1.x,
CTE1.total AS newCount,
CTE2.total AS oldCount
FROM
CTE2
JOIN CTE1 ON CTE2.x = CTE1.x
WHERE
CTE2.total != CTE1.total;
It looks like "t2.count" is what causes the issue.
On dbfiddle I can reproduce the issue ONLY when there is no column named "count" in table2.
In other words, the error occurs only when table2 is defined like this:
create table table2 (x int, y int);
However, if I add the "count" column, the error is gone:
create table table2 (x int, y int, count int);
I believe that when there is no such column, Postgres treats t2.count as a call to the aggregate function count(t2) and throws the error.
So my solution would be to check that such a column is present, and to never use keywords or function names as column names.

How to create a select clause using a subquery

I have the following sql statement:
WITH
subquery AS (
select distinct id from a_table where some_field in (1,2)
)
select id from another_table where id in subquery;
Edit
JOIN is not an option (this is just a reduced example of a bigger query)
But that obviously does not work. The id field exists in both tables (with a different name, but values are the same: numeric ids). Basically what I want to do is filter by the result of the subquery, like a kind of intersection.
Any idea how to write that query in a correct way?
You need a subquery for the second operand of IN that SELECTs from the CTE.
... IN (SELECT id FROM subquery) ...
But I would recommend to rewrite it as a JOIN.
Are you able to join on ID and then filter in the WHERE clause?
select a.id
from another_table a
inner join a_table b on a.id = b.id
where b.some_field in (1,2)
Since you only want the id from another_table you can use exists
with s as (
select id
from a_table
where some_field in (1,2)
)
select id
from another_table t
where exists ( select * from s where s.id=t.id )
But the CTE is really redundant since all you are doing is
select id
from another_table t
where exists (
select * from a_table a where a.id=t.id and a.some_field in (1,2)
)
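To check that the IN form and the EXISTS form agree, here is a small sketch using Python's sqlite3 module. The sample rows are invented for the demonstration, and the table and column names follow the question; SQLite is only a convenient local engine here, not necessarily the one the question targets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a_table (id INTEGER, some_field INTEGER);
    CREATE TABLE another_table (id INTEGER);
    INSERT INTO a_table VALUES (1, 1);
    INSERT INTO a_table VALUES (2, 2);
    INSERT INTO a_table VALUES (3, 9);
    INSERT INTO another_table VALUES (1);
    INSERT INTO another_table VALUES (3);
    INSERT INTO another_table VALUES (4);
""")

# IN needs a real subquery as its right operand, even when the source is a CTE.
in_rows = conn.execute("""
    WITH subquery AS (
        SELECT DISTINCT id FROM a_table WHERE some_field IN (1, 2)
    )
    SELECT id FROM another_table
    WHERE id IN (SELECT id FROM subquery)
    ORDER BY id
""").fetchall()

# Equivalent filter written with EXISTS and no CTE at all.
exists_rows = conn.execute("""
    SELECT t.id FROM another_table t
    WHERE EXISTS (
        SELECT 1 FROM a_table a
        WHERE a.id = t.id AND a.some_field IN (1, 2)
    )
    ORDER BY t.id
""").fetchall()

print(in_rows, exists_rows)
```

Only id 1 is present in both tables with some_field in (1, 2), so both queries return the same single row.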

How to prevent SQL Server bug in inner select expression (using 'IN' & 'EXISTS' keywords)

I have two different tables with their different columns as below:
CREATE TABLE T1(C1 INT)
CREATE TABLE T2(C2 INT)
Every programmer knows that if we write an invalid query, the query compiler should give us an error, as with this one:
SELECT C1 FROM T2
--ERROR: Invalid column name 'C1'.
But if we use this wrong query as an inner select, unfortunately SQL Server will execute it:
SELECT *
FROM T1
WHERE C1 IN (SELECT C1 FROM T2)
--returns all rows of T1
The following wrong query also executes and returns all rows of T1:
SELECT *
FROM T1
WHERE EXISTS (SELECT C1 FROM T2)
--returns all rows of T1
It gets worse when we use these wrong queries in an UPDATE, such as:
UPDATE T1
SET C1 = NULL
WHERE C1 IN (SELECT C1 FROM T2)
--updates all rows of T1
Now, I want to prevent this bug. I can force my DB developers to be careful but is there any systematic way to prevent this bug?
Have you heard of correlated subqueries? You can always refer to outer query columns inside a subquery.
I am sure you have seen queries like this:
SELECT * FROM T1
WHERE EXISTS (SELECT 1 FROM T2 where t1.c1 = t2.c2)
Here the C1 column from T1 is referenced in the WHERE clause of the subquery, whereas you are referencing it in the SELECT list; that is the only difference. There is no bug here.
Is there any systematic way to prevent this bug?
Always use two-part names - [Table].[Column].
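The effect of two-part names on the UPDATE case can be sketched as follows. This is an illustration under assumptions: it uses Python's sqlite3 module instead of SQL Server, with made-up rows, and it rolls the first update back so the data is unchanged afterwards. The unqualified statement touches every row, while the two-part version fails at compile time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (C1 INTEGER);
    CREATE TABLE T2 (C2 INTEGER);
    INSERT INTO T1 VALUES (1);
    INSERT INTO T1 VALUES (2);
    INSERT INTO T1 VALUES (3);
    INSERT INTO T2 VALUES (99);
""")

# Unqualified: C1 inside the subquery binds to T1.C1, so every row matches.
cur = conn.execute("UPDATE T1 SET C1 = NULL WHERE C1 IN (SELECT C1 FROM T2)")
rows_updated = cur.rowcount
conn.rollback()  # undo the accidental mass update

# Two-part names: T2.C1 does not exist, so the statement is rejected outright.
try:
    conn.execute(
        "UPDATE T1 SET C1 = NULL WHERE T1.C1 IN (SELECT T2.C1 FROM T2)"
    )
    qualified_rejected = False
except sqlite3.OperationalError:
    qualified_rejected = True

remaining = conn.execute(
    "SELECT COUNT(*) FROM T1 WHERE C1 IS NOT NULL"
).fetchone()[0]

print(rows_updated, qualified_rejected, remaining)
```

A code review rule or linter that insists on two-part names in every subquery turns this class of mistake into a compile-time failure.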

Works fine in one case / (column ambiguously defined) error in another

I have 2 tables that share a column name. The column is BAN_KEY.
When I run this query
with
t1 as
(
select *
from table1
),
t2 as
(
select *
from table2
),
t3 as
(
select *
from t1, t2
where t1.c1 = t2.c2
)
select * from t3
I get the error "column ambiguously defined", but when I do it this way
with
t1 as
(
select *
from table1
),
t2 as
(
select *
from table2
)
select *
from t1, t2
where t1.c1 = t2.c2
The result looks like this
BAN_KEY | BAN_KEY_1 | other columns
some values...
What's the reason for this?
First, learn to use proper JOIN syntax. Simple rule: Never use commas in the FROM clause. Always use proper, explicit JOINs.
That has nothing to do with your question. The answer is much simpler. For a CTE (or table), Oracle needs to be able to assign column names to the result so they can be accessed subsequently. It accepts the column names that you provide, assuming that your intention is correct. Duplicate column names are not allowed because the reference would be ambiguous; hence the error.
Why doesn't this happen for a result set? Oracle does not require that the columns in the result set of a query be unique. For convenience, though, it distinguishes between columns with the same name.
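One way to avoid the duplicate names is to list the columns of the inner CTE explicitly and alias the clashing ones. The sketch below shows only that aliasing pattern, using Python's sqlite3 module. SQLite is not Oracle and does not reproduce the "column ambiguously defined" error, and the alias names and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (BAN_KEY INTEGER, c1 INTEGER);
    CREATE TABLE table2 (BAN_KEY INTEGER, c2 INTEGER);
    INSERT INTO table1 VALUES (10, 1);
    INSERT INTO table2 VALUES (20, 1);
""")

# Every column of t3 gets a unique, explicit name, so `select * from t3`
# has nothing ambiguous to refer to.
cur = conn.execute("""
    WITH t3 AS (
        SELECT t1.BAN_KEY AS ban_key_t1,
               t2.BAN_KEY AS ban_key_t2,
               t1.c1 AS c1,
               t2.c2 AS c2
        FROM table1 t1
        JOIN table2 t2 ON t1.c1 = t2.c2
    )
    SELECT * FROM t3
""")
column_names = [col[0] for col in cur.description]
rows = cur.fetchall()

print(column_names)
print(rows)
```

Because the names are fixed inside the CTE, later references to t3.ban_key_t1 or t3.ban_key_t2 are unambiguous.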