Can I use WHERE clause after JOIN USING in snowflake? - sql

Can I use WHERE after
JOIN USING?
In my case if I run on snowflake multiple times the same code:
with CTE1 as
(
select *
from A
left join B
on A.date_a = B.date_b
)
select *
from CTE1
inner join C
using(var1_int)
where CTE1.date_a >= date('2020-10-01')
limit 1000;
sometimes I get a result and sometimes i get the error:
SQL compilation error: Can not convert parameter 'DATE('2020-10-01')' of type [DATE] into expected type [NUMBER(38,0)]
where NUMBER(38,0) is the type of var1_int column

Your problem has nothing to do with the existence of a where clause. Of course you can use a where clause after joins. That is how SQL queries are constructed.
According to the error message, CTE1.date_a is a number. Comparing it to a date results in a type-conversion error. If you provided sample data and desired results, then it might be possible to suggest a way to fix the problem.

tl;dr: Instead of JOIN .. USING() always prefer JOIN .. ON.
You are right to be suspicious of the results. Given your staging, only one of these queries returns without errors:
select a.date_1, id_1
from AE_USING_TST_A a
left join AE_USING_TST_B b
on a.date_1 = b.date_2
join AE_USING_TST_C v
using(id_1)
where A.date_1 >= date('2020-10-01')
-- Can not convert parameter 'DATE('2020-10-01')' of type
-- [DATE] into expected type [NUMBER(38,0)]
;
select a.date_1, a.id_1
from AE_USING_TST_A a
left join AE_USING_TST_B b
on a.date_1 = b.date_2
join AE_USING_TST_C v
on a.id_1=v.id_1
where A.date_1 >= date('2020-10-01')
-- 2020-10-11 2
;
I would call this a bug, except that the documentation is clear about not doing this kind of queries with JOIN .. USING:
To use the USING clause properly, the projection list (the list of columns and other expressions after the SELECT keyword) should be “*”. This allows the server to return the key_column exactly once, which is the standard way to use the USING clause. For examples of standard and non-standard usage, see the examples below.
https://docs.snowflake.com/en/sql-reference/constructs/join.html
The documentation doubles down on the problems of using USING() on non-standard situations, with a different query acting "wrong":
The following example shows non-standard usage; the projection list contains something other than “*”. Because the usage is non-standard, the output contains two columns named “userid”, and the second occurrence (which you might expect to contain a value from table ‘r’) contains a value that is not in the table (the value ‘a’ is not in the table ‘r’).
So just prefer JOIN .. ON. For extra discussion on the SQL ANSI standard not defining behavior for some cases of USING() check:
https://community.snowflake.com/s/question/0D50Z00008WRZBBSA5/bug-with-join-using-

Related

SQLite: distinguish between table and column alias

Can SQLite distinguish between a column from some aliased table, e.g. table1.column and a column that is aliased with the same name, i.e. column, in the SELECT statement?
This is relevant because I need to refer to the column that I construct in the SELECT statement later on in a HAVING clause, but must not confuse it with the column in aliased table. To my knowledge, I cannot alias the table to be constructed in my SELECT statement (without reverting to some nasty work-around like SELECT * FROM (SELECT ...) AS alias) to ensure both are distinguishable.
Here's a stripped down version of the code I am concerned with:
SELECT
a.entity,
b.DATE,
TOTAL(a.dollar_amount*b.ret_usd)/TOTAL(a.dollar_amount) AS ret_usd
FROM holdings a
LEFT JOIN returns b
ON a.stock = b.stock AND
a.DATE = b.DATE
GROUP BY
a.entity,
b.DATE
HAVING
ret_usd NOT NULL
Essentially, I want to get rid of groups for which I cannot find any returns and thus would show up with NULL values. I am not using an INNER JOIN because in my production code I merge multiple types of returns - for some of which I may have no data. I only want to drop those groups for which I have no returns for any of the return types.
To my understanding, the SQLite documentation does not address this issue.
LEFT JOIN all the return tables, then add a WHERE something like
COALESCE(b.ret_used, c.ret_used, d.ret_used....) is not NULL
You might need a similar strategy to determine which ret_used in the TOTAL. FYI, TOTAL never returns NULL.

Oracle Invalid Number in Join Clause

I am getting an Oracle Invalid Number error that doesn't make sense to me. I understand what this error means but it should not be happening in this case. Sorry for the long question, but please bear with me so I can explain this thoroughly.
I have a table which stores IDs to different sources, and some of the IDs can contain letters. Therefore, the column is a VARCHAR.
One of the sources has numeric IDs, and I want to join to that source:
SELECT *
FROM (
SELECT AGGPROJ_ID -- this column is a VARCHAR
FROM AGG_MATCHES -- this is the table storing the matches
WHERE AGGSRC = 'source_a'
) m
JOIN SOURCE_A a ON a.ID = TO_NUMBER(m.AGGPROJ_ID);
In most cases this works, but depending on random things such as what columns are in the select clause, if it uses a left join or an inner join, etc., I will start seeing the Invalid Number error.
I have verified multiple times that all entries in AGG_MATCHES where AGGSRC = 'source_a' do not contain non numeric characters in the AGGPROJ_ID column:
-- this returns no results
SELECT AGGPROJ_ID
FROM AGG_MATCHES
WHERE AGGSRC = 'source_a' AND REGEXP_LIKE(AGGPROJ_ID, '[^0-9]');
I know that Oracle basically rewrites the query internally for optimization. Going back to the first SQL example, my best guess is that depending on how the entire query is written, in some cases Oracle is trying to perform the JOIN before the sub query. In other words, it's trying to join the entire AGG_MATCHES tables to SOURCE_A instead of just the subset returned by the sub query. If so, there would be rows that contain non numeric values in the AGGPROJ_ID column.
Does anyone know for certain if this is what's causing the error? If it is the reason, is there anyway for me to force Oracle to execute the sub query part first so it's only trying to join a subset of the AGG_MATCHES table?
A little more background:
This is obviously a simplified example to illustrate the problem. The AGG_MATCHES table is used to store "matches" between different sources (i.e. projects). In other words, it's used to say that a project in sourceA is matched to a project in sourceB.
Instead of writing the same SQL over and over, I've created views for the sources we commonly use. The idea is to have a view with two columns, one for SourceA and one for SourceB. For this reason, I don't want to use TO_CHAR on the ID column of the source table, because devs would have to remember to do this every time they are doing a join, and I'm trying to remove code duplication. Also, since the ID in SOURCE_A is a number, I feel that any view storing SOURCE_A.ID should go ahead and convert it to a number.
You are right that Oracle is executing the statement in a different order than what you wrote, causing conversion errors.
The best ways to fix this problem, in order, are:
Change the data model to always store data as the correct type. Always store numbers as numbers, dates as dates, and strings as strings. (You already know this and said you can't change your data model, this is a warning for future readers.)
Convert numbers to strings with a TO_CHAR.
If you're on 12.2, convert strings to numbers using the DEFAULT return_value ON CONVERSION ERROR syntax, like this:
SELECT *
FROM (
SELECT AGGPROJ_ID -- this column is a VARCHAR
FROM AGG_MATCHES -- this is the table storing the matches
WHERE AGGSRC = 'source_a'
) m
JOIN SOURCE_A a ON a.ID = TO_NUMBER(m.AGGPROJ_ID default null on conversion error);
Add a ROWNUM to an inline view to prevent optimizer transformations that may re-write statements. ROWNUM is always evaluated at the end and it forces Oracle to run things in a certain order, even if the ROWNUM isn't used. (Officially hints are the way to do this, but getting hints right is too difficult.)
SELECT *
FROM (
SELECT AGGPROJ_ID -- this column is a VARCHAR
FROM AGG_MATCHES -- this is the table storing the matches
WHERE AGGSRC = 'source_a'
--Prevent optimizer transformations for type safety.
AND ROWNUM >= 1
) m
JOIN SOURCE_A a ON a.ID = TO_NUMBER(m.AGGPROJ_ID);
I think the simplest solution uses case, which has more guarantees on the order of evaluation:
SELECT a.*
FROM AGG_MATCHES m JOIN
SOURCE_A a
ON a.ID = (CASE WHEN m.AGGSRC = 'source_a' THEN TO_NUMBER(m.AGGPROJ_ID) END);
Or, better yet, convert to strings:
SELECT a.*
FROM AGG_MATCHES m JOIN
SOURCE_A a
ON TO_CHAR(a.ID) = m.AGGPROJ_ID AND
m.AGGSRC = 'source_a' ;
That said, the best advice is to fix the data model.
Possibly the best solution in your case is simply a view or a generate column:
create view v_agg_matches_a as
select . . .,
(case when regexp_like(AGGPROJ_ID, '^[0-9]+$')
then to_number(AGGPROJ_ID)
end) as AGGPROJ_ID
from agg_matches am
where m.AGGSRC = 'source_a';
The case may not be necessary if you use a view, but it is safer.
Then use the view in subsequent queries.

Alternate solution for the query - Used INTERSECT function in oracle plsql

I am working on the query. I have two tables one is detail table where not grouping happen and its like including all the values and other table is line table which has important column grouped together from detail table.
I want to show all the column from line table and some column from detail table.
I am using below query to fetch my records
SELECT ab.*,
cd.phone_number,
cd.id
FROM xxx_line ab,
xxx_detail cd
WHERE cd.reference_number = ab.reference_number
AND cd.org_id = ab.org_id
AND cd.request_id = ab.request_id
AND ab.request_id = 13414224
INTERSECT
SELECT ab.*,
cd.phone_number,
cd.id
FROM xxx_line ab,
xxx_detail cd
WHERE cd.reference_number = ab.reference_number
AND cd.org_id = ab.org_id
AND cd.request_id = ab.request_id
AND ab.request_id = 13414224
The query is working fine...
But I want to know is there any other way for I can achieve the same result by not even using Intersect.
I purpose is to find out all possible way to get the same output.
The INTERSECT operator returns the unique set of rows returned by each query. The code can be re-written with a DISTINCT operator to make the meaning clearer:
SELECT DISTINCT
xxx_line.*,
xxx_detail.phone_number,
xxx_detail.id
FROM xxx_line
JOIN xxx_detail
ON xxx_line.reference_number = xxx_detail.reference_number
AND xxx_line.org_id = xxx_detail.org_id
AND xxx_line.request_id = xxx_detail.request_id
WHERE xxx_line.request_id = 13414224
I also replaced the old-fashioned join syntax with the newer ANSI join syntax (which makes relationships clearer by forcing the join tables and conditions to be listed close to each other) and removed the meaningless table aliases (because code complexity is more directly related to the number of variables than the number of characters).

Compared to SQL what does this formula do

A – ( B ∩ A )
I was wondering what this set of mathematics can translate to when looking at,comparing it with SQL (operators).
If A and B are tables of the same "type" (same number of columns and compatible datatypes on corresponding columns), this can be translated to SQL like this:
A EXCEPT (A INTERSECT B)
which is of course equivalent (both in set and relational algebra) to:
A EXCEPT B
If A and B are tables of different "type", then set operations do not make sense between them. (Joins are something different, they should not be confused with Unions, Differences or Intersections, no matter how popular a link that "explains" them is.)
The fact that Unions, Differences and Intersections can also be expressed in several ways (using (LEFT) JOIN, (NOT) IN, (NOT) EXISTS combinations, besides the explicit UNION, EXCEPT and INTERSECT operators) does not change that.
The syntax is not exactly as above. One can use either (works in Postgres and SQL-Server. It also works in Oracle if one replaces EXCEPT with MINUS):
SELECT *
FROM a
EXCEPT
( SELECT *
FROM a
INTERSECT
SELECT *
FROM b
) ;
or this (works in Postgres 8.4 and above: SQL-Fiddle test)
SELECT *
FROM
( TABLE a
EXCEPT
( TABLE a INTERSECT TABLE b )
) t ;
and even this (Look ma, no SELECT!):
TABLE a
EXCEPT
( TABLE a INTERSECT TABLE b ) ;
Just to give another option in terms of SQL:
SELECT id FROM A
MINUS
SELECT id FROM B
UPD:
Beware that MINUS removes duplicates from the final result set and only exists in Oracle.
SQL standard uses EXCEPT which is supported by other vendors:
SELECT id FROM A
EXCEPT
SELECT id FROM B
In standard it has DISTINCT option which should remove dupes. I guess you will have to check docs of s specific vendor to see if duplicates will be removed. For example SQL Server's implementation of EXCEPT is the same as MINUS in Oracle.
This computes the set difference of A and B and in mats is also equivelent to A - B. So what you are interested in here are the elements in A that are not in B.
You can have a look at this blog post to see how set difference can be done in mysql.
That would be SET A MINUS (SET A INNER JOIN SET B). You are taking the records that occur in both A and B (INNER JOIN), then removing those from SET A (MINUS)
The intersecting in set theory terms is equivilant to INNER JOIN in SQL terms.
Edit: As for your question about being the same as NOT IN, not really. NOT IN is an operator used on column values ('selection'). The MINUS, JOIN, etc are set operators to perform operations between sets of rows.
A – ( B ∩ A ) equivalent is A LEFT OUTER JOIN B WHERE TableB.id IS null
REFERENCE link HERE
Another option will be to use
SELECT * FROM A
WHERE
NOT EXISTS (SELECT * FROM B WHERE A.Id = B.id)

In an EXISTS can my JOIN ON use a value from the original select

I have an order system. Users with can be attached to different orders as a type of different user. They can download documents associated with an order. Documents are only given to certain types of users on the order. I'm having trouble writing the query to check a user's permission to view a document and select the info about the document.
I have the following tables and (applicable) fields:
Docs: DocNo, FileNo
DocAccess: DocNo, UserTypeWithAccess
FileUsers: FileNo, UserType, UserNo
I have the following query:
SELECT Docs.*
FROM Docs
WHERE DocNo = 1000
AND EXISTS (
SELECT * FROM DocAccess
LEFT JOIN FileUsers
ON FileUsers.UserType = DocAccess.UserTypeWithAccess
AND FileUsers.FileNo = Docs.FileNo /* Errors here */
WHERE DocAccess.UserNo = 2000 )
The trouble is that in the Exists Select, it does not recognize Docs (at Docs.FileNo) as a valid table. If I move the second on argument to the where clause it works, but I would rather limit the initial join rather than filter them out after the fact.
I can get around this a couple ways, but this seems like it would be best. Anything I'm missing here? Or is it simply not allowed?
I think this is a limitation of your database engine. In most databases, docs would be in scope for the entire subquery -- including both the where and in clauses.
However, you do not need to worry about where you put the particular clause. SQL is a descriptive language, not a procedural language. The purpose of SQL is to describe the output. The SQL engine, parser, and compiler should be choosing the most optimal execution path. Not always true. But, move the condition to the where clause and don't worry about it.
I am not clear why do you need to join with FileUsers at all in your subquery?
What is the purpose and idea of the query (in plain English)?
In any case, if you do need to join with FileUsers then I suggest to use the inner join and move second filter to the WHERE condition. I don't think you can use it in JOIN condition in subquery - at least I've never seen it used this way before. I believe you can only correlate through WHERE clause.
You have to use aliases to get this working:
SELECT
doc.*
FROM
Docs doc
WHERE
doc.DocNo = 1000
AND EXISTS (
SELECT
*
FROM
DocAccess acc
LEFT OUTER JOIN
FileUsers usr
ON
usr.UserType = acc.UserTypeWithAccess
AND usr.FileNo = doc.FileNo
WHERE
acc.UserNo = 2000
)
This also makes it more clear which table each field belongs to (think about using the same table twice or more in the same query with different aliases).
If you would only like to limit the output to one row you can use TOP 1:
SELECT TOP 1
doc.*
FROM
Docs doc
INNER JOIN
FileUsers usr
ON
usr.FileNo = doc.FileNo
INNER JOIN
DocAccess acc
ON
acc.UserTypeWithAccess = usr.UserType
WHERE
doc.DocNo = 1000
AND acc.UserNo = 2000
Of course the second query works a bit different than the first one (both JOINS are INNER). Depeding on your data model you might even leave the TOP 1 out of that query.