Oracle Invalid Number in Join Clause - sql

I am getting an Oracle Invalid Number error that doesn't make sense to me. I understand what this error means but it should not be happening in this case. Sorry for the long question, but please bear with me so I can explain this thoroughly.
I have a table which stores IDs to different sources, and some of the IDs can contain letters. Therefore, the column is a VARCHAR.
One of the sources has numeric IDs, and I want to join to that source:
SELECT *
FROM (
SELECT AGGPROJ_ID -- this column is a VARCHAR
FROM AGG_MATCHES -- this is the table storing the matches
WHERE AGGSRC = 'source_a'
) m
JOIN SOURCE_A a ON a.ID = TO_NUMBER(m.AGGPROJ_ID);
In most cases this works, but depending on random things such as what columns are in the select clause, if it uses a left join or an inner join, etc., I will start seeing the Invalid Number error.
I have verified multiple times that no entry in AGG_MATCHES where AGGSRC = 'source_a' contains non-numeric characters in the AGGPROJ_ID column:
-- this returns no results
SELECT AGGPROJ_ID
FROM AGG_MATCHES
WHERE AGGSRC = 'source_a' AND REGEXP_LIKE(AGGPROJ_ID, '[^0-9]');
I know that Oracle rewrites the query internally for optimization. Going back to the first SQL example, my best guess is that, depending on how the entire query is written, Oracle sometimes tries to perform the JOIN before the subquery. In other words, it tries to join the entire AGG_MATCHES table to SOURCE_A instead of just the subset returned by the subquery. If so, there would be rows that contain non-numeric values in the AGGPROJ_ID column.
Does anyone know for certain whether this is what's causing the error? If it is, is there any way for me to force Oracle to execute the subquery first so it only tries to join a subset of the AGG_MATCHES table?
A little more background:
This is obviously a simplified example to illustrate the problem. The AGG_MATCHES table is used to store "matches" between different sources (i.e. projects). In other words, it's used to say that a project in sourceA is matched to a project in sourceB.
Instead of writing the same SQL over and over, I've created views for the sources we commonly use. The idea is to have a view with two columns, one for SourceA and one for SourceB. For this reason, I don't want to use TO_CHAR on the ID column of the source table, because devs would have to remember to do this every time they are doing a join, and I'm trying to remove code duplication. Also, since the ID in SOURCE_A is a number, I feel that any view storing SOURCE_A.ID should go ahead and convert it to a number.

You are right that Oracle is executing the statement in a different order than what you wrote, causing conversion errors.
The best ways to fix this problem, in order, are:
Change the data model to always store data as the correct type. Always store numbers as numbers, dates as dates, and strings as strings. (You already know this and said you can't change your data model; this is a warning for future readers.)
Convert numbers to strings with a TO_CHAR.
If you're on Oracle 12.2 or later, convert strings to numbers using the DEFAULT return_value ON CONVERSION ERROR syntax, like this:
SELECT *
FROM (
SELECT AGGPROJ_ID -- this column is a VARCHAR
FROM AGG_MATCHES -- this is the table storing the matches
WHERE AGGSRC = 'source_a'
) m
JOIN SOURCE_A a ON a.ID = TO_NUMBER(m.AGGPROJ_ID default null on conversion error);
Add a ROWNUM to an inline view to prevent optimizer transformations that may re-write statements. ROWNUM is always evaluated at the end and it forces Oracle to run things in a certain order, even if the ROWNUM isn't used. (Officially hints are the way to do this, but getting hints right is too difficult.)
SELECT *
FROM (
SELECT AGGPROJ_ID -- this column is a VARCHAR
FROM AGG_MATCHES -- this is the table storing the matches
WHERE AGGSRC = 'source_a'
--Prevent optimizer transformations for type safety.
AND ROWNUM >= 1
) m
JOIN SOURCE_A a ON a.ID = TO_NUMBER(m.AGGPROJ_ID);
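The failure mode discussed above can be imitated outside the database. The following Python sketch is only an analogy and not Oracle's actual execution; the sample rows and function names are made up for illustration. It shows that converting before filtering fails on rows the filter would have removed, and that a conversion which returns a null substitute (the idea behind DEFAULT NULL ON CONVERSION ERROR) makes the order irrelevant.

```python
# Hypothetical sample data: (AGGSRC, AGGPROJ_ID) pairs, IDs stored as strings.
ROWS = [("source_a", "101"), ("source_a", "202"), ("source_b", "X-17")]


def filter_then_convert(rows):
    """Order as written in the query: filter on the source, then convert."""
    return [int(proj_id) for src, proj_id in rows if src == "source_a"]


def convert_then_filter(rows):
    """Order after a hypothetical rewrite: convert every row, then filter."""
    converted = [(src, int(proj_id)) for src, proj_id in rows]  # fails on "X-17"
    return [proj_id for src, proj_id in converted if src == "source_a"]


def to_number_or_none(text):
    """Conversion that yields None instead of raising on bad input."""
    try:
        return int(text)
    except ValueError:
        return None


def safe_convert_then_filter(rows):
    """Same risky order, but with the non-raising conversion."""
    converted = [(src, to_number_or_none(proj_id)) for src, proj_id in rows]
    return [proj_id for src, proj_id in converted if src == "source_a"]
```

Calling convert_then_filter(ROWS) raises ValueError, while the other two functions return the same two numeric IDs.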

I think the simplest solution uses case, which has more guarantees on the order of evaluation:
SELECT a.*
FROM AGG_MATCHES m JOIN
SOURCE_A a
ON a.ID = (CASE WHEN m.AGGSRC = 'source_a' THEN TO_NUMBER(m.AGGPROJ_ID) END);
Or, better yet, convert to strings:
SELECT a.*
FROM AGG_MATCHES m JOIN
SOURCE_A a
ON TO_CHAR(a.ID) = m.AGGPROJ_ID AND
m.AGGSRC = 'source_a' ;
That said, the best advice is to fix the data model.
Possibly the best solution in your case is simply a view or a generated column:
create view v_agg_matches_a as
select . . .,
(case when regexp_like(AGGPROJ_ID, '^[0-9]+$')
then to_number(AGGPROJ_ID)
end) as AGGPROJ_ID
from agg_matches am
where am.AGGSRC = 'source_a';
The case may not be necessary if you use a view, but it is safer.
Then use the view in subsequent queries.
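As a runnable illustration of the view approach, here is a small sketch using Python's sqlite3 module as a stand-in for Oracle. The sample rows are invented, GLOB replaces REGEXP_LIKE, and SQLite's CAST does not raise on bad input, so this only demonstrates the shape of the pattern, not the ORA-01722 behaviour itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agg_matches (aggsrc TEXT, aggproj_id TEXT);
    CREATE TABLE source_a (id INTEGER PRIMARY KEY, name TEXT);

    -- Hypothetical sample data: one non-numeric ID from another source.
    INSERT INTO agg_matches VALUES
        ('source_a', '101'), ('source_a', '202'), ('source_b', 'X-17');
    INSERT INTO source_a VALUES (101, 'alpha'), (202, 'beta'), (303, 'gamma');

    -- The view converts only values made up entirely of digits.
    CREATE VIEW v_agg_matches_a AS
    SELECT CASE
             WHEN am.aggproj_id <> '' AND am.aggproj_id NOT GLOB '*[^0-9]*'
             THEN CAST(am.aggproj_id AS INTEGER)
           END AS aggproj_id
    FROM agg_matches am
    WHERE am.aggsrc = 'source_a';
""")

# Callers join on the already-numeric column and never repeat the conversion.
matched = conn.execute("""
    SELECT a.id, a.name
    FROM v_agg_matches_a m
    JOIN source_a a ON a.id = m.aggproj_id
    ORDER BY a.id
""").fetchall()
print(matched)
```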

Related

Can I use WHERE clause after JOIN USING in snowflake?

Can I use WHERE after JOIN USING?
In my case, if I run the same code on Snowflake multiple times:
with CTE1 as
(
select *
from A
left join B
on A.date_a = B.date_b
)
select *
from CTE1
inner join C
using(var1_int)
where CTE1.date_a >= date('2020-10-01')
limit 1000;
Sometimes I get a result, and sometimes I get the error:
SQL compilation error: Can not convert parameter 'DATE('2020-10-01')' of type [DATE] into expected type [NUMBER(38,0)]
where NUMBER(38,0) is the type of var1_int column
Your problem has nothing to do with the existence of a where clause. Of course you can use a where clause after joins. That is how SQL queries are constructed.
According to the error message, CTE1.date_a is a number. Comparing it to a date results in a type-conversion error. If you provided sample data and desired results, then it might be possible to suggest a way to fix the problem.
tl;dr: Instead of JOIN .. USING() always prefer JOIN .. ON.
You are right to be suspicious of the results. Given your staging, only one of these queries returns without errors:
select a.date_1, id_1
from AE_USING_TST_A a
left join AE_USING_TST_B b
on a.date_1 = b.date_2
join AE_USING_TST_C v
using(id_1)
where A.date_1 >= date('2020-10-01')
-- Can not convert parameter 'DATE('2020-10-01')' of type
-- [DATE] into expected type [NUMBER(38,0)]
;
select a.date_1, a.id_1
from AE_USING_TST_A a
left join AE_USING_TST_B b
on a.date_1 = b.date_2
join AE_USING_TST_C v
on a.id_1=v.id_1
where A.date_1 >= date('2020-10-01')
-- 2020-10-11 2
;
I would call this a bug, except that the documentation clearly warns against this kind of query with JOIN .. USING:
To use the USING clause properly, the projection list (the list of columns and other expressions after the SELECT keyword) should be “*”. This allows the server to return the key_column exactly once, which is the standard way to use the USING clause. For examples of standard and non-standard usage, see the examples below.
https://docs.snowflake.com/en/sql-reference/constructs/join.html
The documentation doubles down on the problems of using USING() in non-standard situations, with a different query acting "wrong":
The following example shows non-standard usage; the projection list contains something other than “*”. Because the usage is non-standard, the output contains two columns named “userid”, and the second occurrence (which you might expect to contain a value from table ‘r’) contains a value that is not in the table (the value ‘a’ is not in the table ‘r’).
So just prefer JOIN .. ON. For extra discussion on the SQL ANSI standard not defining behavior for some cases of USING() check:
https://community.snowflake.com/s/question/0D50Z00008WRZBBSA5/bug-with-join-using-
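For reference, the JOIN .. ON form with fully qualified columns can be tried in a small sqlite3 sketch. SQLite is only a stand-in here and does not reproduce the Snowflake compilation error; the table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the three staging tables.
    CREATE TABLE tst_a (date_1 TEXT, id_1 INTEGER);
    CREATE TABLE tst_b (date_2 TEXT);
    CREATE TABLE tst_c (id_1 INTEGER);
    INSERT INTO tst_a VALUES ('2020-09-30', 1), ('2020-10-11', 2);
    INSERT INTO tst_b VALUES ('2020-10-11');
    INSERT INTO tst_c VALUES (1), (2);
""")

# Every column is qualified and every join uses ON, so there is no
# ambiguity about which table a column in the WHERE clause belongs to.
on_rows = conn.execute("""
    SELECT a.date_1, a.id_1
    FROM tst_a a
    LEFT JOIN tst_b b ON a.date_1 = b.date_2
    JOIN tst_c c ON a.id_1 = c.id_1
    WHERE a.date_1 >= '2020-10-01'
""").fetchall()
print(on_rows)
```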

SQL: combine 2 columns in one table to compare with 1 column in another table

I have two tables in MSSQL: one is called [DATA] and the other is called [2020].
In [2020] I have user info, for example name, ID, adress, and so on.
In [DATA] I have multiple columns that I need to show about the users.
The catch is that the tables have no common column: [2020] has ID and adress, while [DATA] has that info in a single column called ID-adress.
So I tried this:
SELECT
A.ID,
A.Nro,
A.name,
A.adress,
B.CGE AS 'GE',
B.CGG as 'GG',
B.CPE AS 'PE'
FROM
[202009] A
LEFT JOIN [DATA] B ON str(A.ID )+'-'+cast(A.adress as varchar) = B.[ID-adress]
But I get NULL in the columns from B that I want to show. What am I doing wrong?
From your rather sparse description, I would guess that the JOIN condition never evaluates to true.
Based on the naming of the columns, you may want:
FROM [202009] A LEFT JOIN
[DATA] B
ON CONCAT(A.ID, '-', A.adress) = B.[ID-adress]
This is speculation based on a reasonable interpretation of the question.
There are two potential problems with your construct:
str() tends to pad values with spaces.
varchar without a length has a default length depending on context that may not be sufficient for what you want to do.
Here is a suggestion for debugging: add your expression (str(A.ID )+'-'+cast(A.adress as varchar)) to your SELECT to see whether the results are what you expect.
Your immediate issue, I believe, is that your string functions str(A.ID )+'-'+cast(A.adress as varchar) have several problems.
I suggest putting that SQL fragment into a SELECT, e.g.,
SELECT A.ID, A.adress, str(A.ID )+'-'+cast(A.adress as varchar)
FROM [202009] A
to see what results you get.
I believe
The ID is numeric (int, probably). str(A.ID) will usually put leading spaces before it, and when comparing strings, ' 5' is different from '5'. Put an LTRIM in front of it e.g., ltrim(str(A.ID))
cast(A.adress as varchar) needs a length for the varchar, e.g., cast(A.adress as varchar(100)). If you don't specify one, CAST assumes a length of 30 and silently truncates anything longer.
To check, run a similar statement as above, but with fixes
SELECT A.ID, A.adress, ltrim(str(A.ID))+'-'+cast(A.adress as varchar(100))
FROM [202009] A
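To see why the padded key never matches, the STR() behaviour can be imitated in Python. This is a rough emulation under the assumption of STR()'s default length of 10 and integer input; the helper names and the sample address are invented for illustration.

```python
def tsql_str(value: int, length: int = 10) -> str:
    """Imitate T-SQL STR() for integers: right-justify within `length` characters."""
    text = str(value)
    # STR() returns asterisks when the number does not fit in the requested length.
    return text.rjust(length) if len(text) <= length else "*" * length


def original_key(user_id: int, address: str) -> str:
    """Key built like str(A.ID) + '-' + A.adress."""
    return tsql_str(user_id) + "-" + address


def fixed_key(user_id: int, address: str) -> str:
    """Key built like ltrim(str(A.ID)) + '-' + A.adress."""
    return tsql_str(user_id).lstrip() + "-" + address


stored = "5-Main St 12"  # hypothetical value of [ID-adress] in [DATA]
print(repr(original_key(5, "Main St 12")))
print(repr(fixed_key(5, "Main St 12")))
```

The first key carries leading spaces and therefore differs from the stored value; the trimmed key matches it.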
However (and I don't like to say this), this design is ... problematic. For example, what happens when a user changes their address? All fields in the data table will be unusable.
The biggest-bang-for-buck fix (imo) is to have 2 separate fields in the Data table for ID and address. If needed, you can still keep your single field as well (ID-address).
As you say, make sure they have common columns.
Once you have the two fields, you can join them on fields that at least match to get the relevant data rows for that address e.g.,
SELECT
A.ID,
A.Nro,
A.name,
A.adress,
B.CGE AS 'GE',
B.CGG as 'GG',
B.CPE AS 'PE'
FROM
[202009] A
LEFT JOIN [DATA] B ON A.ID = B.ID AND A.adress = B.adress
And then, if you just want all the data rows for a given user, you can do the same thing but do not join on address (just A.ID = B.ID).
Note that this won't fix all the issues (e.g., when an address in one table is different from the other table because of things like spaces, or 'St' vs 'Street') - but will be a good start.

SELECT query error ORA-01722 (invalid number)

I am facing this error message with my query and can't manage to figure it out. Can anyone take a look at my query and share some insights? Thanks a lot
Oracle database
SELECT BB_BB60.USERS.FIRSTNAME, BB_BB60.USERS.LASTNAME,
BB_BB60.USERS.STUDENT_ID AS IDNUMBER, BB_BB60.USERS.USER_ID AS USERNAME,
REPLACE(CRSADMIN.OCCURRENCE.CODE, '/','_') AS DESCRIPTION,
'Moodle_2019' AS PASSWORD,
SUBSTR(BB_BB60.COURSE_MAIN.COURSE_ID, 0, 7) AS DEPARTMENT
FROM (CRSADMIN.OCCURRENCE
INNER JOIN CRSADMIN.REQ_OCC ON CRSADMIN.OCCURRENCE.PK = CRSADMIN.REQ_OCC.OCC_PK1)
INNER JOIN ((BB_BB60.COURSE_USERS INNER JOIN BB_BB60.COURSE_MAIN ON BB_BB60.COURSE_USERS.CRSMAIN_PK1 = BB_BB60.COURSE_MAIN.PK1)
INNER JOIN BB_BB60.USERS ON BB_BB60.COURSE_USERS.USERS_PK1 = BB_BB60.USERS.PK1) ON CRSADMIN.REQ_OCC.REQ_PK1 = BB_BB60.COURSE_MAIN.PK1
WHERE (((BB_BB60.COURSE_MAIN.COURSE_ID) = 'PARA602_2019_02'));
The likely problem is with your data model. In at least one of your JOIN criteria you are joining a numeric column to a varchar2 column, which leads to an implicit casting to a number. However, your varchar2 column contains strings which aren't numeric, consequently the join hurls ORA-01722.
Without knowing your table structures we can't identify the problem columns so you'll need to figure it out for yourself.
One solution would be to cast the numeric column to a string e.g.
on t1.vcol_pk = to_char(t2.ncol_pk)
This might have performance implications (the optimiser won't use the index on t2.ncol_pk).
A better solution would be to fix the data model so you don't need to compare strings and numbers, and also clear up your data.

Improve Netezza SQL Query That Contains Hundreds of Strings in WHERE Clause

I have a Netezza query with a WHERE clause that includes several hundred potential strings. I'm surprised that it runs, but it takes time to complete and occasionally errors out ('transaction rolled back by client'). Here's a pseudo code version of my query.
SELECT
TO_CHAR(X.I_TS, 'YYYY-MM-DD') AS DATE,
X.I_SRC_NM AS CHANNEL,
X.I_CD AS CODE,
COUNT(DISTINCT CASE WHEN X.I_FLG = 1 THEN X.UID ELSE NULL END) AS WIDGETS
FROM
(SELECT
A.I_TS,
A.I_SRC_NM,
A.I_CD,
B.UID,
B.I_FLG
FROM
SCHEMA.DATABASE.TABLE_A A
LEFT JOIN SCHEMA.DATABASE.TABLE_B B ON A.UID = B.UID
WHERE
A.I_TS BETWEEN '2017-01-01' AND '2017-01-15'
AND B.TAB_CODE IN ('00AV', '00BX', '00C2', '00DJ'...
...
...
...
...
...
...
...)
) X
GROUP BY
X.I_TS,
X.I_SRC_NM,
X.I_CD
;
In my query, I'm limiting the results on B.TAB_CODE to about 1,200 values (out of more than 10k). I'm honestly surprised that it works at all, but it does most of the time.
Is there a more efficient way to handle this?
If the IN clause becomes too cumbersome, you can split your query into multiple parts: build a temporary set of TAB_CODE values (here as a CTE), then use it in a JOIN.
WITH tab_codes(tab_code) AS (
SELECT '00AV'
UNION ALL
SELECT '00BX'
--- etc ---
)
SELECT
TO_CHAR(X.I_TS, 'YYYY-MM-DD') AS DATE,
X.I_SRC_NM AS CHANNEL,
--- etc ---
INNER JOIN tab_codes Q ON B.TAB_CODE = Q.tab_code
If you want to boost performance even more, consider using a real temporary table (CTAS)
We've seen situations where it's "cheaper" to CTAS the original table to another, distributed on your primary condition, and then querying that table instead.
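The temporary-table variant can be sketched with Python's sqlite3 module. SQLite stands in for Netezza here, so this says nothing about performance or distribution; the table contents are hypothetical and the sketch only shows that the join returns the same rows as the IN list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_b (uid INTEGER, tab_code TEXT);
    INSERT INTO table_b VALUES (1, '00AV'), (2, '00BX'), (3, 'ZZZZ'), (4, '00C2');
    -- Temporary lookup table holding the wanted codes, one per row.
    CREATE TEMP TABLE tab_codes (tab_code TEXT PRIMARY KEY);
""")

wanted = ["00AV", "00BX", "00C2", "00DJ"]
conn.executemany("INSERT INTO tab_codes VALUES (?)", [(code,) for code in wanted])

# Variant 1: join against the lookup table.
via_join = conn.execute("""
    SELECT b.uid
    FROM table_b b
    JOIN tab_codes q ON b.tab_code = q.tab_code
    ORDER BY b.uid
""").fetchall()

# Variant 2: the original long IN list, built with bound parameters.
placeholders = ",".join("?" for _ in wanted)
via_in = conn.execute(
    f"SELECT uid FROM table_b WHERE tab_code IN ({placeholders}) ORDER BY uid",
    wanted,
).fetchall()
print(via_join, via_in)
```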
If I'm guessing correctly, X.I_TS is in fact a ‘timestamp’, and as such I expect it to contain many different values per day. Can you confirm that?
If I’m right, the query can possibly benefit from changing the ‘group by X.I_TS,...’ to ‘group by 1,...’
Furthermore, the ‘Count(Distinct Case...’ can never return anything other than 0 or 1. Can you confirm that?
If I’m right on that, you can get rid of the expensive ‘DISTINCT’ by changing it to ‘MAX(Case...’
Can you follow me :)
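The effect of grouping by the raw timestamp versus the formatted date can be checked on a toy table. This sqlite3 sketch uses invented rows and substr() in place of TO_CHAR, so it is only an illustration of the grouping behaviour, not of Netezza itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (i_ts TEXT, i_cd TEXT, uid INTEGER, i_flg INTEGER);
    -- Hypothetical rows: three distinct timestamps on the first day.
    INSERT INTO x VALUES
        ('2017-01-01 08:00:00', 'A', 1, 1),
        ('2017-01-01 09:30:00', 'A', 2, 1),
        ('2017-01-01 17:45:00', 'A', 2, 0),
        ('2017-01-02 10:00:00', 'A', 3, 1);
""")

# Grouping by the raw timestamp: one output row per distinct timestamp.
by_timestamp = conn.execute("""
    SELECT substr(i_ts, 1, 10) AS day, i_cd,
           COUNT(DISTINCT CASE WHEN i_flg = 1 THEN uid END) AS widgets
    FROM x
    GROUP BY i_ts, i_cd
    ORDER BY i_ts
""").fetchall()

# Grouping by the formatted date: one output row per day.
by_day = conn.execute("""
    SELECT substr(i_ts, 1, 10) AS day, i_cd,
           COUNT(DISTINCT CASE WHEN i_flg = 1 THEN uid END) AS widgets
    FROM x
    GROUP BY substr(i_ts, 1, 10), i_cd
    ORDER BY day
""").fetchall()
print(by_timestamp)
print(by_day)
```

With per-timestamp groups, each count covers a single row; once the grouping is per day, the distinct count can exceed 1, so whether DISTINCT is still needed depends on which grouping is intended.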

Alternate solution for the query - Used INTERSECT function in oracle plsql

I am working on a query. I have two tables: a detail table, where no grouping happens and all the values are included, and a line table, which has the important columns from the detail table grouped together.
I want to show all the columns from the line table and some columns from the detail table.
I am using below query to fetch my records
SELECT ab.*,
cd.phone_number,
cd.id
FROM xxx_line ab,
xxx_detail cd
WHERE cd.reference_number = ab.reference_number
AND cd.org_id = ab.org_id
AND cd.request_id = ab.request_id
AND ab.request_id = 13414224
INTERSECT
SELECT ab.*,
cd.phone_number,
cd.id
FROM xxx_line ab,
xxx_detail cd
WHERE cd.reference_number = ab.reference_number
AND cd.org_id = ab.org_id
AND cd.request_id = ab.request_id
AND ab.request_id = 13414224
The query is working fine...
But I want to know whether there is any other way to achieve the same result without using INTERSECT.
My purpose is to find all possible ways to get the same output.
The INTERSECT operator returns the distinct rows that appear in the results of both queries. Because both queries here are identical, the code can be rewritten with DISTINCT to make the meaning clearer:
SELECT DISTINCT
xxx_line.*,
xxx_detail.phone_number,
xxx_detail.id
FROM xxx_line
JOIN xxx_detail
ON xxx_line.reference_number = xxx_detail.reference_number
AND xxx_line.org_id = xxx_detail.org_id
AND xxx_line.request_id = xxx_detail.request_id
WHERE xxx_line.request_id = 13414224
I also replaced the old-fashioned join syntax with the newer ANSI join syntax (which makes relationships clearer by forcing the join tables and conditions to be listed close to each other) and removed the meaningless table aliases (because code complexity is more directly related to the number of variables than the number of characters).
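The equivalence can be checked on a small data set. The sketch below uses Python's sqlite3 module rather than Oracle, with made-up rows and a reduced set of columns; it compares the self-INTERSECT form against SELECT DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xxx_line (reference_number INTEGER, org_id INTEGER, request_id INTEGER);
    CREATE TABLE xxx_detail (id INTEGER, phone_number TEXT,
                             reference_number INTEGER, org_id INTEGER, request_id INTEGER);
    -- Hypothetical rows; the duplicated line row produces duplicate join results.
    INSERT INTO xxx_line VALUES (1, 10, 13414224), (1, 10, 13414224), (2, 10, 999);
    INSERT INTO xxx_detail VALUES
        (100, '555-0100', 1, 10, 13414224),
        (101, '555-0101', 1, 10, 13414224),
        (102, '555-0102', 2, 10, 999);
""")

JOINED = """
    SELECT xxx_line.*, xxx_detail.phone_number, xxx_detail.id
    FROM xxx_line
    JOIN xxx_detail
      ON xxx_line.reference_number = xxx_detail.reference_number
     AND xxx_line.org_id = xxx_detail.org_id
     AND xxx_line.request_id = xxx_detail.request_id
    WHERE xxx_line.request_id = 13414224
"""

plain = conn.execute(JOINED).fetchall()
# A query intersected with itself keeps each distinct row once.
with_intersect = sorted(conn.execute(JOINED + " INTERSECT " + JOINED).fetchall())
# The same result expressed with DISTINCT.
with_distinct = sorted(
    conn.execute(JOINED.replace("SELECT", "SELECT DISTINCT", 1)).fetchall()
)
print(len(plain), with_intersect == with_distinct)
```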