Joining varchar and nvarchar - sql

I'm comparing account numbers in two different databases to make sure each account exists in both. The account field is nvarchar in one database and varchar in the other. I cast both to varchar(12) and join them to see where there isn't a match. If an account number has fewer than 12 characters, the join treats it as not matching. I'm assuming the extra characters in each field are causing the issue?
table1 - accountnumber(nvarchar(255))
table2 - accountnumber(varchar(20))
select * from
table1
left outer join table2 on table2.accountnumber = table1.accountnumber
In this one example, both tables have an account with the number 12345678, but the join isn't working. I'm not sure if it's data type mismatch or white space or something else.
--Added--
I should add that the data in table2 actually originates from an Oracle database where it's stored as a varchar2(12 byte). I import it into a SQL Server database where it's stored as a varchar(20). I'm not sure if this makes a difference.

Not sure where you are having a problem. This query should return matching account numbers (no need to CAST):
SELECT *
FROM YourTable
JOIN YourOtherTable ON YourTable.AccountNumber = YourOtherTable.AccountNumber
If your data has spaces, you can TRIM your data depending on your RDBMS -- LTRIM and RTRIM for SQL Server.
SELECT *
FROM YourTable
JOIN YourOtherTable ON RTRIM(LTRIM(YourTable.AccountNumber)) = RTRIM(LTRIM(YourOtherTable.AccountNumber))
Here is the SQL Fiddle.
Good luck.
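If the plain join still misses rows such as 12345678, it can help to look at what is actually stored. The following is a minimal T-SQL sketch, assuming the table and column names from the question (table1, table2, accountnumber). It lists the table1 rows with no partner in table2, along with their length and raw bytes. SQL Server ignores trailing spaces in = comparisons, so the more likely suspects are leading spaces, tabs, line breaks, or other non-printing characters, which LTRIM and RTRIM do not remove by default.

```sql
-- Sketch only: table and column names are taken from the question.
SELECT t1.accountnumber,
       LEN(t1.accountnumber)        AS char_count,  -- LEN does not count trailing spaces
       DATALENGTH(t1.accountnumber) AS byte_count,  -- nvarchar: 2 bytes per UTF-16 code unit
       CONVERT(varbinary(510), t1.accountnumber) AS raw_bytes  -- hex view exposes hidden characters
FROM table1 AS t1
LEFT OUTER JOIN table2 AS t2
       ON t2.accountnumber = t1.accountnumber
WHERE t2.accountnumber IS NULL;  -- keep only the table1 rows without a match
```

For the eight-character value 12345678, a char_count other than 8 or a byte_count other than 16 points to extra characters in the stored value.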

Your query works fine. This is perhaps a character encoding issue. Try using collate. See this previous SO answer which might help.

I ran into exactly the same case. I even had two sibling queries (one created as a copy of the other), and both had this problem. Collation and types were not the issue here.
Finally, after a LOT of testing, one of the queries started to work without any apparent changes, just by being rewritten. When I retyped the IN part of the second query, it started to work too.
So the problem was a hidden character accidentally typed somewhere in the query.

Related

Add column with substring of other column in SQL (Snowflake)

I feel like this should be simple but I'm relatively unskilled in SQL and I can't seem to figure it out. I'm used to wrangling data in python (pandas) or Spark (usually pyspark) and this would be a one-liner in either of those. Specifically, I'm using Snowflake SQL, but I think this is probably relevant to a lot of flavors of SQL.
Essentially I just want to trim the first character off of a specific column. More generally, what I'm trying to do is replace a column with a substring of the same column. I would even settle for creating a new column that's a substring of an existing column. I can't figure out how to do any of these things.
One obvious solution would be to create a temporary table with something like
CREATE TEMPORARY TABLE tmp_sub AS
SELECT id_col, substr(id_col, 2, 10) AS id_col_sub FROM table1
and then join it back and write a new table
CREATE TABLE table2 AS
SELECT
b.id_col_sub as id_col,
a.some_col1, a.some_col2, ...
FROM table1 a
JOIN tmp_sub b
ON a.id_col = b.id_col
My tables have roughly a billion rows though and this feels extremely inefficient. Maybe I'm wrong? Maybe this is just the right way to do it? I guess I could replace the CREATE TABLE table2 AS... with INSERT OVERWRITE INTO table1 ... and at least that wouldn't store an extra copy of the whole thing.
Any thoughts and ideas are most welcome. I come at this humbly from the perspective of someone who is baffled by a language that so many people seem to have mastery over.
I'm not sure of the exact syntax or functions in Snowflake, but generally speaking there are a few different ways of achieving this.
The general approach that should work almost everywhere is the SUBSTRING function, which most databases provide in some form.
Assuming you have a table called Table1 with the following data:
+-------+-----------------------------------------+
| Code  | Desc                                    |
+-------+-----------------------------------------+
| 0001  | 1First Character Will be Removed        |
| 0002  | xCharacter to be Removed                |
+-------+-----------------------------------------+
The SQL code to remove the first character would be:
select SUBSTRING(Desc,2,len(desc)) from Table1
Please note that the name of the "SUBSTRING" function varies between databases. In Oracle, for example, the function is "SUBSTR". You just need to find the Snowflake equivalent.
Another approach that would work at least in SQLServer and MySQL would be using the "RIGHT" function
select RIGHT(Desc,len(Desc) - 1) from Table1
Based on your question I assume you actually want to update the actual data within the table. In that case you can use the same function above in an update statement.
update Table1 set Desc = SUBSTRING(Desc,2,len(desc))
You didn't try this?
UPDATE tableX
SET columnY = substr(columnY, 2, 10 ) ;
-Paul-
There is no need to specify the length, as is evidenced from the following simple test harness:
SELECT $1
,SUBSTR($1, 2)
,RIGHT($1, -2)
FROM VALUES
('abcde')
,('bcd')
,('cdef')
,('defghi')
,('e')
,('fg')
,('')
;
Both expressions here - SUBSTR(<col>, 2) and RIGHT(<col>, -2) - effectively remove the first character of the <col> column value.
As for the strategy of using UPDATE versus INSERT OVERWRITE, I do not believe that there will be any difference in performance or outcome, so I might opt for the UPDATE since it is simpler. So, in conclusion, I would use:
UPDATE tableX
SET columnY = SUBSTR(columnY, 2)
;
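If a new table is preferred over an in-place update, the temporary table and self-join from the question are not needed, because the substring can be computed in the same SELECT that builds the table. The following is a minimal sketch, assuming the table and column names from the question (table1, table2, id_col, some_col1, some_col2). The real column list would replace the two placeholder columns.

```sql
-- Sketch only: names come from the question; list the real columns in place of some_col1, some_col2.
CREATE TABLE table2 AS
SELECT SUBSTR(id_col, 2) AS id_col,  -- drop the first character, keep the rest
       some_col1,
       some_col2
FROM table1;
```

This reads table1 once and avoids the join back on id_col.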

SQL0802 - invalid numeric data

I'm on a db2 database over as400 system.
I have a select query that is throwing the error in the title: SQL0802 code 6 which is "invalid numeric data" (translated).
I have tried separating the query in different parts and testing each part one by one to see if it works, I am 99% convinced that the problem comes because of a "CAST" clause I am using in a subquery(to cast CHAR to INT), I just don't understand why the subquery works by itself but it doesn't work as a part of the main query.
So if I run the subquery with the "CAST" clause it works fine, but when I run the main query that uses the subquery it doesn't work and the error arises.
Main query can be divided in 2 queries, see the code below.
query1 looks something like this:
select SUM(Price) from TABLE1
where X = 1
group by Country
having SUM(Price) = (query2);
query2 looks something like this:
SELECT SUM(UnitPrice * AmountStocked)
FROM TABLE2
WHERE J = X and ItemNumber in (
SELECT CAST(ItmNumbr AS INT) from TABLE3
where Id in (select Id from TABLE4 where Z=Y)
)
Notes:
*query2 will return a single number.
*Running query2 by itself works fine.
*Running query1 without the "having" clause works fine too.
*If I substitute the "SELECT CAST..." subquery in query2 with something like "(2002, 9912, 1234)" and then run the main query it works fine, so this pretty much confirms that the problem is the "CAST" clause.
*I have to CAST ItmNumbr to INT because ItemNumber is of Numeric type and ItmNumbr is of Char type.
You said:
*I have to CAST ItmNumbr to INT because ItemNumber is of Numeric type and ItmNumbr is of Char type.
But this is not true. You could cast the other way around:
SELECT SUM(UnitPrice * AmountStocked)
FROM TABLE2
WHERE J = X and CHAR(ItemNumber) in (
SELECT TRIM(ItmNumbr) from TABLE3
where Id in (select Id from TABLE4 where Z=Y)
)
The advantage here is that non-numeric characters in ItmNumbr will not blow you up, and CHAR(ItemNumber) should also not fail.
One thing to know about DB2 for i is that there are two ways to create database tables, and the two differ slightly in the characteristics of the resulting table. If the table is created using DDL (CREATE TABLE ...), then that table cannot contain bad data. The data types are verified on write: no matter how you write the data, it is validated before being written to the table. If the table is created by DDS (CRTPF ...), the table can indeed contain bad data, because the data is not validated until it is read and loaded into a variable. Old-style programs that write data to DDS tables by writing a record from a program-described data structure are able to put whatever they want into a DDS-defined table, including numeric data in character fields or, worse, character data in numeric fields. This is usually only found in very old databases that have been migrated from the System/36 (circa the 1980s), which used flat files rather than database files (it had no notion of a database). I only posit this because it is possible. Check the data in your file using hex() to see if there is anything funky in the ItmNumbr or ItemNumber fields.
I am not sure but I am thinking the issue has to do with your join of "WHERE J = X" since we don't know what "J" is and it may not join to "X" (not the correct data type).
Based on your analysis:
"*If I substitute the "SELECT CAST..." subquery in query2 with something like "(2002, 9912, 1234)" and then run the main query it works fine, so this pretty much confirms that the problem is the "CAST" clause."
Check the content of TABLE3.ItmNumbr. If it is defined as NUMERIC (unpacked decimal) it may contain non-numeric values (typically spaces). That may be causing the error you are observing.
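To follow up on the suggestions to inspect the data, here is a minimal DB2 sketch that lists the TABLE3 rows whose ItmNumbr contains anything other than digits and blanks, together with their hex representation. The table and column names come from the question. Note that DB2's TRANSLATE takes the to-string before the from-string. Rows that are entirely blank are not reported by this check and may need a separate look.

```sql
-- Sketch only: turn every digit into a blank; anything left over is not a digit.
SELECT ItmNumbr,
       HEX(ItmNumbr) AS raw_hex          -- shows the stored bytes, including unprintable ones
FROM TABLE3
WHERE TRANSLATE(ItmNumbr, ' ', '0123456789') <> ' ';
```

One possible reason the subquery works alone yet fails inside the main query is that the optimizer may apply the CAST to rows that the Id filter would otherwise have excluded, so bad values in unrelated rows can still surface.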

SQL - no rows selected after inner join - I don't get it

Ok here are my two tables I'm trying to do a join on, using ORACLE:
FOURNISSEUR
TABLE4
And I'm sorry for copying pictures but I'm having a hard time copying tables from MYSQLPLUS..
I'm trying to do a join on NF but it doesn't seem to work... what am I doing wrong?
SELECT fournisseur.NF,fournisseur.NomF
FROM fournisseur
INNER JOIN table4
ON fournisseur.NF=table4.NF
ORDER BY fournisseur.NF;
And yeah I feel stupid..
Does the NF column in the fournisseur table have spaces after the values?
The column heading for fournisseur.NF looks really wide and appears to be displaying a VARCHAR2(20) (or CHAR(20); but see below) column which could also mean there is extra white-space.
Try trimming the values. e.g.
ON TRIM(fournisseur.NF) = table4.NF
If this indeed works, then I'd look into using CHAR(2), which is hopefully the same type as table4.NF, for fournisseur.NF which would avoid this issue simply by not allowing the extra spaces to begin with.
Since filler spaces on the end of a CHAR(n) field "don't mean anything" then using CHAR(n) types throughout would also remove the observed issue.
Here is a SQL Fiddle, modified from Coat CO's comment, which shows lack-of-join behavior when there are extra spaces in a VARCHAR2(n) column.
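To confirm whether trailing spaces are really present before changing the join or the column type, the stored values can be inspected directly. The following is a minimal Oracle sketch, assuming the table and column names from the question.

```sql
-- Sketch only: LENGTH counts trailing spaces, and DUMP shows the type code, length, and byte values.
SELECT NF,
       LENGTH(NF) AS nf_length,
       DUMP(NF)   AS nf_dump
FROM fournisseur;
```

If nf_length is larger than the visible value, or nf_dump ends in a run of 32s (the code for a space), the column is padded.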

Need to UPPER SQL statement with INNER JOIN SELECT

I'm using Pervasive SQL 10.3 (let's just call it MS SQL since almost everything is the same regarding syntax) and I have a query to find duplicate customers using their email address as the duplicate key:
SELECT arcus.idcust, arcus.email2
FROM arcus
INNER JOIN (
SELECT arcus.email2, COUNT(*)
FROM arcus WHERE RTRIM(arcus.email2) != ''
GROUP BY arcus.email2 HAVING COUNT(*)>1
) dt
ON arcus.email2=dt.email2
ORDER BY arcus.email2;
My problem is that I need to do a case insensitive search on the email2 field. I'm required to have UPPER() for the conversion of those fields.
I'm a little stuck on how to do an UPPER() in this query. I've tried all sorts of combinations including one that I thought for sure would work:
... ON UPPER(arcus.email2)=UPPER(dt.email2) ...
... but that didn't work. It took it as a valid query, but it ran for so long I eventually gave up and stopped it.
Any idea of how to do the UPPER conversion on the email2 field?
Thanks!
If your database is set up to be case sensitive, then your inner query will have to take account of this to perform the grouping as you intended. If it is not case sensitive, then you won't require UPPER functions.
Assuming your database IS case sensitive, you could try the query below. Maybe this will run faster...
SELECT arcus.idcust, arcus.email2
FROM arcus
INNER JOIN (
SELECT UPPER(arcus.email2) as upperEmail2, COUNT(*)
FROM arcus WHERE RTRIM(arcus.email2) != ''
GROUP BY UPPER(arcus.email2) HAVING COUNT(*)>1
) dt
ON UPPER(arcus.email2) = dt.upperEmail2
Check out this blog post which discusses case insensitive searches in SQL. In essence, the reason why it was so slow was that most likely none of the current table indexes could be used in the query, so the database engine had to perform a full table scan, likely multiple times.
An index on arcus.email2 is completely useless when wanting to compare between the uppercased versions (UPPER(arcus.email2)), because the database engine cannot look up the values in the index (because they're different values!).
To improve the performance, you can create an index specifically on the result of applying UPPER to the field, provided your database supports indexes on expressions (Oracle and PostgreSQL do, for example).
CREATE INDEX IX_arcus_UPPER_email2
ON arcus (UPPER(email2));
The collation of a character string will determine how SQL Server compares character strings. If you store your data using a case-insensitive format then when comparing the character string “AAAA” and “aaaa” they will be equal. You can place a collate Latin1_General_CI_AS for your email column in the where clause.
Check the link below for how to implement collation in a sql query.
How to do a case sensitive search in WHERE clause

Select string as number on Oracle

I found this odd behavior and I'm breaking my brains with this... anyone has any ideas?
Oracle 10g:
I have two different tables, both have this column named "TESTCOL" as Varchar2(10), not nullable.
If I perform this query on table1, i get the proper results:
select * from table1 where TESTCOL = 1234;
Note that I'm specifically not placing '1234'... it's not a typo, that's a dynamic generated query and I will try not to change it (at least not in the near future).
But, if I run the same query, on table2, I get this error message:
ORA-01722: Invalid number
Both queries are run on the same session, same database.
I've been joining these two tables by that column and the join works ok, the only problem shows whenever I try to use that condition.
Any ideas on what could be different from one table to the other?
Thanks in advance.
If TESTCOL contains non-numbers, then Oracle might run into problems when converting TESTCOL entries to numbers. Because, what it does internally, is this:
select * from table1 where TO_NUMBER(TESTCOL) = 1234;
If you're so sure that 1234 cannot be expressed as a VARCHAR literal, then try this instead, in order to compare varchar values, rather than numeric ones:
select * from table1 where TESTCOL = TO_CHAR(1234);
Well, obviously TABLE2.TESTCOL contains values which are not numbers. Comparing a string to a numeric literal generates an implicit conversion. So any value in TESTCOL which cannot be cast to a number will hurl ORA-01722.
It doesn't hit you where you compare the two tables because you are comparing strings.
So you have a couple of options, neither of which you will like. The most obvious answer is to clean the data so TABLE2 doesn't contain non-numerics. Ideally you should combine this with changing the column to a numeric data type. Otherwise you can alter the generator so it produces code you can run against a shonky data model. In this case that means wrapping literals in quote marks if the mapped column has a character data type.
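To find the values that trigger the error before cleaning them, here is a minimal sketch for Oracle 10g or later, which provides REGEXP_LIKE, assuming the table and column from the question. The pattern accepts only plain unsigned integers, so it is stricter than TO_NUMBER, which also accepts signs and decimals. Treat the result as a list of candidates.

```sql
-- Sketch only: list TESTCOL values that are not made up purely of digits.
SELECT TESTCOL
FROM table2
WHERE NOT REGEXP_LIKE(TESTCOL, '^[0-9]+$');
```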
You are hitting the perils of implicit typecasting here.
With the expression testcol = 1234 you state that you want to treat testcol as a numeric column, so Oracle tries to convert all values in that column to a number.
The ORA-01722 occurs because apparently at least one value in that column is not a number.
Even though you claim that this is "not a typo" it indeed is one. It's a syntactical error.
You will have to declare your parameter as a string literal using single quotes: where testcol = '1234'
Creating a correct condition is the only solution to your problem.
The following should work. Just replace the "your where".
select *
from table1
where (select TO_NUMBER(TESTCOL)
from table2
where "your where") = 1234;