Cast to INT Fails - sql

I executed a query very similar to this:
SELECT
    CAST(A.ValueCol AS INT) * CAST(B.ValueCol AS INT) AS MultipliedValue
FROM Table1 A
JOIN Table1 B
    ON A.LinkCol = B.LinkCol
WHERE
    A.TypeCol = <Value that limits ValueCol to only integers>
    AND B.TypeCol = <Value that limits ValueCol to only integers>
In this case, ValueCol is type NVARCHAR and contains both integer and non-integer values. Despite adequate filtering in the WHERE clause, I'm getting CAST errors for values that aren't even scoped (e.g. if WHERE filters all non-integer values, SQL is throwing an error trying to cast 'ABC', which does exist in a table row but should not have been pulled into this query). I verified that only integer values were being pulled by removing the CASTs and selecting the two ValueCols independently.
Is there a precedence/order of operations problem here? Is CAST applied to all rows' ValueCols prior to filtering with WHERE?
I know I can use TRY_CAST, just curious about this behavior in SQL. Thank you!

Sometimes, SQL Server will deem it more efficient to run an operation like CAST() for every row in a page (or index), before applying filters from the WHERE clause.
There is no reliable way to prevent this reordering.
This is one reason why you should not store meaningful data in columns with an ambiguous type, as anti-patterns such as EAV (entity-attribute-value) do.
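Since the question already mentions TRY_CAST, here is roughly how the query could be made safe against that reordering (a sketch reusing the placeholder table and column names from the question); TRY_CAST returns NULL instead of raising an error, so it is harmless even if the engine evaluates it before the WHERE filters:
SELECT
    -- TRY_CAST yields NULL for non-integer values instead of throwing a conversion error
    TRY_CAST(A.ValueCol AS INT) * TRY_CAST(B.ValueCol AS INT) AS MultipliedValue
FROM Table1 A
JOIN Table1 B
    ON A.LinkCol = B.LinkCol
WHERE
    A.TypeCol = <Value that limits ValueCol to only integers>
    AND B.TypeCol = <Value that limits ValueCol to only integers>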

Related

How to avoid performance degradation when run query with cast in where clause?

I have a table with 2 varchar columns (name and value)
and I have such a query:
select * from attribute
where name = 'width' and cast( value as integer) > 12
This query works, but I suppose there could be an issue with the execution plan because of the index built over the value column, since the column is technically varchar but we convert it to integer.
Are there ways to fix it?
P.S. I can't change type to int because the database design implies that value could be any type.
Performance should not be your first worry here.
Your statement is prone to failures. You read this as:
read all rows with name = 'width'
of these rows cast all values to integer and only keep those with a value greater than 12
But the DBMS is free to check conditions in the WHERE clause in any order. If the DBMS does that:
cast all values to integer and only keep the rows with a value greater than 12
of these rows keep all with name = 'width'
the first step will already cause a runtime error if there is a non-integer value in that table, which is likely.
So first get your query safe. The following should work:
select *
from
(
    select *
    from attribute
    where name = 'width'
) widths
where cast(value as integer) > 12;
This will still fail if a width value contains non-integers. So, to make this safe even in case of invalid data in the table, you may want to add a check in the subquery that the value contains only digits.
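For example, the digits-only check could be expressed with a regular expression in the subquery (a sketch; the ~ operator is PostgreSQL's regex match):
select *
from
(
    select *
    from attribute
    where name = 'width'
      and value ~ '^[0-9]+$'  -- keep only values consisting entirely of digits
) widths
where cast(value as integer) > 12;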
And yes, this won't become super-fast. You sacrifice speed (and data consistency checks) for flexibility with this data model.
What you can do, however, is create an index on both columns, so the DBMS can quickly find all width rows and then have the value directly at hand, before it accesses the table:
create index idx on attribute (name, value);
As far as I know, there is no fail-safe cast function in PostgreSQL. Otherwise you could use this and have a function index instead. I may be wrong, so maybe someone can come up with a better solution here.

SQL Server subquery behaviour

I have a case where I want to check whether an integer value is found in a varchar column of a table whose values are a mix: some can be integers, others are just strings. My first thought was to use a subquery to select only the rows with numeric-esque values. The setup looks like:
CREATE TABLE #tmp (
    EmployeeID varchar(50) NOT NULL
)
INSERT INTO #tmp VALUES ('aa1234')
INSERT INTO #tmp VALUES ('1234')
INSERT INTO #tmp VALUES ('5678')
DECLARE @eid int
SET @eid = 5678
SELECT *
FROM (
    SELECT EmployeeID
    FROM #tmp
    WHERE IsNumeric(EmployeeID) = 1) AS UED
WHERE UED.EmployeeID = @eid
DROP TABLE #tmp
However, this fails with: "Conversion failed when converting the varchar value 'aa1234' to data type int."
I don't understand why it is still trying to compare #eid to 'aa1234' when I've selected only the rows '1234' and '5678' in the subquery.
(I realize I can just cast @eid to varchar but I'm curious about SQL Server's behaviour in this case)
You can't easily control the order in which things happen once SQL Server takes the query you wrote and determines the optimal execution plan. It won't always produce a plan that follows the same logic you typed, in the same order.
In this case, in order to find the rows you're looking for, SQL Server has to perform two filters:
identify only the rows that match your variable
identify only the rows that are numeric
It can do this in either order, so this is also valid:
identify only the rows that are numeric
identify only the rows that match your variable
If you look at the properties of the execution plan, you'll see that the predicate matching your variable is listed first (which still doesn't guarantee the order of operations). In any case, due to data type precedence, SQL Server has to try to convert the column data to the type of the variable.
Subqueries, CTEs, or writing the query a different way - especially in simple cases like this - are unlikely to change the order SQL Server uses to perform those operations.
You can force evaluation order in most scenarios by using a CASE expression (you also don't need the subquery):
SELECT EmployeeID
FROM #tmp
WHERE EmployeeID = CASE IsNumeric(EmployeeID) WHEN 1 THEN @eid END;
In modern versions of SQL Server (you forgot to tell us which version you use), you can also use TRY_CONVERT() instead:
SELECT EmployeeID
FROM #tmp
WHERE TRY_CONVERT(int, EmployeeID) = @eid;
This is essentially shorthand for the CASE expression, but with the added bonus that it allows you to specify an explicit type, something ISNUMERIC() can't do. All ISNUMERIC() tells you is whether the value can be converted to some numeric type. The string '1e2' passes the ISNUMERIC() check because it can be converted to float, but try converting that to an int...
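A quick illustration of that point (a small snippet, not from the original answer):
SELECT
    ISNUMERIC('1e2')          AS IsNumericResult, -- 1: '1e2' counts as numeric because it converts to float
    TRY_CONVERT(float, '1e2') AS AsFloat,         -- 100
    TRY_CONVERT(int, '1e2')   AS AsInt;           -- NULL: a plain CONVERT(int, '1e2') would raise an error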
For completeness, the best solution - if there is an index on EmployeeID - is to just use a variable that matches the column data type, as you suggested.
But even better would be to use a data type that prevents junk data like 'aa1234' from getting into the table in the first place.

Conditional casting of column datatype

I have a subquery that returns a varchar column. In some cases this column contains only numeric values, and in those cases I need to cast it to bigint. I've tried using a CAST(CASE ...) construction, but CASE is an expression that returns a single result, and regardless of the path it always needs to result in the same data type (or one implicitly convertible to it). Is there any tricky way to change a column's data type depending on a condition in PostgreSQL, or not? Google can't help me.
SELECT
prefix,
module,
postfix,
id,
created_date
FROM
(SELECT
s."prefix",
coalesce(m."replica", to_char(CAST((m."id_type" * 10 ^ 12) AS bigint) + m."id", 'FM0000000000000000')) "module",
s."postfix",
s."id",
s."created_date"
FROM some_subquery
There is really no way to do what you want.
A SQL query returns a fixed set of columns, with the names and types being fixed. So, a priori what you want to do does not fit well within SQL.
You could work around this by inventing your own type that is either a big integer or a string, or you could store the value as JSON. But those are workarounds. The SQL query itself really returns one type for each column; that is how SQL works.
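For illustration, the JSON workaround could look something like this (a sketch only; val and some_table are hypothetical names, and it assumes PostgreSQL's to_jsonb function):
SELECT
    CASE
        WHEN val ~ '^[0-9]+$' THEN to_jsonb(val::bigint)  -- numeric strings become JSON numbers
        ELSE to_jsonb(val)                                -- everything else stays a JSON string
    END AS val_json
FROM some_table;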

When is the type of a column in a SQL query result determined?

When performing a select query against a database, the returned result will have columns of certain types.
If you perform a simple query like
select name as FirstName
from database
then the type of the resulting FirstName column will be that of database.name.
If you perform a query like
select age*income
from database
then the resulting data type will be that of the return value from the age*income expression.
What happens when you use something like
select try_convert(float, mycolumn)
from database
where database.mycolumn has type nvarchar? I assume that the resulting column has type float, which is decided by the return type of try_convert.
But consider this example
select coalesce(try_convert(float, mycolumn), mycolumn)
from database
which should give a column with the values of mycolumn unchanged if try_convert fails, but mycolumn as a float when/if that is possible.
Is this determination made as the first row is handled?
Or will the type always be determined by the function called independently of the data in the rows?
Is it possible to conditionally perform a conversion?
I would like to convert to float in the case where this is possible for all rows and leave unchanged in case it fails for any row.
Update 1
It seems that the answer to the first part of the question is that the column type is determined by the expression at compile time, which means that you cannot have a dynamic column type that depends on the data.
I see two workarounds for this:
Option 1
For each column, count the number of non-null results of try_convert(float, mycolumn); if this number is 0, do not perform the conversion. This will of course read the rows many times and might be inefficient.
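A sketch of that check (illustrative only; mytable is a placeholder):
-- COUNT(expr) ignores NULLs, so a result of 0 means no value in the column converts to float.
SELECT COUNT(TRY_CONVERT(float, mycolumn)) AS convertible_rows
FROM mytable;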
Option 2
Simply repeat all columns: once without conversion and once with conversion, and then use whichever one is interesting.
One could also perform another select statement where only columns with non-null values are included.
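A sketch of Option 2 for a single column (illustrative only; mytable is again a placeholder):
SELECT
    mycolumn                     AS mycolumn_raw,   -- original nvarchar value, unchanged
    TRY_CONVERT(float, mycolumn) AS mycolumn_float  -- NULL wherever the conversion fails
FROM mytable;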
Background
I have a dynamically generated pivot table with many (~200 columns) of which some have string values and others have numbers.
I would like to cast all columns as float where this is possible and leave the other columns unchanged (or cast as nvarchar).
The data
The data is mostly NULL values with some columns having text string and other columns having numbers. There are no columns with "mixed" content.
The types are determined at compile time, not at execution time. try_convert(float, ...) knows the type exactly at parse/compile time, because float here is a keyword, not a value. For expressions like COALESCE(foo, bar), the type is likewise determined at compile time, following SQL Server's rules of data type precedence.
When you build your dynamic pivot, you'll have to know the result type, using the same inference rules the SQL parser/compiler uses. Some of the rules are counterintuitive; when in doubt, test it out.
For the detail-oriented: some expression types can be determined at parse time, e.g. N'foo'. But most have to be resolved at compile time, when the names of tables and columns are bound to actual objects in the database, because only then is the type known.
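To see the practical consequence for the COALESCE example from the question, consider this small illustrative snippet (not part of the original answer). The expression is typed float at compile time because float outranks nvarchar, so the nvarchar fallback is still implicitly converted and fails at runtime for non-numeric values:
DECLARE @t TABLE (mycolumn nvarchar(20));
INSERT INTO @t VALUES (N'1.5'), (N'abc');
-- The result column is typed float at compile time, so for the 'abc' row the
-- fallback branch tries to convert N'abc' to float and raises a runtime error.
SELECT COALESCE(TRY_CONVERT(float, mycolumn), mycolumn) AS converted
FROM @t;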

SQL server in operator performance

I have a query that joins some tables, and when I use the = operator instead of the in operator in a where clause like this, I get a significant performance improvement.
The = operator takes less than a second.
The in operator takes about a minute.
where P.GID in ( SELECT GID from [dbo].[fn_SomeFunction] (15268) )
The subquery returns one result in most cases, so just this change would improve most of them, but it will cause errors in some other cases.
Any ideas why this behavior occurs?
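For reference, the = form described above is presumably just the same predicate with in replaced (a reconstruction, not shown in the original question); it fails with "Subquery returned more than 1 value" whenever the function returns more than one row, which matches the errors mentioned:
where P.GID = ( SELECT GID from [dbo].[fn_SomeFunction] (15268) )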
Try something like this; it is not tested and may contain some syntactic errors.
The main idea is to get the desired IDs into a table variable and use that in a join.
Hope that helps.
DECLARE @gids TABLE(
    GID UNIQUEIDENTIFIER NOT NULL
)
INSERT INTO @gids (GID)
SELECT GID
FROM [dbo].[fn_SomeFunction](15268)
SELECT *
FROM SomeTable st
INNER JOIN @gids g ON st.GID = g.GID
If that truly represents a function, that is what is killing your execution time. You can verify this by looking at the execution plan.
What happens is that in your subquery that function has to be calculated over and over again. If you can remove the function and replace it with a query, you should notice an improvement.
Additionally, if the function's return type doesn't match the column's type, an implicit conversion has to be done on each returned item. Changing the function's return type (or the column's type) to match would also help performance.
For example, datetime <> smalldatetime, and comparing them requires an implicit conversion.