When is the type of a column in a SQL query result determined? - sql

When performing a select query against a database, the returned result will have columns of a certain type.
If you perform a simple query like
select name as FirstName
from database
then the type of the resulting FirstName column will be that of database.name.
If you perform a query like
select age*income
from database
then the resulting data type will be that of the return value from the age*income expression.
What happens when you use something like
select try_convert(float, mycolumn)
from database
where database.mycolumn has type nvarchar? I assume that the resulting column has type float, decided by the return type of the try_convert call.
But consider this example
select coalesce(try_convert(float, mycolumn), mycolumn)
from database
which should give a column with the values of mycolumn unchanged if try_convert fails, but mycolumn as a float when/if that is possible.
Is this determination made as the first row is handled?
Or will the type always be determined by the function called independently of the data in the rows?
Is it possible to conditionally perform a conversion?
I would like to convert to float in the case where this is possible for all rows and leave unchanged in case it fails for any row.
Update 1
It seems that the answer to the first part of the question is that the column type is determined by the expression at compile time which means that you cannot have a dynamic type of your column depending on the data.
I see two workarounds for this
Option 1
For each column, count the number of non-null rows of try_convert(float, mycolumn); if this number is 0, do not perform the conversion. This will of course read the rows many times and might be inefficient.
Option 2
Simply repeat all columns: once without conversion and once with conversion, and then use the interesting one.
One could also perform another select statement where only columns with non-null values are included.
Background
I have a dynamically generated pivot table with many (~200 columns) of which some have string values and others have numbers.
I would like to cast all columns as float where this is possible and leave the other columns unchanged (or cast as nvarchar).
The data
The data is mostly NULL values with some columns having text string and other columns having numbers. There are no columns with "mixed" content.

The types are determined at compile time, not at execution time. try_convert(float, ...) knows the exact type at parse/compile time, because float here is a keyword, not a value. For expressions like COALESCE(foo, bar) the type is similarly determined at compile time, following the rules of data type precedence that lad already linked.
When you build your dynamic pivot you'll have to know the result type, using the same inference rules the SQL parser/compiler uses. Some rules are counterintuitive, so when in doubt, test it out.
For the detail oriented: some expression types can be determined at parse time, e.g. N'foo'. But most have to be resolved at compile time, when the names of tables and columns are bound to actual objects in the database, because only then is the type discovered.
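A consequence worth spelling out for the asker's COALESCE example (a hedged sketch; mycolumn and database are the asker's names): since data type precedence ranks float above nvarchar, the whole expression is typed as float at compile time. So on rows where try_convert fails and the nvarchar fallback is chosen, that fallback must itself be converted to float at runtime, which can raise an error. Explicitly converting the float branch back to nvarchar keeps the expression safely typed as text:

-- Typed as float by precedence; the nvarchar fallback may fail at runtime:
select coalesce(try_convert(float, mycolumn), mycolumn)
from database

-- Safely typed as nvarchar; numeric rows are round-tripped through float:
select coalesce(convert(nvarchar(50), try_convert(float, mycolumn)), mycolumn)
from database

The nvarchar(50) length here is an assumption; size it to fit your data.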

Related

Msg 245, Level 16, State 1, Line 4 Conversion failed when converting the nvarchar value '239.6' to data type int

I have this query:
SELECT SerialNumber
FROM [ETEL-PRDSQL].[ERP10DBLIVE].[ERP].[SerialNo]
WHERE CustNum IN (2);
It's causing this error:
Msg 245, Level 16, State 1, Line 4
Conversion failed when converting the nvarchar value '239.6' to data type int.
The query works if I compare CustNum with a different value, but it fails when I try CustNum IN (2).
How can I fix this?
You have a varchar column named CustNum. The varchar values in this column may contain only digits, but that doesn't make them numbers! Then you compare this text column with the integer value 2. Again, the integer value 2 is not the same as the text value '2'. It's also not the same as the floating point value 2.0. These are all different, they have different types, and SQL Server must resolve any such differences before it can compare values.
Based on type precedence rules SQL Server determines it needs to convert the text in the column to the integer, instead of vice versa. Once this determination is made for the query, if you have any data in the text column that is not integer-compatible, the query is going to fail.
It's important to understand this conversion happens separately from the conditional check in the WHERE clause, and is a prerequisite for that check. It's not enough to expect the WHERE condition to evaluate to FALSE for rows that do not convert. This is true even if you don't need the row, because SQL Server can't know you don't need that row until after it attempts the conversion!
In this case, we have the value '239.6'. This value may be numeric, but it is not an integer, nor is it convertible to integer. Therefore the query fails.
In addition to (eventually!) failing the query, this is absolutely awful for performance. SQL Server has to do this conversion for every row in the table... even rows you don't need. This is because SQL Server doesn't know which rows will match the WHERE clause until after it checks the conditional expression, and it needs to do this conversion in order to make that check. Worse still, the new converted value no longer matches your indexes, so any indexes you might have become worthless for this query. That cuts to the core of database performance.
If you don't like it, define your data types better, or try comparing the string with another string:
SELECT SerialNumber
FROM [ETEL-PRDSQL].[ERP10DBLIVE].[ERP].[SerialNo]
WHERE CustNum IN ('2');
The query might also run if you did this:
SELECT SerialNumber
FROM [ETEL-PRDSQL].[ERP10DBLIVE].[ERP].[SerialNo]
WHERE CustNum IN (2.0);
Now the type precedence rules will convert your text to a floating point type, and it's possible that will succeed if the rest of the values in the table are compatible. It's also possible this is closer to what you intend... but again, the performance here will be much worse.
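If changing the column type is off the table, a hedged alternative sketch (SQL Server 2012 and later): convert the column side with TRY_CONVERT, so rows that cannot be converted yield NULL instead of raising an error. Note this still converts every row and still defeats any index on CustNum:

SELECT SerialNumber
FROM [ETEL-PRDSQL].[ERP10DBLIVE].[ERP].[SerialNo]
WHERE TRY_CONVERT(int, CustNum) = 2;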

How to avoid performance degradation when run query with cast in where clause?

I have a table with 2 varchar columns (name and value)
and I have such a query:
select * from attribute
where name = 'width' and cast( value as integer) > 12
This query works, but I suppose there could be an issue with the execution plan: an index built over the value column won't help, because the column is technically varchar but we convert it to integer.
Are there ways to fix it?
P.S. I can't change type to int because the database design implies that value could be any type.
Performance should not be your first worry here.
Your statement is prone to failures. You read this as:
read all rows with name = 'width'
of these rows, cast all values to integer and only keep those with a value greater than 12
But the DBMS is free to check conditions in the WHERE clause in any order. If the DBMS instead does:
cast all values to integer and only keep the rows with a value greater than 12
of these rows, keep all with name = 'width'
the first step will already cause a runtime error if there is a non-integer value in the table, which is likely.
So first get your query safe. The following should work:
select *
from
(
select *
from attribute
where name = 'width'
) widths
where cast(value as integer) > 12;
This will still fail when a 'width' row contains a non-integer value. So, to make this safe even against invalid data in the table, you may want to add a check in the subquery that the value contains only digits.
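A hedged sketch of that digit check, using a PostgreSQL regex match (this assumes integer values are plain digit strings, with no sign, decimal point, or whitespace):

select *
from
(
select *
from attribute
where name = 'width'
  and value ~ '^[0-9]+$'
) widths
where cast(value as integer) > 12;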
And yes, this won't become super-fast. You sacrifice speed (and data consistency checks) for flexibility with this data model.
What you can do, however, is create an index on both columns, so the DBMS can quickly find all width rows and then have the value directly at hand, before it accesses the table:
create index idx on attribute (name, value);
As far as I know, there is no built-in fail-safe cast function in PostgreSQL. Otherwise you could use it and have a functional index instead. I may be wrong, so maybe someone can come up with a better solution here.

Cast to INT Fails

I executed a query very similar to this:
SELECT
CAST(A.ValueCol AS INT) * CAST(B.ValueCol AS INT) MultipliedValue
FROM Table1 A
JOIN Table1 B
ON A.LinkCol = B.LinkCol
WHERE
A.TypeCol = <Value that limits ValueCol to only integers>
AND B.TypeCol = <Value that limits ValueCol to only integers>
In this case, ValueCol is type NVARCHAR and contains both integer and non-integer values. Despite adequate filtering in the WHERE clause, I'm getting CAST errors for values that aren't even scoped (e.g. if WHERE filters all non-integer values, SQL is throwing an error trying to cast 'ABC', which does exist in a table row but should not have been pulled into this query). I verified that only integer values were being pulled by removing the CASTs and selecting the two ValueCols independently.
Is there a precedence/order of operations problem here? Is CAST applied to all rows' ValueCols prior to filtering with WHERE?
I know I can use TRY_CAST, just curious about this behavior in SQL. Thank you!
Sometimes, SQL Server will deem it more efficient to run an operation like CAST() for every row in a page (or index), before applying filters from the WHERE clause.
There is no good way to avoid this.
This is one reason why you should not store meaningful data in columns with an ambiguous type, with anti-patterns such as EAV.
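The TRY_CAST the asker mentions is the practical defense here (a sketch, SQL Server 2012 and later): non-convertible values become NULL instead of raising an error, so speculative evaluation before the WHERE filter cannot fail the query:

SELECT
CAST(TRY_CAST(A.ValueCol AS INT) * TRY_CAST(B.ValueCol AS INT) AS INT) MultipliedValue
FROM Table1 A
JOIN Table1 B
ON A.LinkCol = B.LinkCol
WHERE
A.TypeCol = <Value that limits ValueCol to only integers>
AND B.TypeCol = <Value that limits ValueCol to only integers>

If a non-integer row slips past the filter, MultipliedValue is simply NULL for that row rather than an error for the whole query.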

PostgreSQL - How to cast dynamically?

I have a column that has the type of the dataset in text.
So I want to do something like this:
SELECT CAST ('100' AS %INTEGER%);
SELECT CAST (100 AS %TEXT%);
SELECT CAST ('100' AS (SELECT type FROM dataset_types WHERE id = 2));
Is that possible with PostgreSQL?
SQL is strongly typed and static. Postgres demands to know the number of columns and their data types at the time of the call. So you need dynamic SQL in one of the procedural language extensions for this. And then you still face the obstacle that functions (necessarily) have a fixed return type. Related:
Dynamically define returning row types based on a passed given table in plpgsql?
Function to return dynamic set of columns for given table
Or you go with a two-step flow. First concatenate the query string (with another SELECT query). Then execute the generated query string. Two round trips to the server.
SELECT '100::' || type FROM dataset_types WHERE id = 2; -- record resulting string
Execute the result. (And make sure you didn't open any vectors for SQL injection!)
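A hedged sketch of the same two-step flow done server-side in a single round trip, using a PL/pgSQL DO block (this assumes dataset_types.type holds a valid type name such as 'integer', and reads the result back as text since the block's output type must be fixed):

DO $$
DECLARE
    t text;
    result text;
BEGIN
    SELECT type INTO t FROM dataset_types WHERE id = 2;
    -- format() with %L quotes the literal and %I quotes the type name
    -- as an identifier, closing the SQL injection vector
    EXECUTE format('SELECT CAST(%L AS %I)::text', '100', t) INTO result;
    RAISE NOTICE 'casted value: %', result;
END
$$;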
About the short cast syntax:
Postgres data type cast

Conditional casting of column datatype

I have a subquery that returns a varchar column. In some cases this column contains only numeric values, and in those cases I need to cast the column to bigint. I've tried using a CAST(CASE ...) construction, but CASE is an expression that returns a single result, and regardless of the path taken it always needs to result in the same data type (or one implicitly convertible to it). Is there any tricky way to change a column's datatype depending on a condition in PostgreSQL, or not? Google can't help me.
SELECT
prefix,
module,
postfix,
id,
created_date
FROM
(SELECT
s."prefix",
coalesce(m."replica", to_char(CAST((m."id_type" * 10 ^ 12) AS bigint) + m."id", 'FM0000000000000000')) "module",
s."postfix",
s."id",
s."created_date"
FROM some_subquery
There is really no way to do what you want.
A SQL query returns a fixed set of columns, with the names and types being fixed. So, a priori what you want to do does not fit well within SQL.
You could work around this by inventing your own type that is either a big integer or a string, or by storing the value as JSON. But those are workarounds. The SQL query itself really returns one type per column; that is how SQL works.
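A hedged sketch of the JSON workaround in PostgreSQL (mytable and mycol are hypothetical names, and numeric values are assumed to be plain digit strings): a jsonb column can hold a number for some rows and a string for others, so the per-row "type" travels with each value:

select case
           when mycol ~ '^[0-9]+$' then to_jsonb(mycol::bigint)  -- jsonb number
           else to_jsonb(mycol)                                  -- jsonb string
       end as mycol_typed
from mytable;

Consumers then inspect jsonb_typeof(mycol_typed) to decide how to treat each value.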