SQL UNION - what are the implicit matching conditions?

SQL UNION - what are the implicit matching conditions? - sql

If I perform an SQL UNION on 2 tables, what are the explicit conditions that UNION uses to decide that any 2 rows are the same?
Looking through the Postgres documentation, it just states "A row is in the set union of two result sets if [it] appears in at least one of the result sets". If [WHAT] appears in the result sets?
Does it match column1, then column2, ...? How does it decide to stop at columnX?
I'm a little stunned that the actual matching rules aren't spelled out explicitly. Or, I just couldn't find them with 10 minutes of Google-ing.
Thanks for your help.

UNION (as compared to UNION ALL) is similar to DISTINCT in that for duplicates to be removed, all columns have to be identical.

I'm not sure what you are asking, but perhaps this is it.
When you use union (or other set operators) on two tables, the rows are compared by position. All columns must match -- that is, there need to be the same number of columns in both tables. Postgres will decide on the type of each column in the result set based on type precedence rules. The columns in either table will be converted to the specified type.
Does that address what you are asking?
Actually, this is pretty much specified in the documentation:
SQL UNION constructs must match up possibly dissimilar types to become
a single result set. The resolution algorithm is applied separately to
each output column of a union query. The INTERSECT and EXCEPT
constructs resolve dissimilar types in the same way as UNION. The
CASE, ARRAY, VALUES, GREATEST and LEAST constructs use the identical
algorithm to match up their component expressions and select a result
data type.
Type Resolution for UNION, CASE, and Related Constructs
If all inputs are of the same type, and it is not unknown, resolve as
that type.
If any input is of a domain type, treat it as being of the domain's
base type for all subsequent steps. [9]
If all inputs are of type unknown, resolve as type text (the preferred
type of the string category). Otherwise, unknown inputs are ignored
for the purposes of the remaining rules.
If the non-unknown inputs are not all of the same type category, fail.
Choose the first non-unknown input type which is a preferred type in
that category, if there is one.
Otherwise, choose the last non-unknown input type that allows all the
preceding non-unknown inputs to be implicitly converted to it. (There
always is such a type, since at least the first type in the list must
satisfy this condition.)
Convert all inputs to the selected type. Fail if there is not a
conversion from a given input to the selected type.

Related

I have an issue trying to UNION All in SQL Server 2008

I am having to create a second header line and am using the first record of the Query to do this. I am using a UNION All to create this header record and the second part of the UNION to extract the Data required.
I have one issue on one column.
,'Active Energy kWh'
UNION ALL
,SUM(cast(invc.UNITS as Decimal (15,0)))
Each side are 11 lines before and after the Union and I have tried all sorts of combinations but it always results in an error message.
The above gives me "Error converting data type varchar to numeric."
Any help would be much appreciated.

The error message indicates that one of your values in the INVC table UNITS column is non-numeric. I would hazard a guess that it's either a string (VARCHAR or similar) column or something else - and one of the values has ended up in a state where it cannot be parsed.
Unfortunately there is no way other than checking small ranges of the table to gradually locate the 'bad' row (i.e. Try running the query for a few million rows at a time, then reducing the number until you home in on the bad data). SQL 2014 if you can get a database restored to it has the TRY_CONVERT function which will permit conversions to fail, enabling a more direct check - but you'll need to play with this on another system
(I'm assuming that an upgrade to 2014 for this feature is out of the question - your best bet is likely just looking for the bad row).

The problem is that you are trying to mix header information with data information in a single query.
Obviously, all your header columns will be strings. But not all your data columns will be strings, and SQL Server is unhappy when you mix data types this way.
What you are doing is equivalent to this:
select 'header1' as col1 -- string
union all
select 123.5 -- decimal
The above query produces the following error:
Error converting data type varchar to numeric.
...which makes sense, because you are trying to mix both a string (the header) with a decimal field.
So you have 2 options:
Remove the header columns from your query, and deal with header information outside your query.
Accept the fact that you'll need to convert the data type of every column to a string type. So when you have numeric data, you'll need to cast the column to varchar(n) explicitly.
In your case, it would mean adding the cast like this:
,'Active Energy kWh'
UNION ALL
,CAST(SUM(cast(invc.UNITS as Decimal (15,0))) AS VARCHAR(50)) -- Change 50 to appropriate value for your case
EDIT: Based on comment feedback, changed the cast to varchar to have an explicit length (varchar(n)) to avoid relying on the default length, which may or may not be long enough. OP knows the data, so OP needs to pick the right length.

SQL Server - simple select and conversion between int and string

I have a simple select statement like this:
SELECT [dok__Dokument].[dok_Id],
[dok__Dokument].[dok_WartUsNetto],
[dok__Dokument].[dok_WartUsBrutto],
[dok__Dokument].[dok_WartTwNetto],
[dok__Dokument].[dok_WartTwBrutto],
[dok__Dokument].[dok_WartNetto],
[dok__Dokument].[dok_WartVat],
[dok__Dokument].[dok_WartBrutto],
[dok__Dokument].[dok_KwWartosc]
FROM [dok__Dokument]
WHERE [dok_NrPelnyOryg] = 2753
AND [dok_PlatnikId] = 174
AND [dok_OdbiorcaId] = 174
AND [dok_PlatnikAdreshId] = 625
AND [dok_OdbiorcaAdreshId] = 624
Column dok_NrPelnyOryg is of type varchar(30), and not null.
The table contained both integer and string values in this column and this select statement was fired millions of times.
However recently this started crashing with message:
Conversion failed when converting the varchar value 'garbi czerwiec B' to data type int.
Little explanation: the table contains multiple "document" records and the mentioned column contains document original number (which comes from multiple different sources).
I know I can fix this by adding '' around the the number, but I'm rather looking for an explanation why this used to work and while not changing anything now it crashes.

It's possible that a plan change (due to changed statistics, recompile etc) led to this data being evaluated earlier (full scan for example), or that this particular data was not in the table previously (maybe before this started happening, there wasn't bad data in there). If it is supposed to be a number, then make it a numeric column. If it needs to allow strings as well, then stop treating it like a number. If you properly parameterize your statements and always pass a varchar you shouldn't need to worry about whether the value is enclosed in single quotes.

All those equality comparison operations are subject to the Data Type Precedence rules of SQL Server:
When an operator combines two
expressions of different data types,
the rules for data type precedence
specify that the data type with the
lower precedence is converted to the
data type with the higher precedence.
Since character types have lower precedence than int types, the query is basically the same as:
SELECT ...
FROM [dok__Dokument]
WHERE cast([dok_NrPelnyOryg] as int) = 2753
...
This has two effects:
it makes all indexes on columns involved in the WHERE clause useless
it can cause conversion errors.
You're not the first to have this problem, in fact several CSS cases I faced had me eventually write an article about this: On SQL Server boolean operator short-circuit.
The correct solution to your problem is that if the field value is numeric then the column type should be numeric. since you say that the data come from a 3rd party application you cannot change, the best solution is to abandon the vendor of this application and pick one that knows what is doing. Short of that, you need to search for character types on character columns:
SELECT ...
FROM [dok__Dokument]
WHERE [dok_NrPelnyOryg] = '2753'
...
In .Net managed ADO.Net parlance this means you use a SqlCommand like follows:
SqlCommand cmd = new SqlCommand (#" SELECT ...
FROM [dok__Dokument]
WHERE [dok_NrPelnyOryg] = #nrPelnyOryg
... ");
cmd.Parameters.Add("#nrPelnyOryg", SqlDbType.Varchar).Value = "2754";
...
Just make sure you don't fall into he easy trap of passing in a NVARCHAR parameter (Unicode) for comparing with a VARCHAR column, since the same data type precendence rules quoted before will coerce the comparison to occur on the NVARCHAR type, thus rendering indexes, again, useless. the easiest way to fall for this trap is to use the dredded AddWithValue and pass in a string value.

Your query stopped working because someone inserted the text string in to the field you are querying using INT. Up until that time it was possible to implicitly convert the data but now that's no longer the case.
I'd go check your data and, more importantly, the model; as Aaron said do you need to allow strings in that field? If not, change the data type to prevent this happening in the future.

SQL Server 2008 - Default column value - should i use null or empty string?

For some time i'm debating if i should leave columns which i don't know if data will be passed in and set the value to empty string ('') or just allow null.
i would like to hear what is the recommended practice here.
if it makes a difference, i'm using c# as the consuming application.

I'm afraid that...
it depends!
There is no single answer to this question.
As indicated in other responses, at the level of SQL, NULL and empty string have very different semantics, the former indicating that the value is unknown, the latter indicating that the value is this "invisible thing" (in displays and report), but none the less it a "known value". A example commonly given in this context is that of the middle name. A null value in the "middle_name" column would indicate that we do not know whether the underlying person has a middle name or not, and if so what this name is, an empty string would indicate that we "know" that this person does not have a middle name.
This said, two other kinds of factors may help you choose between these options, for a given column.
The very semantics of the underlying data, at the level of the application.
Some considerations in the way SQL works with null values
Data semantics
For example it is important to know if the empty-string is a valid value for the underlying data. If that is the case, we may loose information if we also use empty string for "unknown info". Another consideration is whether some alternate value may be used in the case when we do not have info for the column; Maybe 'n/a' or 'unspecified' or 'tbd' are better values.
SQL behavior and utilities
Considering SQL behavior, the choice of using or not using NULL, may be driven by space consideration, by the desire to create a filtered index, or also by the convenience of the COALESCE() function (which can be emulated with CASE statements, but in a more verbose fashion). Another consideration is whether any query may attempt to query multiple columns to append them (as in SELECT name + ', ' + middle_name AS LongName etc.).
Beyond the validity of the choice of NULL vs. empty string, in given situation, a general consideration it to try and be as consistent as possible, i.e. to try and stick to ONE particular way, and to only/purposely/explicitly depart from this way for good reasons and in few cases.

Don't use empty string if there is no value. If you need to know if a value is unknown, have a flag for it. But 9 times out of 10, if the information is not provided, it's unknown, and that's fine.

NULL means unknown value. An empty string means a known value - a string with length zero. These are totally different things.

empty when I want a valid default value that may or may not be changed, for example, a user's middle name.
NULL when it is an error if the ensuing code does not set the value explicitly.
However, By initializing strings with the Empty value instead of null, you can reduce the chances of a NullReferenceException occurring.

Theory aside, I tend to view:
Empty string as a known value
NULL as unknown
In this case, I'd probably use NULL.
One important thing is to be consistent: mixing NULLs and empty strings will end in tears.
On a practical implementation level, empty string takes 2 bytes in SQL Server where as NULLs are bitmapped. In some conditions and for wide/larger tables it makes a different in performance because it's more data to shift around.

How to resolve Oracle error ORA-01790?

I have two select statements joined by "union". While executing that statement I've got:
Error report:
SQL Error: ORA-01790: expression must have same datatype as corresponding expression
01790. 00000 - "expression must have same datatype as corresponding expression"
Maybe you can give me an advise on how to diagnose this problem?

Without looking at your SQL, I would guess that you have columns being UNION'ed that have different data types.

Here's what found:
ORA-01790: expression must have same datatype as corresponding expression
Cause: A SELECT list item corresponds to a SELECT list item with a different datatype in another query of the same set expression.
Action: Check that all corresponding SELECT list items have the same datatypes. Use the TO_NUMBER, TO_CHAR, and TO_DATE functions to do explicit data conversions.
I haven't seen your query, but I am guessing that one select in your union is not selecting the same columns as the other.

Clearly the issue for the poster was solved over half a decade ago, nonetheless I wanted to point out to anyone reading this post in search of help that the order of the selected properties (columns) must match from one unioned statement to the next. It is not enough to simply have the names and the data types match, though that is in a sense the root cause. But due to the way the Union statements are handled in Oracle, it is possible to get the ORA-01790 error due to a mismatch in the ordering of columns only.
In my case, I had a query with a UNION ALL of two selects. One select had a column named "generic_column_name" as the 25th item in the select, and the other select had that same column named "generic_column_name" of the very same data type (I tested several ways through hard coding and also using forced data type conversions). However the second select had this item in the 19th place, so all of the columns from there on were offset and this triggered the ORA-01790 error.

As I mention in the question I'd like to have SUGGESTIONS for how to troubleshoot my problem.
What I've done is enabled one column at a time in each select statement and found that I had mismatch at the very last column of my SQL UNION. Thanks a lot for participating and helping me, but I knew I had type mismatch, WHAT I didn't know is how to troubleshoot.

The error is telling you that you're union-ing columns with different datatypes. There are oracle functions that will convert one type to another (e.g. "to_char"), you'll have to convert the datatypes into a common format, or at least one into the other. If you post the actual query/types it'd be possible to be more specifc.

You need to make sure that corresponding columns in your union have the same data type. The easiest way would be to comment out columns one by one to narrow down to the column and then use explicit type conversion function in one of them to make types match.

You tried to execute a SELECT statement (probably a UNION or a UNION ALL), and all of the queries did not contain matching data types in the result columns.
Techonthenet - ORA-01790

I had the same problem today. The issue is in the union, basically in the order of the columns

MySQL command to search CSV (or similar array)

I'm trying to write an SQL query that would search within a CSV (or similar) array in a column. Here's an example:
insert into properties set
bedrooms = 1,2,3 (or 1-3)
title = nice property
price = 500
I'd like to then search where bedrooms = 2+. Is this even possible?

The correct way to handle this in SQL is to add another table for a multi-valued property. It's against the relational model to store multiple discrete values in a single column. Since it's intended to be a no-no, there's little support for it in the SQL language.
The only workaround for finding a given value in a comma-separated list is to use regular expressions, which are in general ugly and slow. You have to deal with edge cases like when a value may or may not be at the start or end of the string, as well as next to a comma.
SELECT * FROM properties WHERE bedrooms RLIKE '[[:<:]]2[[:>:]]';
There are other types of queries that are easy when you have a normalized table, but hard with the comma-separated list. The example you give, of searching for a value that is equal to or greater than the search criteria, is one such case. Also consider:
How do I delete one element from a comma-separated list?
How do I ensure the list is in sorted order?
What is the average number of rooms?
How do I ensure the values in the list are even valid entries? E.g. what's to prevent me from entering "1,2,banana"?
If you don't want to create a second table, then come up with a way to represent your data with a single value.
More accurately, I should say I recommend that you represent your data with a single value per column, and Mike Atlas' solution accomplishes that.

Generally, this isn't how you should be storing data in a relational database.
Perhaps you should have a MinBedroom and MaxBedroom column. Eg:
SELECT * FROM properties WHERE MinBedroom > 1 AND MaxBedroom < 3;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas