Using EXCEPT where 1=0 - sql

I saw the following posted as a basic way to de-duplicate entries, without any explanation of how it works. I can see that it works, but I want to understand how it is evaluated. Below are the code and my thoughts. I am hoping somebody can tell me whether my step-by-step understanding of the evaluation is correct or, if I am off, break it down for me.
CREATE TABLE #DuplicateRcordTable (Col1 INT, Col2 INT)
INSERT INTO #DuplicateRcordTable
SELECT 1, 1
UNION ALL
SELECT 1, 1
UNION ALL
SELECT 1, 1
UNION ALL
SELECT 1, 2
UNION ALL
SELECT 1, 2
UNION ALL
SELECT 1, 3
UNION ALL
SELECT 1, 4
GO
This creates a basic table with the following rows:
Col1  Col2
1     1
1     1
1     1
1     2
1     2
1     3
1     4
Then this code is used to exclude duplicates:
SELECT col1,col2
FROM #DuplicateRcordTable
EXCEPT
SELECT col1,col2
FROM #DuplicateRcordTable WHERE 1=0
My understanding is that WHERE 1=0 creates a "temp" table that has the same structure but no data.
Does this code then start adding data to the new empty table?
For example, does it look at the first Col1, Col2 pair of 1,1 and say "I don't see it in the table", so it adds it to the "temp" table and the end result, then check the next row, which is also 1,1, see that it is already in the "temp" table, and so not add it to the end result, and so on through the data?

EXCEPT is a set operation that removes duplicates. That is, it takes everything in the first table that is not in the second and then does duplicate removal.
With an empty second set, all that is left is the duplicate removal.
Hence,
SELECT col1, col2
FROM #DuplicateRcordTable
EXCEPT
SELECT col1, col2
FROM #DuplicateRcordTable
WHERE 1 = 0;
is equivalent to:
SELECT DISTINCT col1, col2
FROM #DuplicateRcordTable
This would be the more typical way to write the query.
This would also be equivalent to:
SELECT col1,col2
FROM #DuplicateRcordTable
UNION
SELECT col1,col2
FROM #DuplicateRcordTable
WHERE 1 = 0;
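The equivalence of the three queries can be checked with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, which is an assumption: SQLite's EXCEPT and UNION also return distinct rows, but it is a different engine, and the `#` temp-table prefix is dropped because SQLite does not use it.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT)")
conn.executemany(
    "INSERT INTO DuplicateRcordTable VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 1), (1, 2), (1, 2), (1, 3), (1, 4)],
)


def run(sql):
    """Run a query and return its rows sorted, so results compare regardless of order."""
    return sorted(conn.execute(sql).fetchall())


# EXCEPT against an empty right-hand set: only duplicate removal is left.
except_rows = run("""
    SELECT Col1, Col2 FROM DuplicateRcordTable
    EXCEPT
    SELECT Col1, Col2 FROM DuplicateRcordTable WHERE 1 = 0
""")

# The more typical way to write it.
distinct_rows = run("SELECT DISTINCT Col1, Col2 FROM DuplicateRcordTable")

# UNION with an empty set also de-duplicates.
union_rows = run("""
    SELECT Col1, Col2 FROM DuplicateRcordTable
    UNION
    SELECT Col1, Col2 FROM DuplicateRcordTable WHERE 1 = 0
""")

print(except_rows)  # [(1, 1), (1, 2), (1, 3), (1, 4)]
print(except_rows == distinct_rows == union_rows)  # True
```

Sorting the rows in Python is only there to make the comparison independent of whatever order each query happens to return.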

The reason that this works is due to the definition of EXCEPT which according to the MS docs is
EXCEPT returns distinct rows from the left input query that aren't
output by the right input query.
The key word here being distinct. Putting where 1 = 0 makes the second query return no results, but the EXCEPT operator itself then reduces the rows from the left query down to those which are distinct.
As @Gordon Linoff says in his answer, there is a simpler, more straightforward way to accomplish this.
The fact that the example uses the same table in the left and right queries could be misleading. The following query will accomplish the same thing, so long as the values in the right query don't exist in the left:
SELECT col1, col2
FROM #DuplicateRcordTable
EXCEPT
SELECT -1, -1
REF: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-except-and-intersect-transact-sql?view=sql-server-2017
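The condition "so long as the values in the right query don't exist in the left" can be illustrated with a small sketch. It again assumes Python's built-in sqlite3 as a stand-in for SQL Server (the `#` prefix is dropped), and the `SELECT 1, 2` variant is an extra case added here for contrast.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT)")
conn.executemany(
    "INSERT INTO DuplicateRcordTable VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 1), (1, 2), (1, 2), (1, 3), (1, 4)],
)

# The right side holds a row that never occurs on the left:
# nothing is subtracted, so only the de-duplication is visible.
no_match = sorted(conn.execute("""
    SELECT Col1, Col2 FROM DuplicateRcordTable
    EXCEPT
    SELECT -1, -1
""").fetchall())

# The right side holds a row that does occur on the left:
# every copy of (1, 2) is removed in addition to the de-duplication.
with_match = sorted(conn.execute("""
    SELECT Col1, Col2 FROM DuplicateRcordTable
    EXCEPT
    SELECT 1, 2
""").fetchall())

print(no_match)    # [(1, 1), (1, 2), (1, 3), (1, 4)]
print(with_match)  # [(1, 1), (1, 3), (1, 4)]
```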

Related

SQL query to find columns having at least one non null value

I am developing a data validation framework where I have this requirement of checking that the table fields should have at least one non-null value i.e they shouldn't be completely empty having all values as null.
For a particular column, I can easily check using
select count(distinct column_name) from table_name;
If it's greater than 0, I can tell that the column is not empty. I already have a list of columns, so I could execute this query in a loop for every column, but that would mean a lot of requests and is not ideal.
What is the better way of doing this? I am using Microsoft SQL Server.
I would not recommend using count(distinct) because it incurs overhead for removing duplicate values. You can just use count(column_name), which counts only the non-null values.
You can construct the query for counts using a query like this:
select count(col1) as col1_cnt, count(col2) as col2_cnt, . . .
from t;
If you have a list of columns you can do this as dynamic SQL. Something like this:
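declare @list nvarchar(max) = N'col1,col2,col3'; -- comma-separated column names (example values)
declare @sql nvarchar(max);
-- build one "count(<col>) as cnt_<col>" expression per column, separated by commas
select @sql = concat('select ',
string_agg(concat('count(', quotename(s.value), ') as cnt_', s.value), ', '),
' from t'
)
from string_split(@list, ',') s;
exec sp_executesql @sql;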
This might not quite work if your columns have special characters in them, but it illustrates the idea.
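As a rough, runnable illustration of the same idea (build one statement with a count per column, then run it once), here is a sketch using Python's built-in sqlite3 in place of SQL Server. The table `t`, its sample rows, the `columns` list and the `quote_ident` helper are all assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INT, col2 INT, col3 INT, col4 INT, col5 INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (None, None, None, None, None),
        (None, 2, None, None, None),
        (None, None, None, None, 5),
        (None, None, None, None, 6),
        (None, 4, None, None, None),
        (None, 6, 7, None, None),
    ],
)

columns = ["col1", "col2", "col3", "col4", "col5"]  # hypothetical column list


def quote_ident(name):
    """Quote an identifier for SQLite by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


# COUNT(col) skips NULLs, so a single scan yields the non-null count of every column.
select_list = ", ".join(f"COUNT({quote_ident(c)})" for c in columns)
counts = conn.execute(f"SELECT {select_list} FROM t").fetchone()

non_null_counts = dict(zip(columns, counts))
empty_columns = [c for c, n in non_null_counts.items() if n == 0]

print(non_null_counts)  # {'col1': 0, 'col2': 3, 'col3': 1, 'col4': 0, 'col5': 2}
print(empty_columns)    # ['col1', 'col4']
```

Only one request reaches the database however many columns are in the list, which is the point of generating the statement.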
You should probably use exists since you aren't really needing a count of anything.
You don't indicate how you want to consume the results of multiple counts, however one thing you could do is use concat to return a list of the columns meeting your criteria:
The following sample table has 5 columns, 3 of which have a value on at least 1 row.
create table t (col1 int, col2 int, col3 int, col4 int, col5 int)
insert into t select null,null,null,null,null
insert into t select null,2,null,null,null
insert into t select null,null,null,null,5
insert into t select null,null,null,null,6
insert into t select null,4,null,null,null
insert into t select null,6,7,null,null
You can return the column name from each case expression and concatenate the results. Only the columns that have a non-null value are included, because concat_ws ignores the nulls returned by the case expressions.
select Concat_ws(', ',
case when exists (select * from t where col1 is not null) then 'col1' end,
case when exists (select * from t where col2 is not null) then 'col2' end,
case when exists (select * from t where col3 is not null) then 'col3' end,
case when exists (select * from t where col4 is not null) then 'col4' end,
case when exists (select * from t where col5 is not null) then 'col5' end)
Result:
col2, col3, col5
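The EXISTS approach can be tried out with a short sketch. It assumes Python's built-in sqlite3 as a stand-in for SQL Server, generates the EXISTS checks from a hypothetical `columns` list, and does the final joining in Python rather than with Concat_ws.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INT, col2 INT, col3 INT, col4 INT, col5 INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (None, None, None, None, None),
        (None, 2, None, None, None),
        (None, None, None, None, 5),
        (None, None, None, None, 6),
        (None, 4, None, None, None),
        (None, 6, 7, None, None),
    ],
)

columns = ["col1", "col2", "col3", "col4", "col5"]  # hypothetical column list

# One EXISTS check per column, all sent in a single statement.
# Each check only needs to find one non-null value; nothing is counted.
checks = ", ".join(
    f'EXISTS (SELECT 1 FROM t WHERE "{c}" IS NOT NULL)' for c in columns
)
flags = conn.execute(f"SELECT {checks}").fetchone()

# Keep the names of the columns whose check came back true (1).
non_empty = [c for c, flag in zip(columns, flags) if flag]
print(", ".join(non_empty))  # col2, col3, col5
```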
I asked a similar question about a decade ago. The best way of doing this in my opinion would meet the following criteria.
Combine the requests for multiple columns together so they can all be calculated in a single scan.
If the scan encounters a not null value in every column under consideration allow it to exit early without reading the rest of the table/index as reading subsequent rows won't change the result.
This is quite a difficult combination to get in practice.
The following might give you the desired behaviour
SELECT DISTINCT TOP 2 ColumnWithoutNull
FROM YourTable
CROSS APPLY (VALUES(CASE WHEN b IS NOT NULL THEN 'b' END),
(CASE WHEN c IS NOT NULL THEN 'c' END)) V(ColumnWithoutNull)
WHERE ColumnWithoutNull IS NOT NULL
OPTION ( HASH GROUP, MAXDOP 1, FAST 1)
Whether it does depends on the plan you get.
Hash match usually reads all of its build input first, meaning that no short-circuiting of the scan will happen. If the optimiser gives you the operator in "flow distinct" mode, however, it won't do this, and query execution can stop as soon as TOP receives its first two rows, signalling that a NOT NULL value has been found in both columns.
But there is no hint to request the mode for hash aggregate so you are dependent on the whims of the optimiser as to whether you will get this in practice. The various hints I have added to the query above are an attempt to point it in that direction however.

How to duplicate records, modify and add them to same table

I have a question and hopefully you can help me out. :)
What I have is a table like this:
ID  Col1      Col2      ReverseID
1   Number 1  Number A
2   Number 2  Number B
3   Number 3  Number C
What I want to achieve is:
Create duplicate of every record with switched columns and add them to original table
Add the ID of the duplicate to ReverseID column of original record and vice-versa
So the new table should look like:
ID  Col1      Col2      ReverseID
1   Number 1  Number A  4
2   Number 2  Number B  5
3   Number 3  Number C  6
4   Number A  Number 1  1
5   Number B  Number 2  2
6   Number C  Number 3  3
What I've done so far was working with temporary table:
SELECT * INTO #tbl
FROM myTable
UPDATE #tbl
SET Col1 = Col2,
Col2 = Col1,
ReverseID = ID
INSERT INTO DUPLICATEtable(
Col1,
Col2,
ReverseID
)
SELECT Col1,
Col2,
ReverseID
FROM #tbl
In this example code I used a secondary table just for making sure I do not compromise the original data records.
I think I could skip the SET-part and change the columns in the last SELECT statement to achieve the same, but I am not sure.
Anyway - with this I am ending up at:
ID  Col1      Col2      ReverseID
1   Number 1  Number A
2   Number 2  Number B
3   Number 3  Number C
4   Number A  Number 1  1
5   Number B  Number 2  2
6   Number C  Number 3  3
So the question remains: How do I get the ReverseIDs correctly added to original records?
As my SQL knowledge is pretty low I am almost sure, this is not the simplest way of doing things, so I hope you guys & girls can enlighten me and lead me to a more elegant solution.
Thank you in advance!
br
mrt
Edit:
I try to illustrate my initial problem, so this posting gets long. ;)
First of all: My frontend does not allow any SQL statements, I have to focus on classes, attributes, relations.
First root cause:
Instances of a class B (B1, B2, B3, ...) are linked together in class Relation, these are many-to-many relations of same class. My frontend does not allow join tables, so that's a workaround.
Say a user adds a relation with B1 as the first side (I just called it 'left') and B2 as the second side (right):
Navigating from B1, there will be two relations showing up (FK_Left, FK_Right), but only one of them will contain a value (let's say FK_Left).
Navigating from B2, the value will be only listed in the other relation (FK_Right).
So from the users side, there are always two relations displayed, but it depends on how the record was entered, if one can find the data behind relation_left or relation_right.
That's not practical from a usability standpoint.
If I had all records with vice-versa partners, I could just hide one of the relations and the user would see all information behind one relation, regardless of how it was entered.
Second root cause:
The frontend provides some matrix view, which gets the relation class as input and displays left partners in columns and right partners in rows.
Let's say I want to see all instances of A in columns and their partners in rows, this is only possible, if all relations regarding the instances of A are entered the same way, e.g. all A-instances as left partner.
The matrix view should be freely filterable by rows and columns, so if I had duplicate relations, I could filter on any of the partners in rows and columns.
Sorry for the long text, I hope that made my situation a bit clearer.
I would suggest just using a view instead of trying to create and maintain two copies of the same data. Then you just select from the view instead of the base table.
create view MyReversedDataView as
select ID
, Col1
, Col2
from MyTable
UNION ALL
select ID
, Col2
, Col1
from MyTable
The trick to this kind of thing is to start with a SELECT that gets the data you need. In this case you need a resultset with Col1, Col2, reverseid.
SELECT Col2 Col1, Col1 Col2, ID reverseid
INTO #tmp FROM myTable;
Convince yourself it's correct -- swapped column values etc.
Then do this:
INSERT INTO myTable (Col1, col2, reverseid)
SELECT Col1, Col2, reverseid FROM #tmp;
If you're doing this from a GUI like ssms, don't forget to DROP TABLE #tmp;
BUT, you can get the same result with a pure query, without duplicating rows. Why do it this way?
You save the wasted space for the reversed rows.
You always get the reversed rows up to the last second, even if you forget to run the process for reversing and inserting them into the table.
There's no consistency problem if you insert or delete rows from the table.
Here's how you might do this.
SELECT Col1, Col2, null reverseid FROM myTable
UNION ALL
SELECT Col2 Col1, Col1 Col2, ID reverseid FROM myTable;
You can even make it into a view and use it as if it were a table going forward.
CREATE VIEW myTableWithReversals AS
SELECT Col1, Col2, null reverseid FROM myTable
UNION ALL
SELECT Col2 Col1, Col1 Col2, ID reverseid FROM myTable;
Then you can say SELECT * FROM myTableWithReversals WHERE Col1 = 'value' etc.
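The view approach can be sketched end to end as follows. This assumes Python's built-in sqlite3 as a stand-in for SQL Server and uses the sample values from the question; the fourth row ('Number 4', 'Number D') is made up here to show that later inserts are picked up.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ID INTEGER PRIMARY KEY, Col1 TEXT, Col2 TEXT)")
conn.executemany(
    "INSERT INTO myTable (Col1, Col2) VALUES (?, ?)",
    [("Number 1", "Number A"), ("Number 2", "Number B"), ("Number 3", "Number C")],
)

# The view exposes every row twice: once as stored, once with the columns swapped.
conn.execute("""
    CREATE VIEW myTableWithReversals AS
    SELECT Col1, Col2, NULL AS reverseid FROM myTable
    UNION ALL
    SELECT Col2 AS Col1, Col1 AS Col2, ID AS reverseid FROM myTable
""")

query = "SELECT Col1, Col2, reverseid FROM myTableWithReversals WHERE Col1 = ?"

reversed_rows = conn.execute(query, ("Number A",)).fetchall()
print(reversed_rows)  # [('Number A', 'Number 1', 1)]

# A row added to the base table later shows up reversed without any extra step.
conn.execute("INSERT INTO myTable (Col1, Col2) VALUES ('Number 4', 'Number D')")
late_rows = conn.execute(query, ("Number D",)).fetchall()
print(late_rows)  # [('Number D', 'Number 4', 4)]
```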
Let me assume that the id column is auto-incremented. Then, you can do this in two steps:
insert into myTable (Col1, Col2, reverseid)
select col2, col1, id
from myTable t
order by id; -- ensures that they go in in the right order
This inserts the new ids with the right reverseid. Now we have to update the previous values:
update t
set reverseid = tr.id
from myTable t join
myTable tr
on tr.reverseid = t.id;
Note that no temporary tables are needed.
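The two steps can be tried out with the sketch below, which assumes Python's built-in sqlite3 as a stand-in for SQL Server. One deliberate difference: the second step uses a correlated subquery instead of the T-SQL UPDATE ... FROM join, since that form is not available in every SQLite version.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE myTable (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        Col1 TEXT,
        Col2 TEXT,
        ReverseID INTEGER
    )
""")
conn.executemany(
    "INSERT INTO myTable (Col1, Col2) VALUES (?, ?)",
    [("Number 1", "Number A"), ("Number 2", "Number B"), ("Number 3", "Number C")],
)

# Step 1: insert the swapped copies, each pointing back at its original row.
conn.execute("""
    INSERT INTO myTable (Col1, Col2, ReverseID)
    SELECT Col2, Col1, ID FROM myTable ORDER BY ID
""")

# Step 2: point each original row (ReverseID still NULL) at its copy.
conn.execute("""
    UPDATE myTable
    SET ReverseID = (SELECT tr.ID FROM myTable AS tr WHERE tr.ReverseID = myTable.ID)
    WHERE ReverseID IS NULL
""")

rows = conn.execute(
    "SELECT ID, Col1, Col2, ReverseID FROM myTable ORDER BY ID"
).fetchall()
for row in rows:
    print(row)
# (1, 'Number 1', 'Number A', 4)
# (2, 'Number 2', 'Number B', 5)
# (3, 'Number 3', 'Number C', 6)
# (4, 'Number A', 'Number 1', 1)
# (5, 'Number B', 'Number 2', 2)
# (6, 'Number C', 'Number 3', 3)
```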

Show Oracle Apex Select List by Default

I have a tabular form in which I need to generate a dynamic amount of select lists based on the number of values in COL1 that are relevant to the query.
APEX_ITEM.SELECT_LIST_FROM_QUERY_XL(5, COL1, 'query...',p_show_null=>'NO') "COL1"
This works fine when the query returns at least one row. It creates x amount of select lists where x is the number of rows returned by the query. However, when no rows are returned, no select lists are created. How can I make it generate one select list when the query returns no results?
You could do something like this:
select ...,
APEX_ITEM.SELECT_LIST_FROM_QUERY_XL(5, COL1, 'query...',p_show_null=>'NO') "COL1"
from ...
where ...
union all
select ...,
APEX_ITEM.SELECT_LIST_FROM_QUERY_XL(5, 'xxx', 'query...',p_show_null=>'NO') "COL1"
from dual
where not exists (select null from <first query>)

How to define destination for an append query Microsoft Access

I'm trying to append two tables in MS Access at the moment. This is my SQL View of my Query at the moment:
INSERT INTO MainTable
SELECT
FROM Table1 INNER JOIN Table2 ON Table1.University = Table2.University;
Where "University" is the only field name that would have similarities between the two tables. When I try and run the query, I get this error:
Query must have at least one destination field.
I assumed that the INSERT INTO MainTable portion of my SQL was defining the destination, but apparently I am wrong. How can I specify my destination?
You must select something from your select statement.
INSERT INTO MainTable
SELECT col1, col2
FROM Table1 INNER JOIN Table2 ON Table1.University = Table2.University;
Besides Luke Ford's answer (which is correct), there's another gotcha to consider:
MS Access (at least Access 2000, where I just tested it) seems to match the columns by name.
In other words, when you execute the query from Luke's answer:
INSERT INTO MainTable
SELECT col1, col2
FROM ...
...MS Access assumes that MainTable has two columns named col1 and col2, and tries to insert col1 from your query into col1 in MainTable, and so on.
If the column names in MainTable are different, you need to specify them in the INSERT clause.
Let's say the columns in MainTable are named foo and bar, then the query needs to look like this:
INSERT INTO MainTable (foo, bar)
SELECT col1, col2
FROM ...
As other users have mentioned, your SELECT statement is empty. If you'd like to select more than just col1, col2, however, that is possible. If you want to select all columns in your two tables that are to be appended, you can use SELECT *, which will select everything in the tables.

Optionally use a UNION from another table in T-SQL without using temporary tables or dynamic sql?

I have two sql server tables with the same structure. In a stored procedure I have a Select from the first table. Occasionally I want to select from the second table as well based on a passed in parameter.
I would like a way to do this without resorting to using dynamic sql or temporary tables.
Pass in param = 1 to union, anything else to only return the first result set:
select field1, field2, ... from table1 where cond
union
select field1, field2, ... from table2 where cond AND param = 1
If they both have exactly the same structure, then why not have a single table with a column that differentiates the two sets of rows? At that point, it becomes a simple matter of a case statement on the parameter to decide which result set you receive back.
A second alternative is dual result sets. You can select multiple result sets out of a stored procedure. Then in code, you would either use DataReader.NextResult or DataSet.Tables(1) to get at the second set of data. It will then be your code's responsibility to place them into the same collection or merge the two tables.
A third possibility is to use an IF statement. Say, pass in an integer with the expected possible values of 1, 2, 3 and then have something along these lines in your actual stored procedure code:
if @Param = 1
select * from Table1
else if @Param = 2
select * from Table2
else if @Param = 3
select * from Table1 union select * from Table2
A fourth possibility would be to have two distinct procedures one which runs a union and one which doesn't and then make it your code's responsibility to determine which one to call based on that parameter, something like:
myCommandObject.CommandText = IIf(myParamVariable = True, "StoredProc1", "StoredProc2")
It's pretty easy.
/* Always return tableX */
select colA, colB
from tableX
union
select colA, colB
from tableY
where @parameter = 'IncludeTableY' /* Will union with an empty set otherwise */
If this isn't immediately apparent (it often isn't), consider the examples below. The primary thing to remember is that if the where clause evaluates to true for a row, the row is returned; otherwise it's discarded.
This always evaluates to true so every row is returned.
select *
from tableX
where 1 = 1
This always evaluates to false so no rows are returned (sometimes used as a quick and dirty get-me-the-columns query).
select *
from tableX
where 1 = 0
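Putting the pieces together, here is a minimal sketch of the parameter-controlled UNION. It assumes Python's built-in sqlite3 as a stand-in for SQL Server, so the T-SQL variable `@parameter` becomes the named placeholder `:parameter`; the table contents are made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (colA INT, colB TEXT)")
conn.execute("CREATE TABLE tableY (colA INT, colB TEXT)")
conn.executemany("INSERT INTO tableX VALUES (?, ?)", [(1, "x1"), (2, "x2")])
conn.executemany("INSERT INTO tableY VALUES (?, ?)", [(3, "y1")])

# The second branch contributes rows only when the parameter matches;
# otherwise its WHERE clause is false for every row and the union adds nothing.
QUERY = """
    SELECT colA, colB FROM tableX
    UNION
    SELECT colA, colB FROM tableY
    WHERE :parameter = 'IncludeTableY'
"""


def fetch(parameter):
    """Run the query with the given parameter value and return the rows sorted."""
    return sorted(conn.execute(QUERY, {"parameter": parameter}).fetchall())


only_x = fetch("anything else")
both = fetch("IncludeTableY")

print(only_x)  # [(1, 'x1'), (2, 'x2')]
print(both)    # [(1, 'x1'), (2, 'x2'), (3, 'y1')]
```

Note that UNION also removes duplicate rows across the two tables; UNION ALL would be the choice if duplicates should be kept.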
This will return values from one table or the other, depending on whether you passed a value for the parameter:
select field1, field2, ... from table1 where @p1 is null
union
select field1, field2, ... from table2 where @p1 is not null
You just need to add the rest of your criteria to the where clauses.
Use a view.
CREATE VIEW view_both
AS
SELECT *, 1 AS source
FROM table1
UNION ALL
SELECT *, 2 AS source
FROM table2
GO
SELECT * FROM view_both WHERE source < @source_flag
The optimizer determines which, or both, tables to use based on source without requiring it to be indexed.