What happens when you use table1.* <> table2.*?

Imagine 2 tables with the same structure, where some rows may differ for the same primary key.
What really happens when using a WHERE clause like this: " where table1.* <> table2.* "?
I "used" it in PostgreSQL, but I'm also interested in how other databases behave with this weird construct.

This statement compares all columns of the first table, taken together as one row, to all columns of the second table. It is the same as writing the composite type explicitly, which would be required if the columns are not in the same order in both tables.
(t1.id, t1.col1, t1.col2) <> (t2.id, t2.col1, t2.col2)
or even more verbose
t1.id <> t2.id
OR t1.col1 <> t2.col1
OR t1.col2 <> t2.col2
But you may want to use IS DISTINCT FROM instead of <>, so that NULL versus non-NULL counts as different. With <>, a comparison involving NULL yields NULL rather than true, so such rows are silently filtered out.
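To see why the NULL-safe operator matters, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine. The assumptions: SQLite spells the NULL-safe comparison IS NOT (PostgreSQL would use IS DISTINCT FROM), and the tables t1/t2 and their sample rows are made up for illustration. It runs the verbose OR expansion once with <> and once with the NULL-safe operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    -- id 1: identical, id 2: NULL vs 'y', id 3: 'c' vs 'C'
    INSERT INTO t1 VALUES (1, 'a', 'x'), (2, 'b', NULL), (3, 'c', 'z');
    INSERT INTO t2 VALUES (1, 'a', 'x'), (2, 'b', 'y'),  (3, 'C', 'z');
""")

# Plain <>: a comparison with NULL yields NULL, so id 2 is not reported.
plain = [row[0] for row in conn.execute("""
    SELECT t1.id
    FROM t1 JOIN t2 ON t2.id = t1.id
    WHERE t1.col1 <> t2.col1 OR t1.col2 <> t2.col2
    ORDER BY t1.id
""")]

# NULL-safe comparison (IS NOT in SQLite): NULL vs 'y' counts as different.
null_safe = [row[0] for row in conn.execute("""
    SELECT t1.id
    FROM t1 JOIN t2 ON t2.id = t1.id
    WHERE t1.col1 IS NOT t2.col1 OR t1.col2 IS NOT t2.col2
    ORDER BY t1.id
""")]

print(plain)      # [3]
print(null_safe)  # [2, 3]
```

Row 2 differs only by a NULL, and only the NULL-safe version finds it.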

In PostgreSQL, t1.* <> t2.* in this context is expanded to:
(t1.c1, t1.c2, ..., t1.cn) <> (t2.c1, t2.c2, ..., t2.cn)
which is the same as:
(t1.c1 <> t2.c1) OR (t1.c2 <> t2.c2) OR ...
I think the expansion is a PostgreSQL extension to the standard, while tuple comparison itself exists in several other DBMSs. You can read about it at https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-ROW-CONSTRUCTORS
The number of columns is required to be the same when comparing tuples, but I discovered something peculiar when trying your example:
create table ta (a1 int);
create table tb (b1 int, y int);
select * from ta cross join tb where ta.* <> tb.*
The last select succeeds, despite the tuples having a different number of columns. Adding some rows changes that:
insert into ta values (1),(2);
insert into tb values (2,1),(3,4);
select * from ta cross join tb where ta.* <> tb.*
ERROR: cannot compare record types with different numbers of columns
So it appears that this is not checked when the statement is prepared, only when rows are actually compared. Expanding the tuple manually yields an ERROR even with empty tables:
select * from ta cross join tb where (ta.a1, ta.a1) <> (tb.b1, y, y);
ERROR: unequal number of entries in row expressions

Related

LEFT JOIN with subquery and accessing main table columns in select clause

I have an insert statement like the following, which fails with the error "the multi-part identifier "t2.Col1" could not be bound.". I have oversimplified the statement, and it looks like this:
INSERT INTO dbo.T1
(
Col1,
Col2,
Col3
)
SELECT
t2.Col1,
SUBSTRING(aCase.CaseColumn, 0, CHARINDEX('%', aCase.CaseColumn)), --I expect this line gets the value "2"
SUBSTRING(aCase.CaseColumn, CHARINDEX('%', aCase.CaseColumn) + 1, LEN(aCase.CaseColumn) - CHARINDEX('%', aCase.CaseColumn)) --I expect this line gets the value "3"
FROM
dbo.T2 t2
LEFT JOIN
(
SELECT
CASE --I have hundreds of WHEN conditions below and need to access the parent T2 tables' properties
WHEN t2.Col1 = 1 THEN '2%3' --This line has a syntax error of "the multi-part identifier "t2.Col1" could not be bound."
END AS CaseColumn
)
AS aCase ON 1 = 1
The reason I use LEFT JOIN with CASE is that I have hundreds of conditions for which I need to select different values for different columns. I don't want to repeat the same CASE statements over and over again for all of the columns. Therefore, I use a single CASE which concatenates the values with a delimiter, and then I parse that concatenated string and put the appropriate values in their place.

What you could do is use OUTER APPLY, as it allows your dbo.T2 and the aCase resultset to be related, like this:
INSERT INTO dbo.T1
(
Col1,
Col2,
Col3
)
SELECT
t2.Col1,
SUBSTRING(aCase.CaseColumn, 0, CHARINDEX('%', aCase.CaseColumn)), --I expect this line gets the value "2"
SUBSTRING(aCase.CaseColumn, CHARINDEX('%', aCase.CaseColumn) + 1, LEN(aCase.CaseColumn) - CHARINDEX('%', aCase.CaseColumn)) --I expect this line gets the value "3"
FROM
dbo.T2 t2
OUTER APPLY
(
SELECT
CASE --I have hundreds of WHEN conditions below and need to access the parent T2 tables' properties
WHEN t2.Col1 = 1 THEN '2%3'
END AS CaseColumn
)
AS aCase
Note that OUTER APPLY takes no ON clause. The LEFT JOIN version fails because the result of the subquery is not independent: it has to be computed from the values of the dbo.T2 table, and a joined subquery cannot see the outer alias.
Read more about OUTER APPLY and CROSS APPLY on this thread.
Number 3, "Reusing a table alias", is similar to your case, and the article linked to it explains how to use CROSS APPLY/OUTER APPLY in these cases.

When using a join to a subquery, inside that subquery it doesn't know what t2 is, unless you select from a table aliased as t2 in that subquery.
And you could change that LEFT JOIN to an OUTER APPLY.
But you don't really need to JOIN or OUTER APPLY in this case.
Just select from T2 with the CASE in the subquery.
INSERT INTO dbo.T1
(
Col1,
Col2,
Col3
)
SELECT
Col1,
SUBSTRING(CaseColumn, 1, CHARINDEX('%', CaseColumn)-1),
SUBSTRING(CaseColumn, CHARINDEX('%',CaseColumn)+1, LEN(CaseColumn))
FROM
(
SELECT
Col1,
CASE Col1
WHEN 1 THEN '2%3'
-- more when's
END AS CaseColumn
FROM dbo.T2 t2
) q
Note how the CASE and the SUBSTRINGs were changed a little bit.
Btw, personally I would just insert the distinct Col1 values into T1, and then update Col2 and Col3 in that reference table manually. That could prove to be faster than writing those hundreds of conditions. But then again, you did say this was simplified a lot.
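The derived-table approach above can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module instead of SQL Server, so substr/instr stand in for SUBSTRING/CHARINDEX; the second WHEN branch ('40%50') and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T2 (Col1 INTEGER);
    CREATE TABLE T1 (Col1 INTEGER, Col2 TEXT, Col3 TEXT);
    INSERT INTO T2 VALUES (1), (2);
""")

# The CASE is evaluated once per row in the derived table q;
# the outer SELECT only splits the delimited string at '%'.
conn.execute("""
    INSERT INTO T1 (Col1, Col2, Col3)
    SELECT
        Col1,
        substr(CaseColumn, 1, instr(CaseColumn, '%') - 1),
        substr(CaseColumn, instr(CaseColumn, '%') + 1)
    FROM (
        SELECT
            Col1,
            CASE Col1
                WHEN 1 THEN '2%3'
                WHEN 2 THEN '40%50'  -- hypothetical extra condition
            END AS CaseColumn
        FROM T2
    ) q
""")

rows = conn.execute("SELECT Col1, Col2, Col3 FROM T1 ORDER BY Col1").fetchall()
print(rows)  # [(1, '2', '3'), (2, '40', '50')]
```

Because the CASE lives in the same SELECT as T2, there is no alias-binding problem and no join is needed.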

How to select a value that can come from two different tables?

First, SQL is not my strength. So I need help with the following problem. I'll simplify the table contents to describe the problem.
Let's start with three tables : table1 with columns id_1 and value, table2 with columns id_2 and value, and table3 with columns id_3 and value. As you'll notice, a field value appears in all three tables, while ids have different column names. Modifying column names is not an option because they are used by Java legacy code.
I need to set table3.value using table1.value or table2.value according to the fields table1.id_1, table2.id_2 and table3.id_3.
My last attempt, which describes what I try to do, is the following:
UPDATE table3
SET value=(IF ((SELECT COUNT(*) FROM table1 t1 WHERE t1.id_1=id_3) > 0)
SELECT value FROM table1 t1 WHERE t1.id_1=id_3
ELSE IF ((SELECT COUNT(*) FROM table2 t2 WHERE t2.id_2=id_3)) > 0)
SELECT value FROM table2 t2 WHERE t2.id_2=id_3)
Here is some information about the tables and the update:
1. This update will be included in an XML file used by Liquibase.
2. It must work with Oracle or SQL Server.
3. An id from table3.id_3 can be found at most once in table1.id_1 or in table2.id_2, but not in both tables simultaneously.
4. If table3.id_3 is found neither in table1.id_1 nor in table2.id_2, table3.value remains null.
As you can imagine, my last attempt failed: the IF command was not recognized during the Liquibase update. If anyone has any ideas on how to deal with this, I'd appreciate it. Thanks in advance.

I don't know Oracle very well, but a SQL Server approach would be the following using COALESCE() and OUTER JOINs.
Update T3
Set Value = Coalesce(T1.Value, T2.Value)
From Table3 T3
Left Join Table2 T2 On T3.Id_3 = T2.Id_2
Left Join Table1 T1 On T3.Id_3 = T1.Id_1
The COALESCE() will return the first non-NULL value from the LEFT JOIN to tables 1 and 2, and if a record was not found in either, it would be set to NULL.

This is Siyual's UPDATE rewritten as a MERGE statement.
MERGE INTO table3
USING (
SELECT t3.id_3 AS id, COALESCE(t1.value, t2.value) AS value
FROM table3 t3
LEFT JOIN table1 t1 ON t1.id_1 = t3.id_3
LEFT JOIN table2 t2 ON t2.id_2 = t3.id_3
) t ON (table3.id_3 = t.id)
WHEN MATCHED THEN
UPDATE SET table3.value = t.value
This should work in Oracle.

In Oracle
UPDATE table3 t
SET value=COALESCE((SELECT value FROM table1 t1 WHERE t1.id_1=t.id_3),
(SELECT value FROM table2 t2 WHERE t2.id_2=t.id_3))
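As a quick check of the COALESCE idea, here is a minimal sketch run against Python's built-in sqlite3 module rather than Oracle or SQL Server (an assumption made only so the example is self-contained; the sample values 'from t1' and 'from t2' are invented). It uses the correlated-subquery form, which avoids vendor-specific UPDATE ... FROM syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id_1 INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE table2 (id_2 INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE table3 (id_3 INTEGER PRIMARY KEY, value TEXT);
    INSERT INTO table1 VALUES (1, 'from t1');
    INSERT INTO table2 VALUES (2, 'from t2');
    INSERT INTO table3 (id_3) VALUES (1), (2), (3);  -- id 3 has no match
""")

# COALESCE returns the first non-NULL subquery result; a subquery that
# finds no row yields NULL, so unmatched ids stay NULL.
conn.execute("""
    UPDATE table3
    SET value = COALESCE(
        (SELECT t1.value FROM table1 t1 WHERE t1.id_1 = table3.id_3),
        (SELECT t2.value FROM table2 t2 WHERE t2.id_2 = table3.id_3)
    )
""")

rows = conn.execute("SELECT id_3, value FROM table3 ORDER BY id_3").fetchall()
print(rows)  # [(1, 'from t1'), (2, 'from t2'), (3, None)]
```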

Given your assumption #3, you can use union all to put together tables 1 and 2 without running the risk of duplicating information (at least for the id's of interest). So a simple merge solution like the one below should work (in all DB products that implement the merge operation).
merge into table3
using (
select id_1 as id, value from table1
union all
select id_2, value from table2
) t
on table3.id_3 = t.id
when matched
then update set table3.value = t.value;
You may want to test the various solutions and see which is most effective for your specific tables.
(Note: merge should be more efficient than the update solution using coalesce, at least when relatively few of the id's in table3 have a match in the other tables. This is because the update solution will re-insert NULL where NULL was already stored when there is no match. The merge solution avoids this unnecessary activity.)

Delete from table A joining on table A in Redshift

I am trying to write the following MySQL query in PostgreSQL 8.0 (specifically, using Redshift):
DELETE t1 FROM table t1
LEFT JOIN table t2 ON (
t1.field = t2.field AND
t1.field2 = t2.field2
)
WHERE t1.field > 0
PostgreSQL 8.0 does not support DELETE FROM table USING. The examples in the docs say that you can reference columns in other tables in the where clause, but that doesn't work here as I'm joining on the same table I'm deleting from. The other example is a subselect query, but the primary key of the table I'm working with has four columns so I can't see a way to make that work either.

Amazon Redshift was forked from Postgres 8.0, but it is a very different beast. The manual states that the USING clause is supported in DELETE statements.
Just use the modern form:
DELETE FROM tbl
USING tbl t2
WHERE t2.field = tbl.field
AND t2.field2 = tbl.field2
AND t2.pkey <> tbl.pkey -- exclude self-join
AND tbl.field > 0;
This assumes JOIN instead of LEFT JOIN in your MySQL statement, since a LEFT JOIN would not make any sense there. I also added the condition AND t2.pkey <> tbl.pkey to make it a useful query: it excludes rows joining to themselves. pkey is the primary key column.
What this query does:
Delete all rows where at least one other row exists in the same table with the same not-null values in field and field2. All such duplicates are deleted without leaving a single row per set.
To keep (for example) the row with the smallest pkey per set of duplicates, use t2.pkey < tbl.pkey.
An EXISTS semi-join (as @wilplasser already hinted) might be a better choice, especially if multiple rows could be joined (a row can only be deleted once anyway):
DELETE FROM tbl
WHERE field > 0
AND EXISTS (
SELECT 1
FROM tbl t2
WHERE t2.field = tbl.field
AND t2.field2 = tbl.field2
AND t2.pkey <> tbl.pkey
);
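The difference between the <> and < variants is easy to check on a toy table. The sketch below uses Python's built-in sqlite3 module as a stand-in for Redshift (an assumption; the helper name remaining_after_delete and the sample rows are made up), and runs the EXISTS form of the delete with each comparison operator.

```python
import sqlite3

def remaining_after_delete(comparison):
    """Run the EXISTS-based delete and return the surviving pkeys.

    comparison is '<>' (drop every duplicate) or '<' (keep the
    row with the smallest pkey in each duplicate set).
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl (pkey INTEGER PRIMARY KEY, field INTEGER, field2 TEXT);
        INSERT INTO tbl VALUES
            (1, 5, 'a'), (2, 5, 'a'),   -- duplicates with field > 0
            (3, 5, 'b'),                -- no duplicate
            (4, 0, 'c'), (5, 0, 'c');   -- duplicates, but field = 0
    """)
    conn.execute(f"""
        DELETE FROM tbl
        WHERE field > 0
          AND EXISTS (
              SELECT 1
              FROM tbl t2
              WHERE t2.field = tbl.field
                AND t2.field2 = tbl.field2
                AND t2.pkey {comparison} tbl.pkey
          )
    """)
    return [row[0] for row in conn.execute("SELECT pkey FROM tbl ORDER BY pkey")]

drop_all = remaining_after_delete("<>")
keep_first = remaining_after_delete("<")
print(drop_all)    # [3, 4, 5]
print(keep_first)  # [1, 3, 4, 5]
```

Rows 4 and 5 survive in both runs because the field > 0 filter excludes them before the duplicate check applies.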

I don't understand the mysql syntax, but you probably want this:
DELETE FROM mytable t1
WHERE t1.field > 0
-- don't need this self-join if {field,field2}
-- are a candidate key for mytable
-- (in that case, the exists-subquery would detect _exactly_ the
-- same tuples as the ones to be deleted, which always succeeds)
-- AND EXISTS (
-- SELECT *
-- FROM mytable t2
-- WHERE t1.field = t2.field
-- AND t1.field2 = t2.field2
-- )
;
Note: For testing purposes, you can replace the DELETE keyword by SELECT * or SELECT COUNT(*), and see which rows would be affected by the query.

T-SQL "Where not in" using two columns

I want to select all records from a table T1 where the values in columns A and B have no matching tuple in columns C and D of table T2.
In mysql “Where not in” using two columns I can read how to accomplish that using the form select A,B from T1 where (A,B) not in (SELECT C,D from T2), but that fails in T-SQL for me resulting in "Incorrect syntax near ','.".
So how do I do this?

Use a correlated sub-query:
...
WHERE
NOT EXISTS (
SELECT * FROM SecondaryTable WHERE c = FirstTable.a AND d = FirstTable.b
)
Make sure there's a composite index on SecondaryTable over (c, d), unless that table does not contain many rows.

You can't do this using a WHERE IN type statement.
Instead you could LEFT JOIN to the target table (T2) and select where T2.ID is NULL.
For example
SELECT
T1.*
FROM
T1 LEFT OUTER JOIN T2
ON T1.A = T2.C AND T1.B = T2.D
WHERE
T2.PrimaryKey IS NULL
will only return rows from T1 that don't have a corresponding row in T2.
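Both anti-join forms (NOT EXISTS and LEFT JOIN ... IS NULL) should return the same rows. Here is a minimal sketch that checks this with Python's built-in sqlite3 module standing in for SQL Server; the sample data and the PrimaryKey values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (A INTEGER, B INTEGER);
    CREATE TABLE T2 (PrimaryKey INTEGER PRIMARY KEY, C INTEGER, D INTEGER);
    INSERT INTO T1 VALUES (1, 1), (1, 2), (2, 2);
    INSERT INTO T2 VALUES (10, 1, 1), (11, 2, 9);  -- only (1, 1) matches T1
""")

# Correlated sub-query: keep T1 rows for which no (C, D) pair matches.
not_exists = conn.execute("""
    SELECT A, B
    FROM T1
    WHERE NOT EXISTS (
        SELECT 1 FROM T2 WHERE T2.C = T1.A AND T2.D = T1.B
    )
    ORDER BY A, B
""").fetchall()

# Anti-join: unmatched T1 rows carry NULLs in every T2 column.
left_join = conn.execute("""
    SELECT T1.A, T1.B
    FROM T1 LEFT OUTER JOIN T2
        ON T1.A = T2.C AND T1.B = T2.D
    WHERE T2.PrimaryKey IS NULL
    ORDER BY T1.A, T1.B
""").fetchall()

print(not_exists)  # [(1, 2), (2, 2)]
print(left_join)   # [(1, 2), (2, 2)]
```

The LEFT JOIN form relies on testing a T2 column that can never be NULL in a matched row, which is why the primary key is the usual choice.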

I used this in MySQL because MySQL has no "EXCLUDE" statement.
This code:
Concatenates fields C and D of table T2 into one new field to make it easier to compare the columns.
Concatenates fields A and B of table T1 into one new field to make it easier to compare the columns.
Selects all records where the value of the new field of T1 does not appear among the values of the new field of T2.
SQL-Statement:
SELECT T1.* FROM T1
WHERE CONCAT(T1.A,'Seperator', T1.B) NOT IN
(SELECT CONCAT(T2.C,'Seperator', T2.D) FROM T2)

Here is an example of the answer that worked for me:
SELECT Count(1)
FROM LCSource as s
JOIN FileTransaction as t
ON s.TrackingNumber = t.TrackingNumber
WHERE NOT EXISTS (
SELECT * FROM LCSourceFileTransaction
WHERE [LCSourceID] = s.[LCSourceID] AND [FileTransactionID] = t.[FileTransactionID]
)
You see both columns exist in LCSourceFileTransaction, but one occurs in LCSource and one occurs in FileTransaction and LCSourceFileTransaction is a mapping table. I want to find all records where the combination of the two columns is not in the mapping table. This works great. Hope this helps someone.

Table "Diff" in Oracle

What is the best way to perform a "diff" between two structurally identical tables in Oracle? They're in two different schemas (visible to each other).
Thanks,

If you don't have a tool like PL/SQL Developer, you could full outer join the two tables. If they have a primary key, you can use that in the join. This will give you an instant view of records missing in either table.
Then, for the records that do exist in both tables, you can compare each of the fields. Note that you cannot compare null with the regular = operator: table1.field1 = table2.field1 does not evaluate to true if either field is null, not even when both are null. So for each field you'll have to check whether it has the same value as in the other table, or whether both are null.
Your query might look like this (to return records that don't match):
select
*
from
table1 t1
full outer join table2 t2 on t2.id = t1.id
where
-- Only one record exists
t1.id is null or t2.id is null or
( -- Values differ, or exactly one of them is null
t1.field1 <> t2.field1 or
(t1.field1 is null and t2.field1 is not null) or
(t1.field1 is not null and t2.field1 is null)
)
You will have to copy that field1 condition for each of your fields. Of course you could write a function to compare field data to make your query easier, but you should keep in mind that that might decrease performance dramatically when you need to compare two large tables.
If your tables do not have a primary key, you will need to cross join them and perform these checks for each resulting record. You may speed that up a little by using a full outer join on each mandatory field, because such a field cannot be null and can therefore be used in the join.

Assuming you want to compare the data (diff on entire rows) in the two tables:
SELECT *
FROM (SELECT 's1.t' "Row Source", a.*
FROM (SELECT col1, col2
FROM s1.t tbl1
MINUS
SELECT col1, col2
FROM s2.t tbl2) a
UNION ALL
SELECT 's2.t', b.*
FROM (SELECT col1, col2
FROM s2.t tbl2
MINUS
SELECT col1, col2
FROM s1.t tbl1) b)
ORDER BY 1;
More info about comparing two tables.
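The set-difference diff above can be tried on small data with the sketch below. It uses Python's built-in sqlite3 module, where EXCEPT plays the role of Oracle's MINUS; the table names s1_t and s2_t stand in for s1.t and s2.t, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s1_t (col1 INTEGER, col2 TEXT);
    CREATE TABLE s2_t (col1 INTEGER, col2 TEXT);
    INSERT INTO s1_t VALUES (1, 'a'), (2, 'b'), (3, NULL);
    INSERT INTO s2_t VALUES (1, 'a'), (2, 'B'), (3, NULL), (4, 'd');
""")

# Rows only in s1_t, then rows only in s2_t, each tagged with its source.
diff = conn.execute("""
    SELECT 's1.t' AS row_source, a.*
    FROM (SELECT col1, col2 FROM s1_t
          EXCEPT
          SELECT col1, col2 FROM s2_t) a
    UNION ALL
    SELECT 's2.t', b.*
    FROM (SELECT col1, col2 FROM s2_t
          EXCEPT
          SELECT col1, col2 FROM s1_t) b
    ORDER BY 1, 2
""").fetchall()

print(diff)  # [('s1.t', 2, 'b'), ('s2.t', 2, 'B'), ('s2.t', 4, 'd')]
```

The row (3, NULL) exists in both tables and is not reported: set operators treat two NULLs as the same value, so this form needs no per-column null checks.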
