INNER JOIN on value instead of dimension - sql

This is a tricky (and, in my humble opinion, unnecessary) question my friend got during an interview; I didn't know the answer either when he asked me about it. When you run this SQL query:
SELECT *
FROM (VALUES (1), (1), (Null), (Null), (Null)) AS tb1 (col)
JOIN (VALUES (1), (1), (1), (Null), (Null)) AS tb2 (col)
ON tb1.col = tb2.col
It generates this result:
tb1.col  tb2.col
-------  -------
1        1
1        1
1        1
1        1
1        1
1        1
Why does this JOIN behave like that?

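As jarlh mentioned, a comparison involving NULL never evaluates to true: NULL = NULL yields UNKNOWN, so the rows containing NULL never satisfy tb1.col = tb2.col and drop out of the INNER JOIN.
As for where all the 1s come from: each of the two 1s in tb1 matches each of the three 1s in tb2, so the join returns 2 × 3 = 6 rows. The following query may help show where each value comes from.
In this example, we compare the first letter of each value (which is always the letter A):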
SELECT *
FROM (VALUES ('Abigail'), ('Allie'), (Null), (Null), (Null)) AS tb1 (col)
JOIN (VALUES ('Aria'), ('Allison'), ('Audrey'), (Null), (Null)) AS tb2 (col)
ON left(tb1.col, 1) = left(tb2.col, 1)
col      col
-------  -------
Abigail  Aria
Abigail  Allison
Abigail  Audrey
Allie    Aria
Allie    Allison
Allie    Audrey
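Both points (the NULL rows dropping out, and the 2 × 3 row count) can be checked without a SQL Server instance. Below is a minimal sketch using Python's built-in sqlite3 module, assuming SQLite is an acceptable stand-in: it treats NULL = NULL as unknown just like SQL Server, but it does not accept the AS tb1 (col) alias form on a VALUES derived table, so the inputs are written as CTEs instead. The IS operator in the second query is SQLite's NULL-safe equality, included only for contrast.

```python
import sqlite3

# In-memory SQLite database; NULL comparison semantics match SQL Server here.
conn = sqlite3.connect(":memory:")

query = """
WITH tb1(col) AS (VALUES (1), (1), (NULL), (NULL), (NULL)),
     tb2(col) AS (VALUES (1), (1), (1), (NULL), (NULL))
SELECT tb1.col, tb2.col
FROM tb1
JOIN tb2 ON tb1.col {op} tb2.col
"""

# Plain equality: NULL = NULL is unknown, so only the 1s match (2 x 3 = 6 rows).
equal_rows = conn.execute(query.format(op="=")).fetchall()

# SQLite's IS treats two NULLs as equal, so the 3 x 2 NULL pairs join as well.
null_safe_rows = conn.execute(query.format(op="IS")).fetchall()

print(len(equal_rows), len(null_safe_rows))
```

With "=" every returned row is (1, 1); with "IS" the six (1, 1) rows are joined by six (None, None) rows.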

Related

Find the difference of values between 2 columns after joining 2 tables on ms sql server

I have 2 tables in MS SQL Server 2019: test2 and test3. Below are the table creation and insert statements for the 2 tables:
create table test2 (id nvarchar(10) , code nvarchar(5) , all_names nvarchar(80))
create table test3 (code nvarchar(5), name1 nvarchar(18) )
insert into test2 values ('A01', '50493', '12A2S0403-Buffalo;13A1T0101-Boston;13A2C0304-Miami')
insert into test2 values ('A02', '31278', '12A1S0205-Detroit')
insert into test2 values ('A03', '49218', '12A2S0403-Buffalo;12A1M0208-Manhattan')
insert into test3 values ('50493', 'T0101-Boston')
insert into test3 values ('49218', 'S0403-Buffalo')
insert into test3 values ('31278', 'S0205-Detroit')
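I can join the 2 tables on the code column. The task is to find the difference between test2.all_names and test3.name1, i.e. for each id, keep the semicolon-separated entries of all_names that do not match name1. For example, 'A01' should display the result as '12A2S0403-Buffalo;13A2C0304-Miami'.
A02 should not appear in the output, because its only entry matches.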
The output should be :
Id | Diff_of_name
----------------------------------------
A01 | 12A2S0403-Buffalo;13A2C0304-Miami
A03 | 12A1M0208-Manhattan
Here's one possible solution: first use OPENJSON to split the source string into rows, then use NOT EXISTS to check for matching values in table test3, and finally use STRING_AGG to build the final result:
select Id, String_Agg(j.[value], ';') within group (order by j.seq) Diff_Of_Name
from test2 t2
cross apply (
select j.[value], Convert(tinyint,j.[key]) Seq
from OpenJson(Concat('["',replace(all_names,';', '","'),'"]')) j
where not exists (
select * from test3 t3
where t3.code = t2.code and j.[value] like Concat('%',t3.name1,'%')
)
)j
group by t2.Id;
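The Concat/replace expression is what turns the delimited string into something OPENJSON can split: 'a;b' becomes the JSON array '["a","b"]'. A rough Python equivalent of that step, assuming Python's json module as a stand-in for OPENJSON (an illustration, not how SQL Server implements it); split_like_openjson is a hypothetical name:

```python
import json


def split_like_openjson(all_names):
    # Same string surgery as Concat('["', replace(all_names, ';', '","'), '"]').
    json_text = '["' + all_names.replace(";", '","') + '"]'
    # OPENJSON on an array returns one row per element, with the zero-based
    # position in [key]; enumerate() supplies that position here.
    return list(enumerate(json.loads(json_text)))


rows = split_like_openjson("12A2S0403-Buffalo;13A1T0101-Boston;13A2C0304-Miami")
print(rows)
```

Because the names are spliced directly into JSON text, a name containing a double quote or a backslash could produce invalid or misread JSON, so the trick assumes the names contain neither character.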
I don't like the need to normalize. However, if one must normalize, STRING_SPLIT is handy.
When done with the real work, STRING_AGG can de-normalize the data.
WITH normalized as ( -- normalize all_names in test2 to column name1
SELECT t2.id, t2.code, t2.all_names, n.value as [name1]
FROM test2 t2
OUTER APPLY STRING_SPLIT(t2.all_names, ';') n
) select * from normalized;
WITH normalized as ( -- normalize all_names in test2 to column name1
SELECT t2.id, t2.code, t2.all_names, n.value as [name1]
FROM test2 t2
OUTER APPLY STRING_SPLIT(t2.all_names, ';') n
), differenced as ( -- exclude name1 values listed in test3, ignoring leading characters
SELECT n.*
FROM normalized n
WHERE NOT EXISTS(SELECT * FROM test3 t3 WHERE t3.code = n.code AND n.name1 LIKE '%' + t3.name1)
) -- denormalize
SELECT id, STRING_AGG(name1, ';') as [Diff_of_name]
FROM differenced
group by id
order by id
id Diff_of_name
---------- ---------------------------------
A01 12A2S0403-Buffalo;13A2C0304-Miami
A03 12A1M0208-Manhattan
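The normalize, difference, denormalize pipeline is easy to trace on the sample rows. A minimal Python sketch with the question's rows copied into plain lists (test2, test3 and diff_of_names are illustrative names only); str.endswith mirrors LIKE '%' + t3.name1, and skipping ids with nothing left mirrors the fact that GROUP BY produces no group for them:

```python
# The question's sample rows: (id, code, all_names) and (code, name1).
test2 = [
    ("A01", "50493", "12A2S0403-Buffalo;13A1T0101-Boston;13A2C0304-Miami"),
    ("A02", "31278", "12A1S0205-Detroit"),
    ("A03", "49218", "12A2S0403-Buffalo;12A1M0208-Manhattan"),
]
test3 = [
    ("50493", "T0101-Boston"),
    ("49218", "S0403-Buffalo"),
    ("31278", "S0205-Detroit"),
]


def diff_of_names(test2, test3):
    result = {}
    for id_, code, all_names in test2:
        # Normalize: one entry per semicolon-separated name (STRING_SPLIT).
        names = all_names.split(";")
        # Difference: drop names ending with a test3.name1 for the same code
        # (NOT EXISTS ... AND n.name1 LIKE '%' + t3.name1).
        kept = [n for n in names
                if not any(c == code and n.endswith(name1) for c, name1 in test3)]
        # Denormalize: join the survivors back together (STRING_AGG).
        if kept:
            result[id_] = ";".join(kept)
    return result


print(diff_of_names(test2, test3))
```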

Using VALUES to make a temporary table

What is the purpose of the dummy function abc123() in the following queries? Both of them work, but I don't understand the need for the dummy function.
I've tried removing it, but I always get syntax errors.
SELECT MAX(NumbersTable) AS NumbersTable
FROM ( VALUES (1), (3), (2) ) AS abc123(NumbersTable)
SELECT TOP 1 NumbersTable
FROM ( VALUES (1), (3), (2) ) AS abc123(NumbersTable)
ORDER BY NumbersTable DESC
I expect the result to be 3, and that is what I get.
It is not a function. It names the derived table created by VALUES: abc123 is the table alias and NumbersTable is the name of its column. SQL Server requires every derived table to have an alias, which is why removing it gives a syntax error.
If you run:
SELECT *
FROM ( VALUES (1), (3), (2) ) AS abc123(NumbersTable)
You will see something like this (without an ORDER BY, the row order is not guaranteed):
NumbersTable
1
3
2
That is because NumbersTable is the name of the column. You can also qualify it with the table alias:
SELECT abc123.NumbersTable
FROM ( VALUES (1), (3), (2) ) AS abc123(NumbersTable)
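The same naming idea appears in a common table expression, where the table name and column list come before the query instead of after it. A small sketch using Python's sqlite3 module, assuming SQLite as the engine; the bare VALUES list as a CTE body is SQLite syntax, so in SQL Server you would keep the derived-table form above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# abc123 names the row set and NumbersTable names its only column,
# the same two jobs "AS abc123(NumbersTable)" does in the query above.
sql = """
WITH abc123(NumbersTable) AS (VALUES (1), (3), (2))
SELECT MAX(abc123.NumbersTable) AS NumbersTable
FROM abc123
"""

cursor = conn.execute(sql)
column_name = cursor.description[0][0]
max_value = cursor.fetchone()[0]
print(column_name, max_value)
```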

How can I get a random number generated in a CTE not to change in JOIN?

The problem
I'm generating a random number for each row in a table #Table_1 in a CTE, using this technique. I'm then joining the results of the CTE to another table, #Table_2. Instead of getting a random number for each row in #Table_1, I'm getting a new random number for every resulting row in the join!
CREATE TABLE #Table_1 (Id INT)
CREATE TABLE #Table_2 (MyId INT, ParentId INT)
INSERT INTO #Table_1
VALUES (1), (2), (3)
INSERT INTO #Table_2
VALUES (1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (3, 2), (1, 3)
;WITH RandomCTE AS
(
SELECT Id, (ABS(CHECKSUM(NewId())) % 5)RandomNumber
FROM #Table_1
)
SELECT r.Id, t.MyId, r.RandomNumber
FROM RandomCTE r
INNER JOIN #Table_2 t
ON r.Id = t.ParentId
The results
Id MyId RandomNumber
----------- ----------- ------------
1 1 1
1 2 2
1 3 0
1 4 3
2 1 4
2 2 0
2 3 0
3 1 3
The desired results
Id MyId RandomNumber
----------- ----------- ------------
1 1 1
1 2 1
1 3 1
1 4 1
2 1 4
2 2 4
2 3 4
3 1 3
What I tried
I tried to obscure the logic of the random number generation from the optimizer by casting the random number to VARCHAR, but that did not work.
What I don't want to do
I'd like to avoid using a temporary table to store the results of the CTE.
How can I generate a random number for a table and preserve that random number in a join without using temporary storage?
This seems to do the trick:
WITH CTE AS(
SELECT Id, (ABS(CHECKSUM(NewId())) % 5)RandomNumber
FROM #Table_1),
RandomCTE AS(
SELECT Id,
RandomNumber
FROM CTE
GROUP BY ID, RandomNumber)
SELECT *
FROM RandomCTE r
INNER JOIN #Table_2 t
ON r.Id = t.ParentId;
It looks like SQL Server is aware that, outside the CTE, RandomNumber is effectively just NEWID() with some additional functions wrapped around it, and hence it still generates a new value for each row. The GROUP BY clause in the second CTE therefore forces the engine to settle on a value for RandomNumber so that it can perform the GROUP BY.
Per the quote in this answer
The optimizer does not guarantee timing or number of executions of scalar functions. This is a long-established tenet. It's the fundamental 'leeway' that allows the optimizer enough freedom to gain significant improvements in query-plan execution.
If it is important for your application that the random number be evaluated once and only once, you should calculate it up front and store it in a temp table.
Anything else is not guaranteed, so it is irresponsible to add it to your application's code base: even if it works now, it may break after a schema change, an execution plan change, a version upgrade, or a CU install.
For example, Lamu's answer breaks if a unique index is added to #Table_1 (Id), presumably because the optimizer can then treat the GROUP BY as redundant and remove it.
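The difference between "recompute the expression for every joined row" and "compute it once, store it, then join" can be modelled outside the database. This is a plain-Python sketch, not SQL Server behaviour: table_1 and table_2 are hypothetical copies of #Table_1 and #Table_2, rng.randrange(5) stands in for ABS(CHECKSUM(NEWID())) % 5, and the dictionary plays the role of the temp table.

```python
import random

table_1 = [1, 2, 3]  # Id values of #Table_1
table_2 = [(1, 1), (2, 1), (3, 1), (4, 1),
           (1, 2), (2, 2), (3, 2), (1, 3)]  # (MyId, ParentId) rows of #Table_2


def join_recomputing(rng):
    # The expression is evaluated once per joined row, which is what the
    # question observed: rows with the same Id can get different numbers.
    return [(id_, my_id, rng.randrange(5))
            for id_ in table_1
            for my_id, parent_id in table_2
            if id_ == parent_id]


def join_materialized(rng):
    # Evaluate once per #Table_1 row and store the result first
    # (the temp-table approach), then join against the stored values.
    stored = {id_: rng.randrange(5) for id_ in table_1}
    return [(id_, my_id, stored[id_])
            for id_ in table_1
            for my_id, parent_id in table_2
            if id_ == parent_id]


rows = join_materialized(random.Random(42))
for row in rows:
    print(row)
```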
How about not using a real random number at all? Use rand() with a seed:
WITH RandomCTE AS (
SELECT Id,
CONVERT(INT, RAND(ROW_NUMBER() OVER (ORDER BY NEWID()) * 999999) * 5) as RandomNumber
FROM #Table_1
)
SELECT r.Id, t.MyId, r.RandomNumber
FROM RandomCTE r INNER JOIN
#Table_2 t
ON r.Id = t.ParentId;
The seed argument to rand() is pretty awful. Values of the seed near each other produce similar initial values, which is the reason for the multiplication.

How do I produce permutations in the same column using SQL?

I have a table that looks like the following:
LETTERS
--------
A
B
C
D
E
F
G
H
I'd like to create a view that lists all the 3-letter combinations of these letters, without repetition, in the following way, i.e. assigning a number to each combination:
ViewNew
-------
1 A
1 B
1 C
2 A
2 B
2 D
3 A
3 B
3 E
and so on.
Is the above possible? Any help will be much appreciated.
Check this. Using joins and UNPIVOT, we can find all permutations of the letters.
select ID,ViewNew from
(
select row_number() over(order by (select 1)) AS ID,
C2.LETTERS as '1' ,C1.LETTERS AS '2' ,c3.LETTERS as '3' from #tableName C1,#tableName c2,#tableName c3
where C1.LETTERS <> c2.LETTERS and c2.LETTERS <> c3.LETTERS and c1.LETTERS <> c3.LETTERS
) a
UNPIVOT
(
ViewNew
FOR [LETTERS] IN ([1], [2], [3])
)as f
For permutations (order is important):
DECLARE @q as table([No] int, L1 char(1), L2 char(1), L3 char(1))
INSERT INTO @q
SELECT
ROW_NUMBER() OVER (ORDER BY L1.Letter, L2.Letter, L3.Letter),
L1.Letter,
L2.Letter,
L3.Letter
FROM
Letters L1 CROSS JOIN
Letters AS L2 CROSS JOIN
Letters AS L3
WHERE
(L1.Letter <> L2.Letter) AND
(L2.Letter <> L3.Letter) AND
(L1.Letter <> L3.Letter)
SELECT [No], L1 AS Letter FROM @q
UNION
SELECT [No], L2 FROM @q
UNION
SELECT [No], L3 FROM @q
This can actually be done in a single query, but the @q query would have to be repeated.
If a view is the goal, I would move the @q query into a sub-view.
Update:
Use UNPIVOT to make things even simpler, as pointed out in Bhosale's answer.
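Note that the listing in the question (1: A B C, 2: A B D, 3: A B E) is actually combinations, where order does not matter, while the queries above produce permutations. In SQL, changing the <> conditions to < (L1.Letter < L2.Letter AND L2.Letter < L3.Letter) keeps only one ordering of each set. Python's itertools shows both shapes; numbered_rows is a hypothetical helper that mimics ROW_NUMBER followed by UNPIVOT:

```python
from itertools import combinations, permutations

letters = "ABCDEFGH"


def numbered_rows(groups):
    # Number each 3-letter group (like ROW_NUMBER) and emit one
    # (number, letter) row per letter (like UNPIVOT).
    return [(no, letter)
            for no, group in enumerate(groups, start=1)
            for letter in group]


combo_rows = numbered_rows(combinations(letters, 3))  # 56 groups, order ignored
perm_rows = numbered_rows(permutations(letters, 3))   # 336 groups, order matters
print(combo_rows[:9])
```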
If you want to create a list of all unique combinations between two tables, you need only select from both tables at once and SQL Server will give you what you're after. This is called a CROSS JOIN.
declare @t1 table (letter char(1))
declare @t2 table (number int)
insert @t1 values ('A'), ('B'), ('C')
insert @t2 values (1), (2), (3), (4)
select t2.number, t1.letter from @t1 as t1, @t2 as t2
Results
number letter
--------------
1 A
1 B
1 C
2 A
2 B
2 C
3 A
3 B
3 C
4 A
4 B
4 C
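A CROSS JOIN returns every pairing, so the row count is the product of the two table sizes (3 × 4 = 12 here). Python's itertools.product enumerates the same pairs; the list names are only for illustration, and the ordering matches the listing above only because product iterates that way (without ORDER BY, SQL Server guarantees no order).

```python
from itertools import product

letters = ["A", "B", "C"]  # contents of t1
numbers = [1, 2, 3, 4]     # contents of t2

# Every (number, letter) pair, like: select t2.number, t1.letter from t1, t2
pairs = list(product(numbers, letters))
print(len(pairs), pairs[:4])
```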

Dynamic SQL Procedure that can insert into a table using a while loop to control the number of row entries

I have a small SQL-based challenge that I'm trying to solve to improve my knowledge of dynamic SQL.
My requirements are as follows.
I created a table that looks as follows:
CREATE TABLE Prison_Doors
(
DoorNum INT IDENTITY(1,1) PRIMARY KEY,
DoorOpen BIT,
DoorClosed BIT,
Trips INT
)
GO
I need to create a dynamic SQL procedure that inserts 50 door numbers and marks them as closed.
Expected result of proc:
|DoorNum|DoorOpen|DoorClosed|Trips|
|-------|--------|----------|-----|
| 1 | 0 | 1 |null |
|-------|--------|----------|-----|
|---------All the way to 50-------|
|-------|--------|----------|-----|
| 50 | 0 | 1 |null |
This is what I have written, but it does not insert anything:
BEGIN
DECLARE @SQL VARCHAR(8000)
DECLARE @Index INT
SET @Index=1
WHILE (@Index<=50)
BEGIN
SET @SQL= 'INSERT INTO Prison_Doors(DoorNum,DoorOpen,DoorClosed)
VALUES('+CAST(@Index AS VARCHAR)+',0,1),'
SET @Index=@Index+1
END
SET @SQL = SUBSTRING(@SQL, 1, LEN(@SQL)-1)
EXEC(@SQL)
END
I would like to know what I am doing wrong.
After all of this is done, I then need to run another loop that starts at door one, changes every second door to open, and sets Trips to 1; then it moves on to every third door, opening those and setting Trips to 2. This continues until all doors are open, at which point it selects the number of trips it took.
I hope somebody can assist me with this as I am new to Dynamic SQL and I just need some guidance and not the complete solution.
Help is much appreciated :)
The error you got is because you are trying to insert into the identity column DoorNum:
DoorNum INT IDENTITY(1,1) PRIMARY KEY,
Remove that column from the column list. Instead of:
INSERT INTO Prison_Doors(DoorNum,DoorOpen,DoorClosed)
use:
INSERT INTO Prison_Doors(DoorOpen,DoorClosed)
...
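Also note that the SET statement inside the loop overwrites the variable on every pass instead of appending to it, so only the statement built in the last iteration would ever be executed.
However, there is no need for dynamic SQL to do that; you can do this with an anchor table like this: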
WITH temp
AS
(
SELECT n
FROM (VALUES(1), (2), (3), (4)) temp(n)
), nums
AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS n
FROM temp t1, temp t2, temp t3
)
INSERT INTO Prison_Doors(DoorOpen, DoorClosed)
SELECT 0 AS DoorOpen, 1 AS DoorClosed
FROM nums
WHERE n <= 50;
Update:
What does my code do, line by line?
Generating a sequence of numbers:
The first problem was generating a sequence of numbers from 1 to 50. I used an anchor table with only four rows, 1 to 4, like this:
SELECT n
FROM (VALUES(1), (2), (3), (4)) temp(n);
This VALUES syntax was introduced in SQL Server 2008; it is called a table value constructor. After the VALUES list, you give the table an alias and name the target columns in parentheses, as in temp(n).
For older versions you have to use something like:
SELECT n
FROM
(
SELECT 1 AS n
UNION ALL
SELECT 2
UNION ALL
SELECT 3
UNION ALL
SELECT 4
) AS temp;
This will give you only 4 rows, but we need to generate 50. That's why I CROSS JOIN three copies of this table:
FROM temp t1, temp t2, temp t3
It is the same as
FROM temp t1
CROSS JOIN temp t2
CROSS JOIN temp t3
This produces 4 × 4 × 4 = 64 rows:
1 1 1
1 2 1
1 3 1
1 4 1
2 1 1
2 2 1
2 3 1
2 4 1
....
3 1 4
3 2 4
3 3 4
3 4 4
4 1 4
4 2 4
4 3 4
4 4 4
The Use Of ROW_NUMBER() Function:
Then ROW_NUMBER() gives each of these rows a sequential number from 1 to 64, like this:
WITH temp
AS
(
SELECT n
FROM (VALUES(1), (2), (3), (4)) temp(n)
)
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS n
FROM temp t1, temp t2, temp t3;
Note that ROW_NUMBER requires an ORDER BY clause; in our case the order doesn't matter, so I used (SELECT 1).
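The arithmetic behind the anchor table is easy to check: three copies of a 4-row table give 4 × 4 × 4 = 64 rows, and numbering them and keeping n <= 50 leaves exactly 50. A minimal Python sketch of the idea, using itertools.product for the CROSS JOIN and enumerate for ROW_NUMBER (an illustration only, not what SQL Server executes):

```python
from itertools import product

temp = [1, 2, 3, 4]  # the 4-row anchor table

# FROM temp t1, temp t2, temp t3: every combination of three rows
cross = list(product(temp, temp, temp))

# ROW_NUMBER() OVER (ORDER BY (SELECT 1)): the order is irrelevant,
# we only need a distinct number per row.
nums = [n for n, _ in enumerate(cross, start=1)]

door_numbers = [n for n in nums if n <= 50]
print(len(cross), len(door_numbers))
```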
There is also another way to generate a sequence of numbers without ROW_NUMBER. It also relies on CROSS JOIN, this time with an anchor table of the ten digits:
WITH temp
AS
(
SELECT n
FROM (
VALUES(0), (1), (2), (3), (4), (5), (6), (7), (8), (9)
) temp(n)
), nums
AS
(
SELECT t1.n * 100 + t2.n * 10 + t3.n + 1 AS n
FROM temp t1, temp t2, temp t3
)
SELECT n
FROM nums
ORDER BY n;
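Why the digit version covers 1 to 1000 with no gaps or duplicates: every number from 0 to 999 has exactly one hundreds digit, tens digit and units digit, so t1.n * 100 + t2.n * 10 + t3.n hits each of them once, and the + 1 shifts the range to 1..1000. A quick Python check of that claim:

```python
from itertools import product

digits = range(10)  # the 10-row anchor table (0 to 9)

# t1.n * 100 + t2.n * 10 + t3.n + 1 for every row of the triple cross join
nums = [h * 100 + t * 10 + u + 1 for h, t, u in product(digits, digits, digits)]

print(len(nums), min(nums), max(nums))
```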
Another Way of Generating Sequence Of Numbers
Common Table Expressions:
CTE stands for common table expression; CTEs were introduced in SQL Server 2005. A CTE is one of the four types of table expressions that SQL Server supports. The other three are:
Derived tables,
Views, and
Inline table-valued functions.
CTEs have several important advantages. One of them is that they let you define virtual tables and then reuse them later in the same statement, as I did:
WITH temp
AS
(
SELECT n
FROM (VALUES(1), (2), (3), (4)) temp(n)
), nums
AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS n
FROM temp t1, temp t2, temp t3
)
...
Here I defined two CTEs: temp, and then nums, which reuses temp. You can define as many CTEs as you need in one WITH clause: separate them with commas, and give each one a name, the AS keyword, and its query in parentheses.
Insert into one table from another table using INSERT INTO ... SELECT ...
Now we have a virtual table, nums, with rows numbered 1 to 64, and we need to insert rows 1 to 50.
For this, you can use the INSERT INTO ... SELECT ....
Note that the column list in the INSERT clause is optional, but if you omit it, you have to supply a value for every column; otherwise you will get an error. For example, if the table has four columns and you supply only three values in the VALUES clause or the SELECT clause, you will get an error. This does not apply to identity columns, which are defined with:
IDENTITY(1,1)   -- IDENTITY(seed, increment): the first 1 is the seed (the starting value), the second 1 is the increment
In this case you simply leave that column out of the column list in the INSERT clause, and SQL Server fills it with the next identity value. There is an option that lets you insert a value manually (SET IDENTITY_INSERT), as in @Raj's answer.
Note that in my answer I am not inserting the sequence numbers into that column; instead, I insert the values (0, 1) 50 times, and the door numbers are generated automatically by the identity column:
...
INSERT INTO Prison_Doors(DoorOpen, DoorClosed)
SELECT 0 AS DoorOpen, 1 AS DoorClosed
FROM nums
WHERE n <= 50;
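To see the whole pattern run end to end (anchor CTE, ROW_NUMBER, INSERT ... SELECT, and an auto-generated key), here is a sketch using Python's sqlite3 module. It assumes SQLite as a stand-in: an INTEGER PRIMARY KEY column plays the role of IDENTITY(1,1), since SQLite assigns it automatically when it is left out of the column list; the anchor CTE is renamed anchor to avoid any clash with SQLite's temp schema name; the WITH clause goes before INSERT; and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# INTEGER PRIMARY KEY stands in for SQL Server's IDENTITY(1,1).
conn.execute("""
CREATE TABLE Prison_Doors (
    DoorNum    INTEGER PRIMARY KEY,
    DoorOpen   INTEGER,
    DoorClosed INTEGER,
    Trips      INTEGER
)""")

# Anchor table -> 64 numbered rows -> insert (0, 1) for the first 50.
conn.execute("""
WITH anchor(n) AS (VALUES (1), (2), (3), (4)),
nums AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS n
    FROM anchor t1, anchor t2, anchor t3
)
INSERT INTO Prison_Doors (DoorOpen, DoorClosed)
SELECT 0, 1
FROM nums
WHERE n <= 50
""")

rows = conn.execute(
    "SELECT DoorNum, DoorOpen, DoorClosed, Trips FROM Prison_Doors ORDER BY DoorNum"
).fetchall()
print(rows[0], rows[-1], len(rows))
```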
Firstly, you have declared DoorNum as an identity column and then you are trying to insert values into that column explicitly. SQL Server does not allow this unless you first run
SET IDENTITY_INSERT Prison_Doors ON
Try this query. It should insert the 50 rows you want:
DECLARE @SQL VARCHAR(8000)
DECLARE @Index INT
SET @Index=1
WHILE (@Index<=50)
BEGIN
SET @SQL= 'INSERT INTO Prison_Doors(DoorOpen,DoorClosed)VALUES(0,1)'
EXEC(@SQL)
SET @Index=@Index+1
END
Raj