What is the best SQL join to join two SELECT output?

I have a first SQL query which returns the following:
TEST_NAME  SUBNAME  BDATE        CODE    Number
TEST1      Blabla1  01-JAN-2022  TEST_A  15645
TEST1      Blabla1  01-MAR-2022  TEST_B  58464
TEST1      Blabla1  01-JUN-2022  TEST_C  46456
TEST1      Blabla1  01-SEP-2022  TEST_D  68676
TEST1      Blabla1  01-DEC-2022  TEST_E  68766
TEST2      Blabla2  01-JAN-2022  TEST_A  15645
TEST2      Blabla2  01-MAR-2022  TEST_B  58464
TEST2      Blabla2  01-JUN-2022  TEST_C  46456
TEST2      Blabla2  01-SEP-2022  TEST_D  68676
TEST2      Blabla2  01-DEC-2022  TEST_E  68766
On the other hand, I have written another SQL query:
SELECT *
FROM db.Test_Table TT
WHERE TT.TC_CODE = 1
which gives the output below:
A      B            C
TEST1  05-MAR-2022  4564123
TEST1  05-DEC-2022  1561618
TEST2  05-JAN-2022  1651156
TEST2  05-JUN-2022  1564132
TEST2  05-SEP-2022  1561565
I was wondering how to simply combine the above table with the first one, to get the output below:
TEST_NAME  SUBNAME  BDATE        CODE     Number
TEST1      Blabla1  01-JAN-2022  TEST_A   15645
TEST1      Blabla1  01-MAR-2022  TEST_B   58464
TEST1      Blabla1  05-MAR-2022  NewItem  4564123
TEST1      Blabla1  01-JUN-2022  TEST_C   46456
TEST1      Blabla1  01-SEP-2022  TEST_D   68676
TEST1      Blabla1  01-DEC-2022  TEST_E   68766
TEST1      Blabla1  05-DEC-2022  NewItem  1561618
TEST2      Blabla2  01-JAN-2022  TEST_A   15645
TEST2      Blabla2  05-JAN-2022  NewItem  1651156
TEST2      Blabla2  01-MAR-2022  TEST_B   58464
TEST2      Blabla2  01-JUN-2022  TEST_C   46456
TEST2      Blabla2  05-JUN-2022  NewItem  1564132
TEST2      Blabla2  01-SEP-2022  TEST_D   68676
TEST2      Blabla2  05-SEP-2022  NewItem  1561565
TEST2      Blabla2  01-DEC-2022  TEST_E   68766
I tried:
SELECT * FROM T1
UNION
SELECT
TT.A as TEST_NAME,
TT.B as BDATE,
TT.C as Number,
FROM db.Test_Table TT
but it throws:
01789. 00000 - "query block has incorrect number of result columns"
Also, I don't know how to fill the empty cells with the corresponding value. I guess that if the UNION worked, I would get the following result:
TEST_NAME  SUBNAME  BDATE        CODE     Number
TEST1      Blabla1  01-JAN-2022  TEST_A   15645
TEST1      Blabla1  01-MAR-2022  TEST_B   58464
TEST1      (null)   05-MAR-2022  NewItem  4564123
TEST1      Blabla1  01-JUN-2022  TEST_C   46456
TEST1      Blabla1  01-SEP-2022  TEST_D   68676
TEST1      Blabla1  01-DEC-2022  TEST_E   68766
TEST1      (null)   05-DEC-2022  NewItem  1561618
TEST2      Blabla2  01-JAN-2022  TEST_A   15645
TEST2      (null)   05-JAN-2022  NewItem  1651156
TEST2      Blabla2  01-MAR-2022  TEST_B   58464
TEST2      Blabla2  01-JUN-2022  TEST_C   46456
TEST2      (null)   05-JUN-2022  NewItem  1564132
TEST2      Blabla2  01-SEP-2022  TEST_D   68676
TEST2      (null)   05-SEP-2022  NewItem  1561565
TEST2      Blabla2  01-DEC-2022  TEST_E   68766
How can I replace these (null) entries with the right value? In this example the value would be Blabla1 or Blabla2, depending on TEST_NAME.

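Step 1: start with a UNION ALL to combine the two tables. Both sides of a UNION must return the same number of columns, which is why your attempt raised ORA-01789; here the missing columns are filled with NULL and the literal 'NewItem'.
SELECT TEST_NAME, SUBNAME, BDATE, CODE, Number FROM T1
UNION ALL
SELECT A, NULL, B, 'NewItem', C FROM db.Test_Table
By the way, the column names of the result are taken entirely from the first query of the UNION, so you can drop every AS ColumnAlias in the queries that follow it.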
Step 2: with the above query, the SUBNAME column is NULL for all the records that come from Test_Table.
Note that your data is not well normalized (1NF, 2NF, 3NF): SUBNAME is repeated on every row of T1. That is why I need the DISTINCT keyword below, which should normally not be necessary.
There are 2 methods to fill SUBNAME:
Method 1: Scalar subquery (= a query that returns a single value).
Replace NULL from the query in Step 1 by:
(SELECT DISTINCT SUBNAME FROM T1 WHERE TEST_NAME = Test_Table.A)
Method 2: JOIN
...
UNION ALL
SELECT A, T.SUBNAME, B, 'NewItem', C
FROM db.Test_Table
JOIN (SELECT DISTINCT TEST_NAME, SUBNAME FROM T1) T ON A = TEST_NAME
Again, I suppose you will not need DISTINCT in your real case.
Additional notes:
Method 1 will throw an error if the subquery returns more than 1 record (that is, if a TEST_NAME is linked to 2 SUBNAME values).
If you expect each TEST_NAME to be linked to only 1 SUBNAME value, then I urge you to consider this error a good thing: it is a safeguard that tells you the data does not look like what you expect.
In Oracle, and possibly in some other databases, a scalar subquery could be faster than its JOIN counterpart because scalar subquery results can be cached.
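To try Method 2 without an Oracle instance, here is a minimal sketch that runs the same UNION ALL plus JOIN in SQLite through Python's sqlite3 module. It is only an illustration under a few assumptions: the Number column is renamed NUM (a hypothetical name, chosen to avoid the reserved word), dates are stored as ISO text so they sort correctly, and only a few of the sample rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- NUM stands in for the original "Number" column (assumption).
    CREATE TABLE T1 (TEST_NAME TEXT, SUBNAME TEXT, BDATE TEXT, CODE TEXT, NUM INTEGER);
    CREATE TABLE Test_Table (A TEXT, B TEXT, C INTEGER);
    INSERT INTO T1 VALUES
        ('TEST1', 'Blabla1', '2022-01-01', 'TEST_A', 15645),
        ('TEST1', 'Blabla1', '2022-03-01', 'TEST_B', 58464),
        ('TEST2', 'Blabla2', '2022-01-01', 'TEST_A', 15645);
    INSERT INTO Test_Table VALUES
        ('TEST1', '2022-03-05', 4564123),
        ('TEST2', '2022-01-05', 1651156);
""")

# Same shape as Method 2: the JOIN on the DISTINCT (TEST_NAME, SUBNAME)
# pairs supplies SUBNAME for the rows coming from Test_Table.
rows = conn.execute("""
    SELECT TEST_NAME, SUBNAME, BDATE, CODE, NUM FROM T1
    UNION ALL
    SELECT TT.A, T.SUBNAME, TT.B, 'NewItem', TT.C
    FROM Test_Table TT
    JOIN (SELECT DISTINCT TEST_NAME, SUBNAME FROM T1) T ON TT.A = T.TEST_NAME
    ORDER BY 1, 3
""").fetchall()

for row in rows:
    print(row)
```

Every row, including the 'NewItem' ones, comes back with SUBNAME filled in, and the ORDER BY interleaves the new rows by date within each TEST_NAME.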

Related

INNER JOIN on value instead of dimension

This is a tricky and (in my humble opinion) unnecessary question my friend got during an interview process, which I also did not know when he asked me about it. When you run this SQL query:
SELECT *
FROM (VALUES (1), (1), (Null), (Null), (Null)) AS tb1 (col)
JOIN (VALUES (1), (1), (1), (Null), (Null)) AS tb2 (col)
ON tb1.col = tb2.col
It generates this result:
tb1.col  tb2.col
1        1
1        1
1        1
1        1
1        1
1        1
Why does this JOIN work like that?

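As mentioned by jarlh, the NULLs never match when tb1.col = tb2.col is evaluated: comparing NULL with anything, including another NULL, yields UNKNOWN rather than TRUE, so those rows are dropped from the join.
As for where all the 1's come from, each of the 2 non-NULL rows in tb1 matches each of the 3 non-NULL rows in tb2, giving 2 x 3 = 6 rows. Perhaps the following query will help show where each value comes from.
In this example, we compare the first letter of the values (which is always the letter A):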
SELECT *
FROM (VALUES ('Abigail'), ('Allie'), (Null), (Null), (Null)) AS tb1 (col)
JOIN (VALUES ('Aria'), ('Allison'), ('Audrey'), (Null), (Null)) AS tb2 (col)
ON left(tb1.col, 1) = left(tb2.col, 1)
col      col
Abigail  Aria
Abigail  Allison
Abigail  Audrey
Allie    Aria
Allie    Allison
Allie    Audrey
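The NULL behaviour can be checked quickly in SQLite through Python's sqlite3 module. This is a small sketch rather than the engine from the question (which uses a VALUES-derived table syntax closer to SQL Server or PostgreSQL); the CTE form below is simply how SQLite accepts the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL (unknown), not true; sqlite3 returns it as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

joined = conn.execute("""
    WITH tb1(col) AS (VALUES (1), (1), (NULL), (NULL), (NULL)),
         tb2(col) AS (VALUES (1), (1), (1), (NULL), (NULL))
    SELECT tb1.col, tb2.col
    FROM tb1
    JOIN tb2 ON tb1.col = tb2.col
""").fetchall()

print(null_equals_null)  # None
print(len(joined))       # 6 rows: 2 non-NULL rows x 3 non-NULL rows
```

None of the NULL rows survive the join, and every remaining pair is (1, 1).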

Find the difference of values between 2 columns after joining 2 tables on ms sql server

I have 2 tables in MS SQL Server 2019: test2 and test3. Below are the table creation and insert statements for the 2 tables:
create table test2 (id nvarchar(10) , code nvarchar(5) , all_names nvarchar(80))
create table test3 (code nvarchar(5), name1 nvarchar(18) )
insert into test2 values ('A01', '50493', '12A2S0403-Buffalo;13A1T0101-Boston;13A2C0304-Miami')
insert into test2 values ('A02', '31278', '12A1S0205-Detroit')
insert into test2 values ('A03', '49218', '12A2S0403-Buffalo;12A1M0208-Manhattan')
insert into test3 values ('50493', 'T0101-Boston')
insert into test3 values ('49218', 'S0403-Buffalo')
insert into test3 values ('31278', 'S0205-Detroit')
I can join the 2 tables on the code column. Task is to find difference of test2.all_names and test3.name1. For example 'A01' should display the result as '12A2S0403-Buffalo;13A2C0304-Miami'.
A02 should not come as output.
The output should be :
Id | Diff_of_name
----------------------------------------
A01 | 12A2S0403-Buffalo;13A2C0304-Miami
A03 | 12A1M0208-Manhattan

Here's one possible solution, first using openjson to split your source string into rows, then using exists to check for matching values in table test3 and finally string_agg to provide the final result:
select Id, String_Agg(j.[value], ';') within group (order by j.seq) Diff_Of_Name
from test2 t2
cross apply (
select j.[value], Convert(tinyint,j.[key]) Seq
from OpenJson(Concat('["',replace(all_names,';', '","'),'"]')) j
where not exists (
select * from test3 t3
where t3.code = t2.code and j.[value] like Concat('%',t3.name1,'%')
)
)j
group by t2.Id;

I don't like the need to normalize. However, if one must normalize, STRING_SPLIT is handy.
When done with the real work, STRING_AGG can de-normalize the data.
WITH normalized as ( -- normalize all_names in test2 to column name1
SELECT t2.id, t2.code, t2.all_names, n.value as [name1]
FROM test2 t2
OUTER APPLY STRING_SPLIT(t2.all_names, ';') n
) select * from normalized;
WITH normalized as ( -- normalize all_names in test2 to column name1
SELECT t2.id, t2.code, t2.all_names, n.value as [name1]
FROM test2 t2
OUTER APPLY STRING_SPLIT(t2.all_names, ';') n
), differenced as ( -- exclude name1 values listed in test3, ignoring leading characters
SELECT n.*
FROM normalized n
WHERE NOT EXISTS(SELECT * FROM test3 t3 WHERE t3.code = n.code AND n.name1 LIKE '%' + t3.name1)
) -- denormalize
SELECT id, STRING_AGG(name1, ';') as [Diff_of_name]
FROM differenced
group by id
order by id
id Diff_of_name
---------- ---------------------------------
A01 12A2S0403-Buffalo;13A2C0304-Miami
A03 12A1M0208-Manhattan
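The split, filter, and re-aggregate steps used by both answers can also be sketched outside the database. The Python below is only an illustration of that logic on the sample rows, not a replacement for the T-SQL; the function name diff_of_names is hypothetical, and the suffix match mirrors the LIKE '%' + name1 condition of the second query.

```python
test2 = [
    ("A01", "50493", "12A2S0403-Buffalo;13A1T0101-Boston;13A2C0304-Miami"),
    ("A02", "31278", "12A1S0205-Detroit"),
    ("A03", "49218", "12A2S0403-Buffalo;12A1M0208-Manhattan"),
]
test3 = [
    ("50493", "T0101-Boston"),
    ("49218", "S0403-Buffalo"),
    ("31278", "S0205-Detroit"),
]


def diff_of_names(test2_rows, test3_rows):
    """Return {id: remaining names} for ids that still have names left."""
    # Group the names to exclude by code (a code may appear more than once).
    names_by_code = {}
    for code, name1 in test3_rows:
        names_by_code.setdefault(code, []).append(name1)

    result = {}
    for row_id, code, all_names in test2_rows:
        excluded = names_by_code.get(code, [])
        # "Normalize": split on ';', then drop parts ending with an excluded name.
        kept = [
            part for part in all_names.split(";")
            if not any(part.endswith(name) for name in excluded)
        ]
        # "De-normalize": join what is left; ids with nothing left are omitted.
        if kept:
            result[row_id] = ";".join(kept)
    return result


diff = diff_of_names(test2, test3)
for row_id, names in diff.items():
    print(row_id, names)
```

A02 is absent from the result because its only name is excluded, which matches the expected output in the question.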

Use of pivot function

I have the following oracle table:
Tag Value
A Test
B Test2
C Test3
D Test4
But need an output like:
A B C D
Test Test2 Test3 Test4
Where A, B, ... should be my column names. I know the pivot/unpivot function but I didn't get the right result yet.
This was my attempt, but without success because of error ORA-00933:
SELECT *
FROM (
SELECT tag
FROM table
WHERE VALUES LIKE '%Test%'
) AS DT
PIVOT(max(value) FOR tag IN([A],[B])) AS PT

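Something like this (Oracle does not accept AS before a table alias or [bracketed] identifiers, and the inner query must select value as well as tag). The aliases in the IN list give plain A, B, C, D column names instead of quoted ones:
select *
from (select tag, value from TAB)
pivot (max(value) for tag in ('A' as a, 'B' as b, 'C' as c, 'D' as d))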
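For engines without a PIVOT clause, the same reshaping can be written as conditional aggregation. The sketch below swaps in that technique (it is not Oracle's PIVOT) and runs it in SQLite through Python's sqlite3 module; the table name tab and its sample rows are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab (tag TEXT, value TEXT);
    INSERT INTO tab VALUES ('A', 'Test'), ('B', 'Test2'), ('C', 'Test3'), ('D', 'Test4');
""")

# One MAX(CASE ...) per output column: each expression keeps only the
# value of its own tag and ignores the other rows (they evaluate to NULL).
cursor = conn.execute("""
    SELECT MAX(CASE WHEN tag = 'A' THEN value END) AS A,
           MAX(CASE WHEN tag = 'B' THEN value END) AS B,
           MAX(CASE WHEN tag = 'C' THEN value END) AS C,
           MAX(CASE WHEN tag = 'D' THEN value END) AS D
    FROM tab
""")
columns = [description[0] for description in cursor.description]
pivoted = cursor.fetchone()

print(columns)
print(pivoted)
```

As with PIVOT, the list of output columns has to be known when the query is written.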

Insert unique values after addition of a column in a table

We have a table with two columns and recently added another column (named sequence_no). Is there a way to insert unique values like 1, 2, 3 for every row in the table?
e.g.
table name : test
desc test
name varchar2
value varchar2
--> n_seq_no number
select * from test
Name value n_Seq_no
test1 100
test2 200
test3 300
test4 500
The table already had name and value as columns; I need to add unique values to the n_Seq_no column for the existing data.
Output format:
select * from test
Name value n_Seq_no
test1 100 1
test2 200 2
test3 300 3
test4 500 4
and so on for all the rows in table.

You could simply set the new column as ROWNUM.
Something like,
SQL> CREATE TABLE t(
2 A NUMBER,
3 b NUMBER);
Table created.
SQL>
SQL> INSERT INTO t(A) VALUES(100);
1 row created.
SQL> INSERT INTO t(A) VALUES(200);
1 row created.
SQL> INSERT INTO t(A) VALUES(300);
1 row created.
SQL>
SQL> SELECT * FROM t;
A B
---------- ----------
100
200
300
SQL>
SQL> UPDATE t SET b = ROWNUM;
3 rows updated.
SQL> SELECT * FROM T;
A B
---------- ----------
100 1
200 2
300 3
SQL>
If you are on 12c, you could use an IDENTITY COLUMN.

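Assuming that your table is really big, it's better to recreate and repopulate it:
rename test to old_test;
create table new_test
as
select t.name, t.value, rownum as n_seq_no
from (select name, value from old_test order by value) t;
Don't forget to migrate grants, indexes, triggers, etc., if any.
UPDATE: ordering is optional. It is required only if you want to assign n_seq_no values in some predefined order. ROWNUM is assigned before an ORDER BY in the same query block is applied, which is why the ordering goes in the inline view. The columns are listed explicitly because old_test already contains the empty n_seq_no column.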

Returning Records with Distinct or Unique data over multiple fields

What I am trying to accomplish
Select up to two records from table Visit that contain one of a number of codes in fields Test1-Test8 within the last 2 years.
But the two records cannot have any duplicate codes.
For example, let's say Record1 contains '85.43' in Test4
and Record2 contains '85.43' in Test2.
I would not want it to return Record2, because a record with '85.43' already exists.
Anyone know how I might accomplish this?
Here is my initial query that does not have the duplicate logic built into it.
select TOP 2 * from Visit where customer = CustomerCode AND
(Test1 IN ('85.41', '85.43', '85.45', '85.47')
or Test2 IN ('85.41', '85.43', '85.45', '105.47')
or Test3 IN ('85.41', '85.43', '85.45', '105.47')
or Test4 IN ('85.41', '85.43', '85.45', '105.47')
or Test5 IN ('85.41', '85.43', '85.45', '105.47')
or Test6 IN ('85.41', '85.43', '85.45', '105.47')
or Test7 IN ('85.41', '85.43', '85.45', '105.47')
or Test8 IN ('85.41', '85.43', '85.45', '105.47'))
AND TIMESTAMPDIFF(SQL_TSI_MONTH, DATE_IN, CurrentDate) <= 24;
Thanks

This is the cleanest way I can think of doing this, without resorting to all 64 comparisons that would be required if using the table directly:
CREATE TABLE #t (ID int, TestField varchar(255))
INSERT INTO #t SELECT Id, Test1 FROM Visit WHERE customer = CustomerCode AND TIMESTAMPDIFF(SQL_TSI_MONTH, DATE_IN, CurrentDate) <= 24
INSERT INTO #t SELECT Id, Test2 FROM Visit WHERE customer = CustomerCode AND TIMESTAMPDIFF(SQL_TSI_MONTH, DATE_IN, CurrentDate) <= 24
INSERT INTO #t SELECT Id, Test3 FROM Visit WHERE customer = CustomerCode AND TIMESTAMPDIFF(SQL_TSI_MONTH, DATE_IN, CurrentDate) <= 24
... -- repeat for each Test field
SELECT TOP 2 * FROM Visit WHERE Id IN (
SELECT a.Id FROM #t a
LEFT JOIN #t b
ON a.Id > b.Id
AND a.TestField = b.TestField
GROUP BY a.Id
HAVING count(b.TestField) = 0
)
ORDER BY Id
DROP TABLE #t
Depending on the size of the table, you may need to add an index to the temp table, or the self-join will be unbearably slow:
CREATE INDEX some_unique_name_index ON #t (ID, TestField)
Another alternative to speed this up would be to use a T-SQL loop to find one row at a time that matches the criteria and add them to a result table. Once you have enough results (2 in this case), you can exit the loop. For very large tables this would probably be the recommended approach.
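The loop idea from the last paragraph can be sketched in Python to show the control flow before writing it in T-SQL. Everything here is illustrative: the function name pick_visits, the in-memory rows, and the exact code list are assumptions (the question's query lists '85.47' in one place and '105.47' in others), and the rows are assumed to be already filtered by customer and date and sorted by Id.

```python
# Codes of interest (assumed list; adjust to the real one).
TARGET_CODES = {"85.41", "85.43", "85.45", "85.47"}


def pick_visits(visits, limit=2):
    """Walk visits in order and keep those whose target codes are all new."""
    seen_codes = set()
    picked = []
    for visit_id, tests in visits:
        codes = {test for test in tests if test in TARGET_CODES}
        # Skip rows without any target code, or with a code already returned.
        if not codes or codes & seen_codes:
            continue
        picked.append(visit_id)
        seen_codes |= codes
        if len(picked) == limit:
            break  # enough results, exit the loop early
    return picked


# (Id, [Test1..Test8]) pairs, shortened to four test fields for brevity.
visits = [
    (1, ["10.00", None, None, "85.43"]),
    (2, [None, "85.43", None, None]),
    (3, ["85.41", None, None, None]),
    (4, ["85.45", None, None, None]),
]
result = pick_visits(visits)
print(result)
```

Visit 2 is skipped because '85.43' was already returned with visit 1, and the loop stops at visit 3 without ever reading visit 4.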
