Join 3 tables INTO 1 new table - sql

Table A = Clarity (unique ID = CaseID; contains CSN)
Table B = Survey (unique ID = CSN)
Table C = QA (unique ID = CaseID; contains CSN)
Table D = FullData
Goal:
“TableD” contains:
all “TableC”,
all “TableA” for which there is a “CaseID” and “CSN” in common with “Table C”
all “TableB” for which there is a “CSN” in common with “Table C”
Table is remade every evening. There are a lot of people who will be doing query research on “Table D.” I think it needs to be a table and not a view.
I was going to use:
Create TableD AS
Select *
From TableC
Left Join TableA
ON TableA.CaseID = TableC.CaseID AND TableA.CSN = TableC.CSN
Left Join TableB
ON TableC.CSN = TableC.CSN
I was going to use SQL Agent to make the script run every night. This seems too simple. Do I have to drop the table that was made the day before? Does anyone see an error? I am using Microsoft SQL Server.

I am assuming you have a SQL Server database.
The table approach has an advantage here, because you said many people will query it for research and reporting. You don't want reporting queries to bog down your main application tables, so it's best to keep a separate reporting table that absorbs that traffic.
If you want to use a table, the following script can be used. Avoid SELECT *: the tables share column names such as CaseID and CSN, and SELECT ... INTO fails when the result set contains duplicate column names. List the columns you want in TableD instead.
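-- Replace * with an explicit column list before running:
-- CaseID and CSN appear in more than one of the joined tables.
IF Object_id(N'TableD', N'U') IS NULL
BEGIN
    -- First run: create tabled from the query result.
    SELECT *
    INTO tabled
    FROM tablec
    LEFT JOIN tablea
        ON tablea.caseid = tablec.caseid
        AND tablea.csn = tablec.csn
    LEFT JOIN tableb
        ON tableb.csn = tablec.csn;  -- join tableb to tablec, not tablec to itself
END
ELSE
BEGIN
    -- Later runs: empty the existing table, then reload it.
    DELETE
    FROM tabled;
    INSERT INTO tabled
    SELECT *
    FROM tablec
    LEFT JOIN tablea
        ON tablea.caseid = tablec.caseid
        AND tablea.csn = tablec.csn
    LEFT JOIN tableb
        ON tableb.csn = tablec.csn;
END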
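As a rough sketch of the nightly reload with an explicit column list, assuming TableD already exists with a fixed set of columns: the names QaScore, ClarityStatus and SurveyScore below are hypothetical placeholders, not columns from the question. Wrapping the delete and the insert in one transaction is one way to avoid leaving an empty table behind if the reload fails.

```sql
-- Sketch only: QaScore, ClarityStatus and SurveyScore are hypothetical column names.
SET XACT_ABORT ON;  -- roll the whole transaction back on a run-time error

BEGIN TRANSACTION;

    -- Empty yesterday's data.
    DELETE FROM dbo.TableD;

    -- Reload: every TableC row, plus matching TableA and TableB data where it exists.
    INSERT INTO dbo.TableD (CaseID, CSN, QaScore, ClarityStatus, SurveyScore)
    SELECT c.CaseID,
           c.CSN,
           c.QaScore,
           a.ClarityStatus,
           b.SurveyScore
    FROM dbo.TableC AS c
    LEFT JOIN dbo.TableA AS a
        ON a.CaseID = c.CaseID
        AND a.CSN = c.CSN
    LEFT JOIN dbo.TableB AS b
        ON b.CSN = c.CSN;

COMMIT TRANSACTION;
```

A script like this can be scheduled as a SQL Server Agent job step; because the table is kept and only its rows are replaced, there is no need to drop it each night.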

Related

What is the best approach for performance when querying across multiple DB's

We have a setup where our customers each have their own databases. We also have some shared databases that are used to hold things like module access, reports, customer server locations, etc.
We have a few queries that look like this
USE CustomerDB
SELECT
fields
FROM
CustomerTable C
INNER JOIN SharedDb.dbo.SharedtableA A ON A.Id = C.SharedAId
INNER JOIN SharedDb.dbo.SharedtableB B ON B.Id = A.SharedBId
Does it make a difference to query plans etc if we were to change the query so that it executes in separate spaces?
E.g.
USE CustomerDb
-- Temp table (# prefix) holding the shared IDs
CREATE TABLE #SharedTemp (
    Id int NOT NULL
)
INSERT INTO #SharedTemp
SELECT
    A.Id  -- qualified: a bare Id is ambiguous if both shared tables have an Id column
FROM
    SharedDb.dbo.SharedtableA A
    INNER JOIN SharedDb.dbo.SharedtableB B ON B.Id = A.SharedBId
SELECT
    fields
FROM
    CustomerTable C
    INNER JOIN #SharedTemp A ON A.Id = C.SharedAId
Thank you in advance for your insights

SQL Server Inner join on UID

I am trying to do an inner join on 2 tables having same names, but in different databases (database A and B). The table is as below:
My aim is to update columns MinMultiplyFactor and MaxMultipleFactor in the table in database A based on 2 conditions (NodeId and Year should match with the other table).
I wrote a query:
update DB_A.[dbo].[TABLE]
set DB_A.[dbo].[TABLE].MinMultiplyFactor = B.MinMultiplyFactor
FROM DB_A.[dbo].[TABLE] A INNER JOIN DB_B.[dbo].[TABLE] B
ON A.NodeId = B.NodeId
and A.Year = B.Year;
But it says 0 rows updated, even though there are a lot of common NodeIds and years. The NodeId is a uniqueidentifier in the table. Maybe that's the issue?

SQL - Searching a table using another table's column as criteria

I have table B with bcust(4-digit integer) and bdate(date) columns. I also have table C with ccust(4-digit integer) and cdate(date). I want to show the records from table c where cdate occurred after bdate.
I guess, maybe you're looking for this?
SELECT c.*
FROM c
INNER JOIN b
ON b.bcust = c.ccust
AND b.bdate < c.cdate;
I assumed that the records are linked via the bcust and ccust columns.
Although you did not mention anything on how the records in both tables are related, I guess that records are related if bcust = ccust.
Then something like this should do what you want:
SELECT c.*
FROM tableC c
INNER JOIN tableB b ON c.ccust = b.bcust
WHERE c.cdate > b.bdate
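Both joins above return a row from c once for every matching row in b. If b can hold several rows per customer, that produces duplicates. A possible variant, assuming the same bcust = ccust relationship guessed at above, tests for the existence of an earlier bdate instead of joining:

```sql
-- Returns each row of c at most once, however many earlier b rows exist.
SELECT c.*
FROM c
WHERE EXISTS (
    SELECT 1
    FROM b
    WHERE b.bcust = c.ccust
      AND b.bdate < c.cdate
);
```

If b has exactly one row per customer, this returns the same rows as the join versions.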

How to select records which don't exist in another table or have a different status?

I am trying to select records from a temp table based on another temp table which holds their previous statuses (StatusHistory).
So if the record doesn't exist in the status table, then it should be selected. If the status of the record is different than the one in the StatusHistory table, then the record should be selected. Otherwise, if it exists with the same status in the StatusHistory table, then it should be ignored.
I have this SQL but it doesn't seem to be the best solution. Can you please point me to a better way to achieve that assuming that there are thousands of records in the tables? Would it be possible to achieve the same result with a JOIN statement?
SELECT AI.item
FROM #AllItems AI
WHERE NOT EXISTS (
SELECT * FROM #StatusHistory HS
WHERE HS.itemId = AI.itemId
) OR NOT AI.itemStatus IN ( SELECT HS.itemStatusHistory
FROM #StatusHistory HS
WHERE HS.itemId = AI.itemId
AND HS.itemId = AI.itemId )
Yes, you can do this with a LEFT JOIN.
SELECT AI.item
FROM #AllItems AI
LEFT JOIN #StatusHistory HS ON AI.itemId = HS.itemId
AND AI.itemStatus = HS.itemStatusHistory
WHERE HS.itemId IS NULL
A better solution, however, is to use NOT EXISTS:
SELECT AI.item
FROM #AllItems AI
WHERE NOT EXISTS
(
SELECT 1 FROM #StatusHistory SH
WHERE SH.itemId = AI.itemId
AND SH.itemStatusHistory = AI.itemStatus
);
As pointed out by Aaron, this usually performs better than a LEFT JOIN.
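To check the NOT EXISTS query against the three cases in the question, here is a small sketch with made-up data. The column types and the sample values are assumptions; only the table and column names come from the question.

```sql
-- Hypothetical sample data covering the three cases.
CREATE TABLE #AllItems (itemId int, item varchar(20), itemStatus varchar(10));
CREATE TABLE #StatusHistory (itemId int, itemStatusHistory varchar(10));

INSERT INTO #AllItems (itemId, item, itemStatus)
VALUES (1, 'new item',       'open'),    -- not in the history at all
       (2, 'changed item',   'closed'),  -- in the history with a different status
       (3, 'unchanged item', 'open');    -- in the history with the same status

INSERT INTO #StatusHistory (itemId, itemStatusHistory)
VALUES (2, 'open'),
       (3, 'open');

-- Selects 'new item' and 'changed item'; 'unchanged item' is ignored.
SELECT AI.item
FROM #AllItems AI
WHERE NOT EXISTS
(
    SELECT 1 FROM #StatusHistory SH
    WHERE SH.itemId = AI.itemId
    AND SH.itemStatusHistory = AI.itemStatus
);

DROP TABLE #StatusHistory;
DROP TABLE #AllItems;
```

Note that if the history holds several rows per item, an item is ignored as soon as any one of those rows carries its current status.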

Filter a SQL Server table dynamically using multiple joins

I am trying to filter a single table (master) by the values in multiple other tables (filter1, filter2, filter3 ... filterN) using only joins.
I want the following rules to apply:
(A) If one or more rows exist in a filter table, then include only those rows from the master that match the values in the filter table.
(B) If no rows exist in a filter table, then ignore it and return all the rows from the master table.
(C) This solution should work for N filter tables in combination.
(D) Static SQL using JOIN syntax only, no Dynamic SQL.
I'm really trying to get rid of dynamic SQL wherever possible, and this is one of those places I truly think it's possible, but just can't quite figure it out. Note: I have solved this using Dynamic SQL already, and it was fairly easy, but not particularly efficient or elegant.
What I have tried:
Various INNER JOINS between master and filter tables - works for (A) but fails on (B) because the join removes all records from the master (left) side when the filter (right) side has no rows.
LEFT JOINS - Always returns all records from the master (left) side. This fails (A) when some filter tables have records and some do not.
What I really need:
It seems like what I need is to be able to INNER JOIN on each filter table that has 1 or more rows and LEFT JOIN (or not JOIN at all) on each filter table that is empty.
My question: How would I accomplish this without resorting to Dynamic SQL?
In SQL Server 2005+ you could try this:
WITH
filter1 AS (
    SELECT DISTINCT
        m.ID,
        HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
        AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
    FROM masterdata m
    LEFT JOIN filtertable1 f ON join_condition
),
filter2 AS (
    SELECT DISTINCT
        m.ID,
        HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
        AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
    FROM masterdata m
    LEFT JOIN filtertable2 f ON join_condition
),
…
SELECT m.*
FROM masterdata m
INNER JOIN filter1 f1 ON m.ID = f1.ID AND f1.HasMatched = f1.AllHasMatched
INNER JOIN filter2 f2 ON m.ID = f2.ID AND f2.HasMatched = f2.AllHasMatched
…
My understanding is, filter tables without any matches simply must not affect the resulting set. The output should only consist of those masterdata rows that have matched all the filters where matches have taken place.
SELECT *
FROM master_table mt
WHERE (0 = (select count(*) from filter_table_1)
       OR mt.id IN (select id from filter_table_1))
  AND (0 = (select count(*) from filter_table_2)
       OR mt.id IN (select id from filter_table_2))
  AND (0 = (select count(*) from filter_table_3)
       OR mt.id IN (select id from filter_table_3))
Be warned that this could be inefficient in practice. Unless you have a specific reason to kill your existing, working, solution, I would keep it.
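One possible variation on the query above replaces the COUNT(*) checks with EXISTS tests, so the emptiness check does not need to count every filter row. This is only a sketch using the same table and column names as that query; whether it actually runs faster depends on your data and indexes.

```sql
-- Each filter applies only if its table has at least one row.
SELECT mt.*
FROM master_table AS mt
WHERE (NOT EXISTS (SELECT 1 FROM filter_table_1)
       OR EXISTS (SELECT 1 FROM filter_table_1 AS f1 WHERE f1.id = mt.id))
  AND (NOT EXISTS (SELECT 1 FROM filter_table_2)
       OR EXISTS (SELECT 1 FROM filter_table_2 AS f2 WHERE f2.id = mt.id))
  AND (NOT EXISTS (SELECT 1 FROM filter_table_3)
       OR EXISTS (SELECT 1 FROM filter_table_3 AS f3 WHERE f3.id = mt.id));
```

Adding another filter table means adding one more parenthesised AND block, so the statement stays static for any fixed number of filters.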
Do an inner join to get the results for (A) only, and a left join to get the results for (B) only (you will need a condition such as filterN.column IS NULL in the WHERE clause). Then combine the results of the inner join and the left join with UNION.
Left Outer Join - gives you the MISSING entries in master table ....
SELECT * FROM MASTER M
INNER JOIN APPRENTICE A ON A.PK = M.PK
LEFT OUTER JOIN FOREIGN F ON F.FK = M.PK
If MASTER has keys that are not present in FOREIGN, you will get NULL columns from FOREIGN where the entries are missing.
I think that is what you are looking for ...
Mike
First off, it is impossible to have "N number of Joins" or "N number of filters" without resorting to dynamic SQL. The SQL language was not designed for dynamic determination of the entities against which you are querying.
Second, one way to accomplish what you want (but would be built dynamically) would be something along the lines of:
-- Parentheses are required around each block:
-- INTERSECT is evaluated before UNION ALL.
Select ...
From master
Where Exists (
    (
        Select 1
        From filter_1
        Where filter_1 = master.col1
        Union All
        Select 1
        From ( Select 1 As x ) As dummy
        Where Not Exists (
            Select 1
            From filter_1
        )
    )
    Intersect
    (
        Select 1
        From filter_2
        Where filter_2 = master.col2
        Union All
        Select 1
        From ( Select 1 As x ) As dummy
        Where Not Exists (
            Select 1
            From filter_2
        )
    )
    ...
    Intersect
    (
        Select 1
        From filter_N
        Where filter_N = master.colN
        Union All
        Select 1
        From ( Select 1 As x ) As dummy
        Where Not Exists (
            Select 1
            From filter_N
        )
    )
)
I previously posted an answer, now deleted, that was based on wrong assumptions about your problem.
But I think you could split the search into two steps: first construct the set of eligible ids from the master table, then select the data by joining on that set of ids. Here I naturally assume you have some kind of ID on your master table, and that the filter tables contain the filter values only. The two steps can be combined into the statement below: each SELECT in the eligible subset provides a set of master ids, these are combined with UNION to avoid duplicates, and the resulting set of ids is joined to the table with the data.
SELECT * FROM tblData INNER JOIN
(
SELECT id FROM tblData td
INNER JOIN fa on fa.a = td.a
UNION
SELECT id FROM tblData td
INNER JOIN fb on fb.b = td.b
UNION
SELECT id FROM tblData td
INNER JOIN fc on fc.c = td.c
) eligible ON eligible.id = tblData.id
The test has been made against the tables and values shown below. These are just an appendix.
CREATE TABLE tblData (id int not null primary key identity(1,1), a varchar(40), b datetime, c int)
CREATE TABLE fa (a varchar(40) not null primary key)
CREATE TABLE fb (b datetime not null primary key)
CREATE TABLE fc (c int not null primary key)
Since you have filter tables, I am assuming that these tables are probably dynamically populated from a front-end. This would mean that you have these tables as #temp_table (or even a materialized table, doesn't matter really) in your script before filtering on the master data table.
Personally, I use the below code bit for filtering dynamically without using dynamic SQL.
SELECT *
FROM [masterdata] [m]
INNER JOIN
[filter_table_1] [f1]
ON
[m].[filter_column_1] = ISNULL(NULLIF([f1].[filter_column_1], ''), [m].[filter_column_1])
As you can see, the code NULLs the JOIN condition if the column value is a blank record in the filter table. However, the gist in this is that you will have to actively populate the column value to blank in case you do not have any filter records on which you want to curtail the total set of the master data. Once you have populated the filter table with a blank, the JOIN condition NULLs in those cases and instead joins on itself with the same column from the master data table. This should work for all the cases you mentioned in your question.
I have found this bit of code to be faster in terms of performance.
Hope this helps. Please let me know in the comments.
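Here is a small sketch of how the ISNULL/NULLIF join behaves, using made-up temp tables and values. The table contents and the varchar(20) type are assumptions for illustration only.

```sql
-- Hypothetical data: three master rows and a one-column filter table.
CREATE TABLE #masterdata (id int, filter_column_1 varchar(20));
CREATE TABLE #filter_table_1 (filter_column_1 varchar(20));

INSERT INTO #masterdata (id, filter_column_1)
VALUES (1, 'red'), (2, 'blue'), (3, 'red');

-- Case 1: no filter requested, so the filter table holds a single blank row.
INSERT INTO #filter_table_1 (filter_column_1) VALUES ('');

-- NULLIF turns '' into NULL and ISNULL falls back to the master column,
-- so every master row joins to the blank row: ids 1, 2 and 3 are returned.
SELECT m.*
FROM #masterdata m
INNER JOIN #filter_table_1 f1
    ON m.filter_column_1 = ISNULL(NULLIF(f1.filter_column_1, ''), m.filter_column_1);

-- Case 2: a real filter value replaces the blank row.
DELETE FROM #filter_table_1;
INSERT INTO #filter_table_1 (filter_column_1) VALUES ('red');

-- Only the rows matching 'red' are returned: ids 1 and 3.
SELECT m.*
FROM #masterdata m
INNER JOIN #filter_table_1 f1
    ON m.filter_column_1 = ISNULL(NULLIF(f1.filter_column_1, ''), m.filter_column_1);

DROP TABLE #filter_table_1;
DROP TABLE #masterdata;
```

One caveat with this pattern: a master row whose filter column is NULL never satisfies the join condition, even against the blank row, because NULL = NULL does not evaluate to true.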