Combine two SQL Server table results to pull latest data where exists - sql

I have two tables: a main table and a work-in-progress (WIP) table. Any inserts/updates are written to the WIP table while the record is being manipulated; this allows for validation checks and the like. I want to create a view that combines the two tables, showing the WIP table data whenever it exists and the main table data when there is no WIP data.
I have figured out a way to do this but it seems that it's not the most elegant solution. I would like to know if there are other ideas or better solutions?
Example illustrating the situation:
select mt.id, wt.id wip_id, isnull(wt.name,mt.name) name,
isnull(wt.address, mt.address) address
from main_table mt full outer join
wip_table wt on mt.id = wt.orig_id;
So that will pull results from the WIP table when they exist; if they don't, it will pull results from the main table. This was a simple example but the tables could have many rows.

If you want data either from one table or the other:
select top 1 *
from
(
select 1 as prio, wt.name, wt.address, .... from wip_table wt where ...
union
select 2 as prio, mt.name, mt.address, .... from main_table mt where ...
) x
-- ORDER BY has to sit outside the derived table for TOP 1 to pick the lowest prio
order by prio
Otherwise, like you have done (checking individual columns), but maybe using a left outer join rather than a full one:
select
mt.id
, wt.id wip_id
, isnull(wt.name,mt.name) name
, isnull(wt.address, mt.address) address
from main_table mt left outer join wip_table wt
on mt.id = wt.orig_id;
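To make the left-join variant concrete, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here, so COALESCE replaces ISNULL, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE wip_table  (id INTEGER PRIMARY KEY, orig_id INTEGER,
                             name TEXT, address TEXT);
    -- Hypothetical sample data: only record 2 is currently being edited.
    INSERT INTO main_table VALUES (1, 'Ann', '1 Main St'), (2, 'Bob', '2 Oak Ave');
    INSERT INTO wip_table  VALUES (10, 2, 'Robert', '2 Oak Ave');
""")

# Same shape as the answer's query; COALESCE stands in for T-SQL ISNULL.
rows = conn.execute("""
    SELECT mt.id,
           wt.id AS wip_id,
           COALESCE(wt.name, mt.name)       AS name,
           COALESCE(wt.address, mt.address) AS address
    FROM main_table mt
    LEFT OUTER JOIN wip_table wt ON mt.id = wt.orig_id
    ORDER BY mt.id
""").fetchall()

for row in rows:
    print(row)
# (1, None, 'Ann', '1 Main St')
# (2, 10, 'Robert', '2 Oak Ave')
```

Two things to keep in mind with this shape: a left join from main_table will not show WIP rows that have no main row yet (brand-new inserts), and a per-column fallback cannot tell "WIP deliberately set this column to NULL" apart from "no WIP row".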

Related

SQL - Table Join To Compare NULL Values Where Join is NULL

I've been asked to basically write a report that displays data in two different databases and be able to see in either database if something is missing.
IE, the invoice number may exist in database1, but not in database2 and vice versa.
I've got the following query below but it only returns all the data from the second table, with NULL values for the first. I'd like to set it up to return the NULL Values in both, but I think the problem is because my join is on the values that can be NULL, so it won't return the values that exist in the first table and not the second.
Can someone step me through how to resolve an issue like this?
As far as I'm aware, I don't necessarily have any other tables to join unless I try to join more tables from each database.
Query:
Select TC.PO_Number, TC.Invoice_Date, TC.Invoice_, H.RefPoNum, H.InvoiceNum
From Table1 TC
RIGHT JOIN [SERVERNAME].[DBNAME].[TABLE2] H ON (TC.Invoice_ = H.InvoiceNum)
Where TC.Invoice_Date Between '2018-10-31' AND '2018-10-31'
AND H.Company Like 'COMPANY'
You can do what you want with a full join. Filtering is tricky with a full join, so I recommend subqueries:
select tc.PO_Number, tc.Invoice_Date, tc.Invoice_, h.RefPoNum, h.InvoiceNum
From (select tc.*
from Table1 tc
where tc.Invoice_Date Between '2018-10-31' AND '2018-10-31'
) tc full join
(select h.*
from [SERVERNAME].[DBNAME].[TABLE2] h
where h.Company Like 'COMPANY'
) h
on TC.Invoice_ = H.InvoiceNum;
Just make sure that the column you are comparing has the same data type and you can safely use this query below:
Server01 NOT IN Server02
select t1.InvoiceNumber from server01.dbo.Invoice t1
except
select t2.InvoiceNumber from server02.dbo.Invoice t2
Server02 NOT IN Server01
select t1.InvoiceNumber from server02.dbo.Invoice t1
except
select t2.InvoiceNumber from server01.dbo.Invoice t2
P.S.
This may not be the exact query you are looking for, but the template may help.
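As a quick illustration of the EXCEPT template, here is a small sketch using Python's built-in sqlite3 module. The two tables invoice_a and invoice_b are hypothetical stand-ins for the Invoice tables on the two servers, and the helper name missing_from is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- invoice_a / invoice_b are illustrative stand-ins for the two servers' tables.
    CREATE TABLE invoice_a (InvoiceNumber TEXT);
    CREATE TABLE invoice_b (InvoiceNumber TEXT);
    INSERT INTO invoice_a VALUES ('INV-1'), ('INV-2'), ('INV-3');
    INSERT INTO invoice_b VALUES ('INV-2'), ('INV-3'), ('INV-4');
""")

def missing_from(left, right):
    """Return invoice numbers present in `left` but absent from `right`."""
    # Table names are fixed literals in this sketch, so the f-string is safe here.
    sql = (
        f"SELECT InvoiceNumber FROM {left} "
        f"EXCEPT "
        f"SELECT InvoiceNumber FROM {right} "
        f"ORDER BY InvoiceNumber"
    )
    return [row[0] for row in conn.execute(sql)]

only_in_a = missing_from("invoice_a", "invoice_b")
only_in_b = missing_from("invoice_b", "invoice_a")
print(only_in_a, only_in_b)
```

Running both directions is what gives the "missing on either side" report; a single EXCEPT only answers one of the two questions.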

Make a "LEFT UNION" query

I have several databases (nobu and bu) with the exact same tables (one is just a backup of the other).
I need to get values from a table in both databases to join them with other tables, so I obviously use a UNION. The thing is, some products have different names in the tables from bu and nobu.
I then tried to select from only one database for this table (I used nobu since it's the latest one), but I noticed that some products are not in nobu but are actually in bu (which makes it not a backup anymore).
The part of the query in which I need this looks like this :
With this I get duplicates
... INNER JOIN (SELECT * FROM nobu.dbo.product UNION SELECT * FROM bu.dbo.product) AS product
ON [...] INNER JOIN (SELECT * FROM nobu.dbo.name UNION SELECT * FROM bu.dbo.name) AS name
ON product.key = name.id ...
With this I get some of the products with NULL name since it doesn't exist on nobu
... INNER JOIN (SELECT * FROM nobu.dbo.product UNION SELECT * FROM bu.dbo.product) AS product
ON [...] INNER JOIN (SELECT * FROM nobu.dbo.name) AS name
ON product.key = name.id ...
I wanted to know if there is a way to perform a LEFT UNION or something like that, to get all the values from nobu, and if there is no data, take the ones from bu, without getting the duplicates (since they can have different names on both databases).
If only the names have changed, and assuming the name table is not large enough to cause performance issues, the code below will do the job:
INNER JOIN (SELECT * FROM nobu.dbo.product UNION SELECT * FROM bu.dbo.product) AS product
ON [...] INNER JOIN (SELECT * FROM nobu.dbo.name UNION SELECT * FROM bu.dbo.name WHERE id NOT IN (SELECT id FROM nobu.dbo.name)) AS name
ON product.key = name.id
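The "take nobu first, fall back to bu" idea can be tried in isolation. Below is a minimal sketch with Python's sqlite3; nobu_name and bu_name are hypothetical single-database stand-ins for nobu.dbo.name and bu.dbo.name, and it uses NOT EXISTS where the answer uses NOT IN (the two behave the same here because id is never NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-ins for nobu.dbo.name and bu.dbo.name, with made-up rows.
    CREATE TABLE nobu_name (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bu_name   (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO nobu_name VALUES (1, 'Widget v2'), (2, 'Gadget');
    INSERT INTO bu_name   VALUES (1, 'Widget'),    (3, 'Gizmo');
""")

# Every nobu row, plus only those bu rows whose id is missing from nobu.
names = conn.execute("""
    SELECT id, name FROM nobu_name
    UNION ALL
    SELECT b.id, b.name
    FROM bu_name b
    WHERE NOT EXISTS (SELECT 1 FROM nobu_name n WHERE n.id = b.id)
    ORDER BY id
""").fetchall()

print(names)
# [(1, 'Widget v2'), (2, 'Gadget'), (3, 'Gizmo')]
```

Because the second branch can never return an id already supplied by the first, UNION ALL is enough and the renamed product (id 1) appears once, with the nobu name.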

How to select records which don't exist in another table or have a different status?

I am trying to select records from a temp table based on another temp table which holds their previous statuses (StatusHistory).
So if the record doesn't exist in the status table, then it should be selected. If the status of the record is different than the one in the StatusHistory table, then the record should be selected. Otherwise, if it exists with the same status in the StatusHistory table, then it should be ignored.
I have this SQL but it doesn't seem to be the best solution. Can you please point me to a better way to achieve that assuming that there are thousands of records in the tables? Would it be possible to achieve the same result with a JOIN statement?
SELECT AI.item
FROM #AllItems AI
WHERE NOT EXISTS (
SELECT * FROM #StatusHistory HS
WHERE HS.itemId = AI.itemId
) OR NOT AI.itemStatus IN ( SELECT HS.itemStatusHistory
FROM #StatusHistory HS
WHERE HS.itemId = AI.itemId
AND HS.itemId = AI.itemId )
Yes, you can do this with a LEFT JOIN.
SELECT AI.item
FROM #AllItems AI
LEFT JOIN #StatusHistory HS ON AI.itemId = HS.itemId
AND AI.itemStatus = HS.itemStatusHistory
WHERE HS.itemId IS NULL
A better solution, however, is to use NOT EXISTS:
SELECT AI.item
FROM #AllItems AI
WHERE NOT EXISTS
(
SELECT 1 FROM #StatusHistory SH
WHERE SH.itemId = AI.itemId
AND SH.itemStatusHistory = AI.itemStatus
);
As pointed out by Aaron, this usually performs better than a LEFT JOIN.
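To check that the two forms select the same records, here is a small sketch using Python's sqlite3 with ordinary tables in place of the temp tables (the # prefix is SQL Server specific). The three sample items are invented to cover the three cases: same status in history, different status in history, and no history at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AllItems (itemId INTEGER, item TEXT, itemStatus TEXT);
    CREATE TABLE StatusHistory (itemId INTEGER, itemStatusHistory TEXT);
    -- 1: same status in history, 2: different status, 3: no history row.
    INSERT INTO AllItems VALUES
        (1, 'apple', 'new'), (2, 'pear', 'sold'), (3, 'plum', 'new');
    INSERT INTO StatusHistory VALUES (1, 'new'), (2, 'new');
""")

left_join_sql = """
    SELECT AI.item
    FROM AllItems AI
    LEFT JOIN StatusHistory HS
           ON AI.itemId = HS.itemId
          AND AI.itemStatus = HS.itemStatusHistory
    WHERE HS.itemId IS NULL
    ORDER BY AI.itemId
"""

not_exists_sql = """
    SELECT AI.item
    FROM AllItems AI
    WHERE NOT EXISTS (
        SELECT 1 FROM StatusHistory SH
        WHERE SH.itemId = AI.itemId
          AND SH.itemStatusHistory = AI.itemStatus
    )
    ORDER BY AI.itemId
"""

via_left_join = [row[0] for row in conn.execute(left_join_sql)]
via_not_exists = [row[0] for row in conn.execute(not_exists_sql)]
print(via_left_join, via_not_exists)
```

Both queries skip only the item whose current status already appears in its history; this says nothing about relative performance, which depends on the engine and the indexes.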

Last two joins cause duplicate rows

Ok, so I have a query that is returning more rows than expected with repeating data. Here is my query:
SELECT AP.RECEIPTNUMBER
,AP.FOLDERRSN
,ABS(AP.PAYMENTAMOUNT)
,ABS(AP.PAYMENTAMOUNT - AP.AMOUNTAPPLIED)
,TO_CHAR(AP.PAYMENTDATE,'MM/DD/YYYY')
,F.REFERENCEFILE
,F.FOLDERTYPE
,VS.SUBDESC
,P.NAMEFIRST||' '||P.NAMELAST
,P.ORGANIZATIONNAME
,VAF.FEEDESC
,VAF.GLACCOUNTNUMBER
FROM ACCOUNTPAYMENT AP
INNER JOIN FOLDER F ON AP.FOLDERRSN = F.FOLDERRSN
INNER JOIN VALIDSUB VS ON F.SUBCODE = VS.SUBCODE
INNER JOIN FOLDERPEOPLE FP ON FP.FOLDERRSN = F.FOLDERRSN
INNER JOIN PEOPLE P ON FP.PEOPLERSN = P.PEOPLERSN
INNER JOIN ACCOUNTBILLFEE ABF ON F.FOLDERRSN = ABF.FOLDERRSN
INNER JOIN VALIDACCOUNTFEE VAF ON ABF.FEECODE = VAF.FEECODE
WHERE AP.NSFFLAG = 'Y'
AND F.FOLDERTYPE IN ('405B','405O')
Everything works fine until I add the bottom two Inner Joins. I'm basically trying to get all payments that had NSF. When I run the simple query:
SELECT *
FROM ACCOUNTPAYMENT
WHERE NSFFLAG = 'Y'
I get only 3 rows pertaining to 405B and 405O folders. So I'm only expecting 3 rows to be returned in the above query but I get 9 with information repeating in some columns. I need the exact feedesc and gl account number based on the fee code that can be found in both the Valid Account Fee and Account Bill Fee tables.
I can't post a picture of my output.
Note: when I run the query without the two bottom joins I get the expected output.
Can someone help me make my query more efficient? Thanks!
As requested, below are the results that my query is returning for vaf.feedesc and vaf.glaccountnumber columns:
Boiler Operator License Fee 2423809
Boiler Certificate of Operation without Manway - Revolving 2423813
Installers (Boiler License)/API Exam 2423807
Boiler Public Inspection/Certification (State or Insurance) 2423816
Boiler Certificate of Operation with Manway 2423801
Boiler Certificate of Operation without Manway 2423801
Boiler Certificate of Operation with Manway - Revolving 2423813
BPV Owner/User Program Fee 2423801
Installers (Boiler License)/API Exam Renewal 2423807
The cause is that at least one of the connections ACCOUNTBILLFEE-FOLDER or VALIDACCOUNTFEE-ACCOUNTBILLFEE is not one-to-one. It allows for one Folder to have many AccountBillFees or for one ValidAccountFee to have many AccountBillFees.
To find the cause of such a problem this is what I usually do:
Change the SELECT A, B, C part of your query to SELECT *.
Reduce the results to one of the rows that is causing you trouble (by adding a WHERE ...). That is a single row without your last two joins and a few rows after you add those two joins.
Look at the result table from left to right. The first columns will probably show the same values for all rows. Once you see a difference between the values in a column, you know that the table of the column you are currently looking at is causing your "multiple row problem".
Now create a SELECT * statement that includes only the two tables joined together that cause multiple rows with the same WHERE ... you used above.
The result should give you a clear picture of the cause.
Once you know the reason for your problem you can think of a solution ;)
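A complementary check, not the SELECT * inspection described above but a shortcut in the same spirit, is to count how many child rows each key produces. The sketch below uses Python's sqlite3 with heavily simplified, hypothetical versions of FOLDER and ACCOUNTBILLFEE (only the join columns, made-up values).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins: only the columns involved in the join.
    CREATE TABLE folder (folderrsn INTEGER PRIMARY KEY);
    CREATE TABLE accountbillfee (folderrsn INTEGER, feecode INTEGER);
    INSERT INTO folder VALUES (100), (200);
    INSERT INTO accountbillfee VALUES (100, 1), (100, 2), (100, 3), (200, 1);
""")

# Any key listed here multiplies the parent row when the table is joined in.
fanout = conn.execute("""
    SELECT f.folderrsn, COUNT(*) AS fee_rows
    FROM folder f
    JOIN accountbillfee abf ON abf.folderrsn = f.folderrsn
    GROUP BY f.folderrsn
    HAVING COUNT(*) > 1
    ORDER BY f.folderrsn
""").fetchall()

print(fanout)
# [(100, 3)]
```

If the query returns rows, the relationship is one-to-many for those keys, and each matching parent row will be repeated that many times in the joined result.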
Try this. If it helps, then those tables have additional rows which are not relevant. If it doesn't, then look at the results of the subqueries I have below to see what additional filters are needed.
SELECT AP.RECEIPTNUMBER
,AP.FOLDERRSN
,ABS(AP.PAYMENTAMOUNT)
,ABS(AP.PAYMENTAMOUNT - AP.AMOUNTAPPLIED)
,TO_CHAR(AP.PAYMENTDATE,'MM/DD/YYYY')
,F.REFERENCEFILE
,F.FOLDERTYPE
,VS.SUBDESC
,P.NAMEFIRST||' '||P.NAMELAST
,P.ORGANIZATIONNAME
,VAF.FEEDESC
,VAF.GLACCOUNTNUMBER
FROM ACCOUNTPAYMENT AP
INNER JOIN FOLDER F ON AP.FOLDERRSN = F.FOLDERRSN
INNER JOIN VALIDSUB VS ON F.SUBCODE = VS.SUBCODE
INNER JOIN FOLDERPEOPLE FP ON FP.FOLDERRSN = F.FOLDERRSN
INNER JOIN PEOPLE P ON FP.PEOPLERSN = P.PEOPLERSN
INNER JOIN
(
SELECT DISTINCT ABF.FEECODE, ABF.FOLDERRSN
FROM ACCOUNTBILLFEE ABF
) ABF ON F.FOLDERRSN = ABF.FOLDERRSN
INNER JOIN
(
SELECT DISTINCT VAF.FEEDESC, VAF.GLACCOUNTNUMBER, VAF.FEECODE
FROM VALIDACCOUNTFEE VAF
) VAF ON ABF.FEECODE = VAF.FEECODE
WHERE AP.NSFFLAG = 'Y'
AND F.FOLDERTYPE IN ('405B','405O')
The data for those last two tables is different in different records in the one-to-many relationship. Since DISTINCT did not fix the problem, you either have to accept that 9 records is the correct return, because you are returning the fields that are different, or you have to determine which of the multiple records you don't want returned, based on business rules that must come from someone in your company, not us.
I don't think you fully understand how SQL works, as 9 records is exactly what I would have expected given the information you gave in the question. The following are some queries that show how joining in a one-to-many relationship can affect output, and ways that you can adjust the query to get rid of the duplicated output.
Note that in some of the cases, the query cannot be adjusted to get rid of the output because of the columns you want returned. So even if some of the columns are repeated, if even one of the columns you want returned has different values and you have no appropriate business rules for which of them you want to see, you can't reduce the record set. Which rules you need are based on the type of data you are querying and what the requirements are. This is not a question we can answer here; only your company knows whether a min or max value would be acceptable, or if you need to add a where clause and, if so, what field to put it on and what values to use to exclude rows. Those are business rules, not SQL.
create table #temp (myid int , mydescription varchar(30));
insert into #temp(myid, mydescription)
values (1, 'test') , (2, 'test2');
create table #temp2 (myid int, myotherdescription varchar(30));
insert into #temp2(myid, myotherdescription)
values (1, 'othertest') , (1, 'othertest2'), (2, 'myothertest') , (1, 'othertest3');
-- 4 rows: myid 1 matches three rows in #temp2, myid 2 matches one
select *
from #temp t
join #temp2 t2 on t.myid = t2.myid;
-- still 4 rows, and the three rows for myid 1 are now identical
select t2.myid, t.mydescription
from #temp t
join #temp2 t2 on t.myid = t2.myid;
-- 2 rows: distinct removes the duplicates because every selected column repeats
select distinct t2.myid, t.mydescription
from #temp t
join #temp2 t2 on t.myid = t2.myid;
-- 4 rows: myotherdescription differs on every row
select t.myid, t.mydescription, t2.myotherdescription
from #temp t
join #temp2 t2 on t.myid = t2.myid;
-- still 4 rows: distinct cannot help when a selected column differs
select distinct t.myid, t.mydescription, t2.myotherdescription
from #temp t
join #temp2 t2 on t.myid = t2.myid;
-- 2 rows: an aggregate picks one value per myid (a business-rule decision)
select t.myid, min(t2.myotherdescription)
from #temp t
join #temp2 t2 on t.myid = t2.myid
group by t.myid;
-- 1 row: a where clause excludes the rows you don't want
select t.myid, t2.myotherdescription
from #temp t
join #temp2 t2 on t.myid = t2.myid
where t2.myid = 2;

Filter a SQL Server table dynamically using multiple joins

I am trying to filter a single table (master) by the values in multiple other tables (filter1, filter2, filter3 ... filterN) using only joins.
I want the following rules to apply:
(A) If one or more rows exist in a filter table, then include only those rows from the master that match the values in the filter table.
(B) If no rows exist in a filter table, then ignore it and return all the rows from the master table.
(C) This solution should work for N filter tables in combination.
(D) Static SQL using JOIN syntax only, no Dynamic SQL.
I'm really trying to get rid of dynamic SQL wherever possible, and this is one of those places I truly think it's possible, but just can't quite figure it out. Note: I have solved this using Dynamic SQL already, and it was fairly easy, but not particularly efficient or elegant.
What I have tried:
Various INNER JOINS between master and filter tables - works for (A) but fails on (B) because the join removes all records from the master (left) side when the filter (right) side has no rows.
LEFT JOINS - Always returns all records from the master (left) side. This fails (A) when some filter tables have records and some do not.
What I really need:
It seems like what I need is to be able to INNER JOIN on each filter table that has 1 or more rows and LEFT JOIN (or not JOIN at all) on each filter table that is empty.
My question: How would I accomplish this without resorting to Dynamic SQL?
In SQL Server 2005+ you could try this:
WITH
filter1 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable1 f ON join_condition
),
filter2 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable2 f ON join_condition
),
…
SELECT m.*
FROM masterdata m
INNER JOIN filter1 f1 ON m.ID = f1.ID AND f1.HasMatched = f1.AllHasMatched
INNER JOIN filter2 f2 ON m.ID = f2.ID AND f2.HasMatched = f2.AllHasMatched
…
My understanding is, filter tables without any matches simply must not affect the resulting set. The output should only consist of those masterdata rows that have matched all the filters where matches have taken place.
SELECT *
FROM master_table mt
WHERE (0 = (select count(*) from filter_table_1)
OR mt.id IN (select id from filter_table_1))
AND (0 = (select count(*) from filter_table_2)
OR mt.id IN (select id from filter_table_2))
AND (0 = (select count(*) from filter_table_3)
OR mt.id IN (select id from filter_table_3))
Be warned that this could be inefficient in practice. Unless you have a specific reason to kill your existing, working, solution, I would keep it.
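Here is a small sketch of the "ignore a filter table when it is empty" predicate, run through Python's sqlite3 so the behaviour can be seen with one filter populated and then with two. It uses NOT EXISTS for the emptiness check instead of comparing count(*) to 0; that is a substitution on my part, and the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master_table (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE filter_table_1 (id INTEGER);
    CREATE TABLE filter_table_2 (id INTEGER);
    INSERT INTO master_table VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO filter_table_1 VALUES (1), (2);
    -- filter_table_2 starts out empty, so it should be ignored.
""")

QUERY = """
    SELECT mt.id
    FROM master_table mt
    WHERE (NOT EXISTS (SELECT 1 FROM filter_table_1)
           OR mt.id IN (SELECT id FROM filter_table_1))
      AND (NOT EXISTS (SELECT 1 FROM filter_table_2)
           OR mt.id IN (SELECT id FROM filter_table_2))
    ORDER BY mt.id
"""

# Only filter_table_1 has rows: rule (A) applies to it, rule (B) to filter_table_2.
with_one_filter = [row[0] for row in conn.execute(QUERY)]

# Populate the second filter: now both must match.
conn.execute("INSERT INTO filter_table_2 VALUES (2), (3)")
with_two_filters = [row[0] for row in conn.execute(QUERY)]

print(with_one_filter, with_two_filters)
# [1, 2] [2]
```

Each filter table still has to be written out by hand, so this covers a fixed, known list of filter tables rather than a truly variable N.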
Do an inner join to get the results for (A) only and a left join to get the results for (B) only (you will have to put something like filterN.column is null in the where clause), then combine the results from the inner join and the left join with UNION.
Left Outer Join - gives you the MISSING entries in master table ....
SELECT * FROM MASTER M
INNER JOIN APPRENTICE A ON A.PK = M.PK
LEFT OUTER JOIN FOREIGN F ON F.FK = M.PK
Where a MASTER row has no matching key in FOREIGN, the FOREIGN columns come back as NULL.
I think that is what you're looking for ...
Mike
First off, it is impossible to have "N number of Joins" or "N number of filters" without resorting to dynamic SQL. The SQL language was not designed for dynamic determination of the entities against which you are querying.
Second, one way to accomplish what you want (but would be built dynamically) would be something along the lines of:
-- INTERSECT binds tighter than UNION ALL, so each filter's pair is parenthesized
Select ...
From master
Where Exists (
(
Select 1
From filter_1
Where filter_1 = master.col1
Union All
Select 1
From ( Select 1 As x ) As dummy
Where Not Exists (
Select 1
From filter_1
)
)
Intersect
(
Select 1
From filter_2
Where filter_2 = master.col2
Union All
Select 1
From ( Select 1 As x ) As dummy
Where Not Exists (
Select 1
From filter_2
)
)
...
Intersect
(
Select 1
From filter_N
Where filter_N = master.colN
Union All
Select 1
From ( Select 1 As x ) As dummy
Where Not Exists (
Select 1
From filter_N
)
)
)
I previously posted a (now deleted) answer based on wrong assumptions about your problem.
But I think you could go for a solution where you split your initial search problem into two parts: constructing the set of ids from the master table, and then selecting the data by joining on that set of ids. Here I naturally assume you have some kind of ID on your master table. The filter tables contain the filter values only. This could then be combined into the statement below, where each SELECT in the eligible subset provides a set of master ids; these are unioned to avoid duplicates, and that set of ids is joined to the table with the data.
SELECT * FROM tblData INNER JOIN
(
SELECT id FROM tblData td
INNER JOIN fa on fa.a = td.a
UNION
SELECT id FROM tblData td
INNER JOIN fb on fb.b = td.b
UNION
SELECT id FROM tblData td
INNER JOIN fc on fc.c = td.c
) eligible ON eligible.id = tblData.id
The test has been made against the tables and values shown below. These are just an appendix.
CREATE TABLE tblData (id int not null primary key identity(1,1), a varchar(40), b datetime, c int)
CREATE TABLE fa (a varchar(40) not null primary key)
CREATE TABLE fb (b datetime not null primary key)
CREATE TABLE fc (c int not null primary key)
Since you have filter tables, I am assuming that these tables are probably dynamically populated from a front-end. This would mean that you have these tables as #temp_table (or even a materialized table, doesn't matter really) in your script before filtering on the master data table.
Personally, I use the below code bit for filtering dynamically without using dynamic SQL.
SELECT *
FROM [masterdata] [m]
INNER JOIN
[filter_table_1] [f1]
ON
[m].[filter_column_1] = ISNULL(NULLIF([f1].[filter_column_1], ''), [m].[filter_column_1])
As you can see, the code NULLs the JOIN condition if the column value is a blank record in the filter table. However, the gist in this is that you will have to actively populate the column value to blank in case you do not have any filter records on which you want to curtail the total set of the master data. Once you have populated the filter table with a blank, the JOIN condition NULLs in those cases and instead joins on itself with the same column from the master data table. This should work for all the cases you mentioned in your question.
I have found this bit of code to be faster in terms of performance.
Hope this helps. Please let me know in the comments.
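To see how the NULLIF join above behaves, here is a minimal sketch with Python's sqlite3. COALESCE stands in for ISNULL (SQLite has no two-argument ISNULL), and the colour values are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE masterdata (id INTEGER PRIMARY KEY, filter_column_1 TEXT);
    CREATE TABLE filter_table_1 (filter_column_1 TEXT);
    INSERT INTO masterdata VALUES (1, 'red'), (2, 'blue'), (3, 'red');
    -- A single blank row means "no filter requested".
    INSERT INTO filter_table_1 VALUES ('');
""")

QUERY = """
    SELECT m.id
    FROM masterdata m
    INNER JOIN filter_table_1 f1
            ON m.filter_column_1 =
               COALESCE(NULLIF(f1.filter_column_1, ''), m.filter_column_1)
    ORDER BY m.id
"""

# Blank filter row: the condition collapses to column = itself, so all rows pass.
no_filter = [row[0] for row in conn.execute(QUERY)]

# Replace the blank with a real filter value.
conn.execute("DELETE FROM filter_table_1")
conn.execute("INSERT INTO filter_table_1 VALUES ('red')")
red_only = [row[0] for row in conn.execute(QUERY)]

print(no_filter, red_only)
# [1, 2, 3] [1, 3]
```

One caveat with this pattern: master rows whose filter_column_1 is NULL never satisfy column = column, so they drop out even when the filter is blank. A completely empty filter table also returns nothing, which is why the blank row has to be inserted.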