I've been asked to basically write a report that displays data in two different databases and be able to see in either database if something is missing.
IE, the invoice number may exist in database1, but not in database2 and vice versa.
I've got the following query below but it only returns all the data from the second table, with NULL values for the first. I'd like to set it up to return the NULL Values in both, but I think the problem is because my join is on the values that can be NULL, so it won't return the values that exist in the first table and not the second.
Can someone step me through how to resolve an issue like this?
As far as I'm aware, I don't necessarily have any other tables to join unless I try to join more tables from each database.
Query:
Select TC.PO_Number, TC.Invoice_Date, TC.Invoice_, H.RefPoNum, H.InvoiceNum
From Table1 TC
RIGHT JOIN [SERVERNAME].[DBNAME].[TABLE2] H ON (TC.Invoice_ = H.InvoiceNum)
Where TC.Invoice_Date Between '2018-10-31' AND '2018-10-31'
AND H.Company Like 'COMPANY'
You can do what you want with a full join. Filtering is tricky with a full join, so I recommend subqueries:
select tc.PO_Number, tc.Invoice_Date, tc.Invoice_, h.RefPoNum, h.InvoiceNum
From (select tc.*
from Table1 tc
where tc.Invoice_Date Between '2018-10-31' AND '2018-10-31'
) tc full join
(select h.*
from [SERVERNAME].[DBNAME].[TABLE2] h
where h.Company Like 'COMPANY'
) h
on TC.Invoice_ = H.InvoiceNum;
Just make sure that the column you are comparing has the same data type and you can safely use this query below:
Server01 NOT IN Server02
select t1.InvoiceNumber from server01.dbo.Invoice t1
except
select t2.InvoiceNumber from server02.dbo.Invoice t2
Server02 NOT IN Server01
select t1.InvoiceNumber from server02.dbo.Invoice t1
except
select t2.InvoiceNumber from server01.dbo.Invoice t2
P.S.
While this may not be the exact query you are looking for, but this template may help.
Related
Ok, so I have a query that is returning more rows than expected with repeating data. Here is my query:
SELECT AP.RECEIPTNUMBER
,AP.FOLDERRSN
,ABS(AP.PAYMENTAMOUNT)
,ABS(AP.PAYMENTAMOUNT - AP.AMOUNTAPPLIED)
,TO_CHAR(AP.PAYMENTDATE,'MM/DD/YYYY')
,F.REFERENCEFILE
,F.FOLDERTYPE
,VS.SUBDESC
,P.NAMEFIRST||' '||P.NAMELAST
,P.ORGANIZATIONNAME
,VAF.FEEDESC
,VAF.GLACCOUNTNUMBER
FROM ACCOUNTPAYMENT AP
INNER JOIN FOLDER F ON AP.FOLDERRSN = F.FOLDERRSN
INNER JOIN VALIDSUB VS ON F.SUBCODE = VS.SUBCODE
INNER JOIN FOLDERPEOPLE FP ON FP.FOLDERRSN = F.FOLDERRSN
INNER JOIN PEOPLE P ON FP.PEOPLERSN = P.PEOPLERSN
INNER JOIN ACCOUNTBILLFEE ABF ON F.FOLDERRSN = ABF.FOLDERRSN
INNER JOIN VALIDACCOUNTFEE VAF ON ABF.FEECODE = VAF.FEECODE
WHERE AP.NSFFLAG = 'Y'
AND F.FOLDERTYPE IN ('405B','405O')
Everything works fine until I add the bottom two Inner Joins. I'm basically trying to get all payments that had NSF. When I run the simple query:
SELECT *
FROM ACCOUNTPAYMENT
WHERE NSFFLAG = 'Y'
I get only 3 rows pertaining to 405B and 405O folders. So I'm only expecting 3 rows to be returned in the above query but I get 9 with information repeating in some columns. I need the exact feedesc and gl account number based on the fee code that can be found in both the Valid Account Fee and Account Bill Fee tables.
I can't post a picture of my output.
Note: when I run the query without the two bottom joins I get the expected output.
Can someone help me make my query more efficient? Thanks!
As requested, below are the results that my query is returning for vaf.feedesc and vaf.glaccountnumber columns:
Boiler Operator License Fee 2423809
Boiler Certificate of Operation without Manway - Revolving 2423813
Installers (Boiler License)/API Exam 2423807
Boiler Public Inspection/Certification (State or Insurance) 2423816
Boiler Certificate of Operation with Manway 2423801
Boiler Certificate of Operation without Manway 2423801
Boiler Certificate of Operation with Manway - Revolving 2423813
BPV Owner/User Program Fee 2423801
Installers (Boiler License)/API Exam Renewal 2423807
The cause is that at least one of the connections ACCOUNTBILLFEE-FOLDER or VALIDACCOUNTFEE-ACCOUNTBILLFEE is not one-to-one. It allows for one Folder to have many AccountBillFees or for one ValidAccountFee to have many AccountBillFees.
To find the cause of such a problem this is what I usually do:
Change the SELECT A, B, C part of your query to SELECT *.
Reduce the results to one of the rows that is causing you trouble (by adding a WHERE ...). That is a single row without your last two joins and a few rows after you add those two joins.
Look at the result table from left to right. The first columns will probably show the same values for all rows. Once you see a difference between the values in a column, you know that the table of the column you are currently looking at is causing your "multiple row problem".
Now create a SELECT * statement that includes only the two tables joined together that cause multiple rows with the same WHERE ... you used above.
The result should give you a clear picture of the cause.
Once you know the reason for your problem you can think of a solution ;)
Try this if it helps then those tables have additional rows which are not relevant. If it doesn't then look at the results of the subqueries I have below to see what additional filters are needed
SELECT AP.RECEIPTNUMBER
,AP.FOLDERRSN
,ABS(AP.PAYMENTAMOUNT)
,ABS(AP.PAYMENTAMOUNT - AP.AMOUNTAPPLIED)
,TO_CHAR(AP.PAYMENTDATE,'MM/DD/YYYY')
,F.REFERENCEFILE
,F.FOLDERTYPE
,VS.SUBDESC
,P.NAMEFIRST||' '||P.NAMELAST
,P.ORGANIZATIONNAME
,VAF.FEEDESC
,VAF.GLACCOUNTNUMBER
FROM ACCOUNTPAYMENT AP
INNER JOIN FOLDER F ON AP.FOLDERRSN = F.FOLDERRSN
INNER JOIN VALIDSUB VS ON F.SUBCODE = VS.SUBCODE
INNER JOIN FOLDERPEOPLE FP ON FP.FOLDERRSN = F.FOLDERRSN
INNER JOIN PEOPLE P ON FP.PEOPLERSN = P.PEOPLERSN
INNER JOIN
(
SELECT DISTINCT ABF.FEECODE, ABF.FOLDERRSN
FROM ACCOUNTBILLFEE ABF
) ABF ON F.FOLDERRSN = ABF.FOLDERRSN
INNER JOIN
(
SELECT DISTINCT VAF.FEEDESC, VAF.GLACCOUNTNUMBER, VAF.FEECODE
FROM VALIDACCOUNTFEE VAF
) VAF ON ABF.FEECODE = VAF.FEECODE
WHERE AP.NSFFLAG = 'Y'
AND F.FOLDERTYPE IN ('405B','405O')
The data for those last two tables is different in different records in the one to many relationship. Since distinct did not fix the problem, then you have to accept that 9 records is the correct return because you are returning the fields that are different or you have to determine which of the multiple records you don't want returned based on business rules that must come from someone in your company not us.
I don't think you fully understand how SQl works as 9 records is exactly what I would have expected given the information you gave in the question. The following are some queries that show how joining in a one to many relationship can affect output and ways that you can adjust the query to get rid of the duplicated output.
Note that in some of the cases, the query cannot be adjusted to get rid of the output because of the columns you want returned. So even if some of the columns are repeated, if even one of the columns you want return has differnt records and you have no approriate business rules for which of them you want to see, you can't reduce the records set. Which rules you need are based on the type of data you are querying and what the rqeuirements are. This is not a question we can answer here, only your company knows whether a min or max value would be acceptable or if you need to add a where clause and if so what field to put it on and what values to use it to exclude. Those are business rules not SQL.
create table #temp (myid int , mydescription varchar(30))
insert into #temp(myid, mydescription)
values (1, 'test') , (2, 'test2')
create table #temp2 (myid int, myotherdescription varchar(30))
insert into #temp2(myid, myotherdescription)
values (1, 'othertest') , (1, 'othertest2'), (2, 'myothertest') , (1, 'othertest3')
select *
from #temp t
join #temp2 t2 on t.myid = t2.myid
select t2.myid, t.mydescription
from #temp t
join #temp2 t2 on t.myid = t2.myid
select distinct t2.myid, t.mydescription
from #temp t
join #temp2 t2 on t.myid = t2.myid
select t.myid, t.mydescription, t2.myotherdescription
from #temp t
join #temp2 t2 on t.myid = t2.myid
select distinct t.myid, t.mydescription, t2.myotherdescription
from #temp t
join #temp2 t2 on t.myid = t2.myid
select t.myid, min(t2.myotherdescription)
from #temp t
join #temp2 t2 on t.myid = t2.myid
group by t.myid
select t.myid, t2.myotherdescription
from #temp t
join #temp2 t2 on t.myid = t2.myid
where t2.myid = 2
I have a query that I am trying to understand. Can someone shed light on to the details of what this query does?
I've only ever used one ON clause in a join condition. This one has multiple conditions for the LEFT JOIN, making it tricky to understand.
INSERT INTO nop_tbl
(q_date, community_id, newsletter_t, subscription_count)
SELECT date(now()), a.community_id,
a.newsletter_type,
count(a.subscriber_user_id)
FROM
newsletter_subscribers_main a
LEFT OUTER JOIN nop_tbl b
ON (a.community_id = b.community_id)
AND (a.newsletter_type = b.newsletter_t)
AND (a.created_at = b.q_date)
WHERE b.q_date is null
AND b.mailing_list is null
GROUP BY a.community_id, a.newsletter_t, a.created_at
You have your explanation:
The objective of the query is to count subscriptions per (q_date, community_id, newsletter_t) in newsletter_subscribers_main and write the result to nop_tbl.
The LEFT JOIN prevents that rows are added multiple times.
But I also think, the query is inefficient and probably wrong.
The 2nd WHERE clause:
AND b.mailing_list is null
is just noise and can be removed. If b.q_date is null, then b.mailing_list is guaranteed to be null in this query.
You don't need parentheses around JOIN conditions.
If subscriber_user_id is defined NOT NULL, count(*) does the same, cheaper.
I suspect that grouping by a.created_at, while you insert date(now()) is probably wrong. Hardly makes any sense. My educated guess (assuming that created_at is type date):
INSERT INTO nop_tbl
(q_date, community_id, newsletter_t, subscription_count)
SELECT a.created_at
,a.community_id
,a.newsletter_type
,count(*)
FROM newsletter_subscribers_main a
LEFT JOIN nop_tbl b ON a.community_id = b.community_id
AND a.newsletter_type = b.newsletter_t
AND a.created_at = b.q_date
WHERE b.q_date IS NULL
GROUP BY a.created_at, a.community_id, a.newsletter_t;
The short short version is:
insert ... select ...
-> the query is filling nob_tbl
from ...
-> based on data in newsletter_subscribers_main
left join ... where ... is null
-> that are not already present in nob_tbl
Step by step
INSERT INTO nop_tbl
(q_date, community_id, newsletter_t, subscription_count)
The INSERT syntax This is telling the database in what table and what column will be used for the insert query
SELECT date(now()), a.community_id,
a.newsletter_type,
count(a.subscriber_user_id)
Those are instead the selected fields to insert
FROM
newsletter_subscribers_main a
Here is telling to the database to select fields which has alias prepended a. from table newsletter_subscribers_main
LEFT OUTER JOIN nop_tbl b
Here is left joining another table nop_tbl where other fields will be selected
ON (a.community_id = b.community_id)
AND (a.newsletter_type = b.newsletter_t)
AND (a.created_at = b.q_date)
Those are the rules of the JOIN, actually is telling what columns will be used for join
WHERE b.q_date is null
AND b.mailing_list is null
Those are the WHERE clauses , they are used to limit result to the requested data, in this case where two columns are null
GROUP BY a.community_id, a.newsletter_t, a.created_at
GROUP BY clauses, used to group result on given column
You can have a visual explanation of joins here
I have two tables
Table A
type_uid, allowed_type_uid
9,1
9,2
9,4
1,1
1,2
24,1
25,3
Table B
type_uid
1
2
From table A I need to return
9
1
Using a WHERE IN clause I can return
9
1
24
SELECT
TableA.type_uid
FROM
TableA
INNER JOIN
TableB
ON TableA.allowed_type_uid = TableB.type_uid
GROUP BY
TableA.type_uid
HAVING
COUNT(distinct TableB.type_uid) = (SELECT COUNT(distinct type_uid) FROM TableB)
Join the two tables togeter, so that you only have the records matching the types you are interested in.
Group the result set by TableA.type_uid.
Check that each group has the same number of allowed_type_uid values as exist in TableB.type_uid.
distinct is required only if there can be duplicate records in either table. If both tables are know to only have unique values, the distinct can be removed.
It should also be noted that as TableA grows in size, this type of query will quickly degrade in performance. This is because indexes are not actually much help here.
It can still be a useful structure, but not one where I'd recommend running the queries in real-time. Rather use it to create another persisted/cached result set, and use this only to refresh those results as/when needed.
Or a slightly cheaper version (resource wise):
SELECT
Data.type_uid
FROM
A AS Data
CROSS JOIN
B
LEFT JOIN
A
ON Data.type_uid = A.type_uid AND B.type_uid = A.allowed_type_uid
GROUP BY
Data.type_uid
HAVING
MIN(ISNULL(A.allowed_type_uid,-999)) != -999
Your explanation is not very clear. I think you want to get those type_uid's from table A where for all records in table B there is a matching A.Allowed_type_uid.
SELECT T2.type_uid
FROM (SELECT COUNT(*) as AllAllowedTypes FROM #B) as T1,
(SELECT #A.type_uid, COUNT(*) as AllowedTypes
FROM #A
INNER JOIN #B ON
#A.allowed_type_uid = #B.type_uid
GROUP BY #A.type_uid
) as T2
WHERE T1.AllAllowedTypes = T2.AllowedTypes
(Dems, you were faster than me :) )
I am trying to filter a single table (master) by the values in multiple other tables (filter1, filter2, filter3 ... filterN) using only joins.
I want the following rules to apply:
(A) If one or more rows exist in a filter table, then include only those rows from the master that match the values in the filter table.
(B) If no rows exist in a filter table, then ignore it and return all the rows from the master table.
(C) This solution should work for N filter tables in combination.
(D) Static SQL using JOIN syntax only, no Dynamic SQL.
I'm really trying to get rid of dynamic SQL wherever possible, and this is one of those places I truly think it's possible, but just can't quite figure it out. Note: I have solved this using Dynamic SQL already, and it was fairly easy, but not particularly efficient or elegant.
What I have tried:
Various INNER JOINS between master and filter tables - works for (A) but fails on (B) because the join removes all records from the master (left) side when the filter (right) side has no rows.
LEFT JOINS - Always returns all records from the master (left) side. This fails (A) when some filter tables have records and some do not.
What I really need:
It seems like what I need is to be able to INNER JOIN on each filter table that has 1 or more rows and LEFT JOIN (or not JOIN at all) on each filter table that is empty.
My question: How would I accomplish this without resorting to Dynamic SQL?
In SQL Server 2005+ you could try this:
WITH
filter1 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable1 f ON join_condition
),
filter2 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable2 f ON join_condition
),
…
SELECT m.*
FROM masterdata m
INNER JOIN filter1 f1 ON m.ID = f1.ID AND f1.HasMatched = f1.AllHasMatched
INNER JOIN filter2 f2 ON m.ID = f2.ID AND f2.HasMatched = f2.AllHasMatched
…
My understanding is, filter tables without any matches simply must not affect the resulting set. The output should only consist of those masterdata rows that have matched all the filters where matches have taken place.
SELECT *
FROM master_table mt
WHERE (0 = (select count(*) from filter_table_1)
OR mt.id IN (select id from filter_table_1)
AND (0 = (select count(*) from filter_table_2)
OR mt.id IN (select id from filter_table_2)
AND (0 = (select count(*) from filter_table_3)
OR mt.id IN (select id from filter_table_3)
Be warned that this could be inefficient in practice. Unless you have a specific reason to kill your existing, working, solution, I would keep it.
Do inner join to get results for (A) only and do left join to get results for (B) only (you will have to put something like this in the where clause: filterN.column is null) combine results from inner join and left join with UNION.
Left Outer Join - gives you the MISSING entries in master table ....
SELECT * FROM MASTER M
INNER JOIN APPRENTICE A ON A.PK = M.PK
LEFT OUTER JOIN FOREIGN F ON F.FK = M.PK
If FOREIGN has keys that is not a part of MASTER you will have "null columns" where the slots are missing
I think that is what you looking for ...
Mike
First off, it is impossible to have "N number of Joins" or "N number of filters" without resorting to dynamic SQL. The SQL language was not designed for dynamic determination of the entities against which you are querying.
Second, one way to accomplish what you want (but would be built dynamically) would be something along the lines of:
Select ...
From master
Where Exists (
Select 1
From filter_1
Where filter_1 = master.col1
Union All
Select 1
From ( Select 1 )
Where Not Exists (
Select 1
From filter_1
)
Intersect
Select 1
From filter_2
Where filter_2 = master.col2
Union All
Select 1
From ( Select 1 )
Where Not Exists (
Select 1
From filter_2
)
...
Intersect
Select 1
From filter_N
Where filter_N = master.colN
Union All
Select 1
From ( Select 1 )
Where Not Exists (
Select 1
From filter_N
)
)
I have previously posted a - now deleted - answer based on wrong assumptions on you problems.
But I think you could go for a solution where you split your initial search problem into a matter of constructing the set of ids from the master table, and then select the data joining on that set of ids. Here I naturally assume you have a kind of ID on your master table. The filter tables contains the filter values only. This could then be combined into the statement below, where each SELECT in the eligble subset provides a set of master ids, these are unioned to avoid duplicates and that set of ids are joined to the table with data.
SELECT * FROM tblData INNER JOIN
(
SELECT id FROM tblData td
INNER JOIN fa on fa.a = td.a
UNION
SELECT id FROM tblData td
INNER JOIN fb on fb.b = td.b
UNION
SELECT id FROM tblData td
INNER JOIN fc on fc.c = td.c
) eligible ON eligible.id = tblData.id
The test has been made against the tables and values shown below. These are just an appendix.
CREATE TABLE tblData (id int not null primary key identity(1,1), a varchar(40), b datetime, c int)
CREATE TABLE fa (a varchar(40) not null primary key)
CREATE TABLE fb (b datetime not null primary key)
CREATE TABLE fc (c int not null primary key)
Since you have filter tables, I am assuming that these tables are probably dynamically populated from a front-end. This would mean that you have these tables as #temp_table (or even a materialized table, doesn't matter really) in your script before filtering on the master data table.
Personally, I use the below code bit for filtering dynamically without using dynamic SQL.
SELECT *
FROM [masterdata] [m]
INNER JOIN
[filter_table_1] [f1]
ON
[m].[filter_column_1] = ISNULL(NULLIF([f1].[filter_column_1], ''), [m].[filter_column_1])
As you can see, the code NULLs the JOIN condition if the column value is a blank record in the filter table. However, the gist in this is that you will have to actively populate the column value to blank in case you do not have any filter records on which you want to curtail the total set of the master data. Once you have populated the filter table with a blank, the JOIN condition NULLs in those cases and instead joins on itself with the same column from the master data table. This should work for all the cases you mentioned in your question.
I have found this bit of code to be faster in terms of performance.
Hope this helps. Please let me know in the comments.
I am getting the following error when trying to run this query in SQL 2005:
SELECT tb.*
FROM (
SELECT *
FROM vCodesWithPEs INNER JOIN vDeriveAvailabilityFromPE
ON vCodesWithPEs.PROD_PERM = vDeriveAvailabilityFromPE.PEID
INNER JOIN PE_PDP ON vCodesWithPEs.PROD_PERM = PE_PDP.PEID
) AS tb;
Error: The column 'PEID' was specified multiple times for 'tb'.
I am new to SQL.
The problem, as mentioned, is that you are selecting PEID from two tables, the solution is to specify which PEID do you want, for example
SELECT tb.*
FROM (
SELECT tb1.PEID,tb2.col1,tb2.col2,tb3.col3 --, and so on
FROM vCodesWithPEs as tb1 INNER JOIN vDeriveAvailabilityFromPE as tb2
ON tb1.PROD_PERM = tb2.PEID
INNER JOIN PE_PDP tb3 ON tb1.PROD_PERM = tb3.PEID
) AS tb;
That aside, as Chris Lively cleverly points out in a comment the outer SELECT is totally superfluous. The following is totally equivalent to the first.
SELECT tb1.PEID,tb2.col1,tb2.col2,tb3.col3 --, and so on
FROM vCodesWithPEs as tb1 INNER JOIN vDeriveAvailabilityFromPE as tb2
ON tb1.PROD_PERM = tb2.PEID
INNER JOIN PE_PDP tb3 ON tb1.PROD_PERM = tb3.PEID
or even
SELECT *
FROM vCodesWithPEs as tb1 INNER JOIN vDeriveAvailabilityFromPE as tb2
ON tb1.PROD_PERM = tb2.PEID
INNER JOIN PE_PDP tb3 ON tb1.PROD_PERM = tb3.PEID
but please avoid using SELECT * whenever possible. It may work while you are doing interactive queries to save typing, but in production code never use it.
Looks like you have the column PEID in both tables: vDeriveAvailabilityFromPE and PE_PDP. The SELECT statement tries to select both, and gives an error about duplicate column name.
You're joining three tables, and looking at all columns in the output (*).
It looks like the tables have a common column name PEID, which you're going to have to alias as something else.
Solution: don't use * in the subquery, but explicitly select each column you wish to see, aliasing any column name that appears more than once.
Instead of using * to identify collecting all of the fields, rewrite your query to explicitly name the columns you want. That way there will be no confusion.
just give new alias name for the column that repeats,it worked for me.....