Microsoft SQL Server: Need to Compare Two Sets of Results

I have two queries that each give me a list of orders and the number of line items on each order. We are migrating systems; one query runs against the source and the other against the target. I need to do some validation between them.
I want to compare them so that my result is a list of orders where the number of lines does not match. If an order should be in the target but isn't, it should also appear on the list.
I have been fighting with this for a full day and I cannot wrap my head around how to approach it.
Help would be appreciated!
Here are the two queries:
SOURCE QUERY
select s.oNum, count(s.oNum)
from SourceTbl s
left join PK_Master pk
on pk.Num = s.oNum
where s.oNum not in (select ordernum from tmpSalesOrders)
group by s.oNum
order by s.oNum
TARGET QUERY
select p.oNum, count(p.oNum)
from BridgeTbl p
left join TargetTbl t
on p.ToNum = t.orderID
group by p.oNum
order by p.oNum
SourceTable is a superset, and PK_Master and tmpSalesOrders are used to refine the orders that SHOULD be migrated.
The BridgeTbl has a field called SoNum that = s.oNum to link the source and target.
I need the source order number (s.oNum) in the result set.

Toss EXCEPT between those two queries and let 'er rip. That will give you the rows output by your first query that aren't output by your second, so an order shows up if its line count differs or if it is missing from the target entirely. Note that ORDER BY is only allowed once, at the very end of the combined statement, and it has to reference the output column name rather than a table alias.
select s.oNum, count(s.oNum)
from SourceTbl s
left join PK_Master pk
on pk.Num = s.oNum
where s.oNum not in (select ordernum from tmpSalesOrders)
group by s.oNum
EXCEPT
select p.oNum, count(p.oNum)
from BridgeTbl p
left join TargetTbl t
on p.ToNum = t.orderID
group by p.oNum
order by oNum
EXCEPT is a "Set Operator". In TSQL you can UNION, UNION ALL, EXCEPT, and INTERSECT sets.
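To see the behaviour on a small data set, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server (SQLite also supports EXCEPT). The tables source_lines and target_lines are hypothetical stand-ins for the two queries above, with one row per order line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the source and target queries: one row per order line.
    CREATE TABLE source_lines (oNum INTEGER);
    CREATE TABLE target_lines (oNum INTEGER);
    INSERT INTO source_lines VALUES (1), (1), (2), (2), (2), (3);
    INSERT INTO target_lines VALUES (1), (1), (2), (2);
""")

# Rows (order, line count) present in the source result but not in the target result.
mismatches = conn.execute("""
    SELECT oNum, COUNT(*) AS line_count FROM source_lines GROUP BY oNum
    EXCEPT
    SELECT oNum, COUNT(*) AS line_count FROM target_lines GROUP BY oNum
    ORDER BY oNum
""").fetchall()

print(mismatches)
```

Order 1 matches on both sides and drops out. Order 2 is reported because its line count differs, and order 3 is reported because it does not exist in the target at all.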

Related

Query with Left outer join and group by returning duplicates

To begin with, I have a table in my db that is fed with SalesForce info. When I run this example query it returns 2 rows:
select * from SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513'
When I run this next query on the same table I obtain one of the rows, which is what I need:
SELECT MAX(ID_SAP_BAYER__c) FROM SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513' GROUP BY ID_SAP_BAYER__c
Now, I have another table (PedidosEspecialesZarateCabeceras) which has a field (NroClienteDireccionEntrega) that I can match with the field I've been using in the SalesForce table (ID_SAP_BAYER__c). This table has a key that consists of just 1 field (NroPedido).
What I need to do is join these 2 tables to obtain a row from PedidosEspecialesZarateCabeceras with additional fields coming from the SalesForce table, and in case those additional fields are not available, they should come back as NULL values, so for that I'm using a LEFT OUTER JOIN.
The problem is, since I have to match NroClienteDireccionEntrega and ID_SAP_BAYER__c and there are 2 rows in the SalesForce table with the same ID_SAP_BAYER__c, my query returns 2 duplicate rows from PedidosEspecialesZarateCabeceras (they both have the same NroPedido).
This is an example query that returns duplicates:
SELECT
cab.CUIT AS CUIT,
convert(nvarchar(4000), cab.NroPedido) AS NroPedido,
sales.BillingCity__c as Localidad,
sales.BillingState__c as IdProvincia,
sales.BillingState__c_Desc as Provincia,
sales.BillingStreet__c as Calle,
sales.Billing_Department__c as Distrito,
sales.Name as RazonSocial,
cab.NroCliente as ClienteId
FROM PedidosEspecialesZarateCabeceras AS cab WITH (NOLOCK)
LEFT OUTER JOIN
SalesForce_INT_Account__c AS sales WITH (NOLOCK) ON
cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
and sales.ID_SAP_BAYER__c in
( SELECT MAX(ID_SAP_BAYER__c)
FROM SalesForce_INT_Account__c
GROUP BY ID_SAP_BAYER__c
)
WHERE cab.NroPedido ='5320'
Even though the join has MAX and GROUP BY, this returns 2 duplicate rows with different SalesForce information (because of the 2 SalesForce rows with the same ID_SAP_BAYER__c), which I thought should not be possible.
What I need is for the left outer join in my query to pick only ONE of the SalesForce rows to prevent the duplication that is happening right now. For some reason the SELECT MAX with the GROUP BY is not working.
Maybe I should try to join these tables in a different way. Can anyone give me some other ideas on how to join the two tables to return just 1 row? It doesn't matter if the SalesForce row that gets picked out of the 2 isn't the correct one, I just need it to pick one of them.

Your IN clause is not actually doing anything, since...
SELECT MAX(ID_SAP_BAYER__c)
FROM SalesForce_INT_Account__c
GROUP BY ID_SAP_BAYER__c
... returns all possible ID_SAP_BAYER__c values. (The GROUP BY says you want to return one row per unique ID_SAP_BAYER__c and then, since your MAX is operating on exactly one unique value per group, you simply return that value.)
You will want to change your query to operate on a value that is actually different between the two rows you are trying to differentiate (probably the MAX(ID) for the relevant ID_SAP_BAYER__c). Plus, you will want to link that inner query to your outer query.
You could probably do something like:
...
LEFT OUTER JOIN
SalesForce_INT_Account__c sales
ON cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
and sales.ID in
(
SELECT MAX(ID)
FROM SalesForce_INT_Account__c sales2
WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega
)
WHERE cab.NroPedido ='5320'
By using sales.ID in ... SELECT MAX(ID) ... instead of sales.ID_SAP_BAYER__c in ... SELECT MAX(ID_SAP_BAYER__c) ... this ensures you only match one of the two rows for that ID_SAP_BAYER__c. The WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega condition links the inner query to the outer query.
There are multiple ways of doing the above, especially if you don't care which of the relevant rows you match on. You can use the above as a starting point and make it match your preferred style.
An alternative might be to use OUTER APPLY with TOP 1. Something like:
SELECT
...
FROM PedidosEspecialesZarateCabeceras AS cab
OUTER APPLY(
SELECT TOP 1 *
FROM SalesForce_INT_Account__c s1
WHERE cab.NroClienteDireccionEntrega = s1.ID_SAP_BAYER__c
) sales
WHERE cab.NroPedido ='5320'
Without an ORDER BY the match that TOP 1 chooses will be arbitrary, but I think that's what you want anyway. (If not, you could add an ORDER BY).
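The correlated MAX(ID) approach can be tried out on a toy data set. This is only a sketch: it uses Python's sqlite3 module instead of SQL Server (so no OUTER APPLY or TOP), and the tables orders and accounts, along with the column customer_key, are hypothetical simplifications of the tables in the question. It also assumes the SalesForce table has a unique ID column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified versions of the two tables in the question.
    CREATE TABLE orders (NroPedido INTEGER PRIMARY KEY, customer_key TEXT);
    CREATE TABLE accounts (ID INTEGER PRIMARY KEY, customer_key TEXT, Name TEXT);
    INSERT INTO orders VALUES (5320, '3783513'), (5321, '999');
    -- Two account rows share the same key, which is what caused the duplicates.
    INSERT INTO accounts VALUES (1, '3783513', 'First'), (2, '3783513', 'Second');
""")

rows = conn.execute("""
    SELECT o.NroPedido, a.Name
    FROM orders AS o
    LEFT JOIN accounts AS a
      ON a.customer_key = o.customer_key
     -- Keep only the account row with the highest ID for this order's key.
     AND a.ID = (SELECT MAX(a2.ID)
                 FROM accounts AS a2
                 WHERE a2.customer_key = o.customer_key)
    ORDER BY o.NroPedido
""").fetchall()

print(rows)
```

Order 5320 comes back once, paired with the account row that has the highest ID, and order 5321 (no matching account) still comes back with a NULL name because the join is a LEFT JOIN.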

How to improve code to include columns from another table?

I'm trying to do a join statement for an inventory report of sorts but I am not sure what I am missing.
So I tried doing the reverse of my join statement and some columns remain omitted. I'm just not sure what I should add or change in the code.
My tables look something like this:
PRODUCT_TBL: productID|productDescription|stockQuantity
PRODUCT_SUPPLIER_TBL: supplierID|productID|vendorPartID|productCost|purchased Quantity
select PRODUCT_SUPPLIER_TBL.productID,PRODUCT_SUPPLIER_TBL.vendorPartID,PRODUCT_SUPPLIER_TBL.productCost
from PRODUCT_SUPPLIER_TBL
inner join PRODUCT_TBL on PRODUCT_SUPPLIER_TBL.productID = PRODUCT_TBL.productID
order by productCost desc
I expected one other column aside from productID to appear in the results but what I got only has information from the product_supplier_tbl and the productID from product_supplier_tbl and product_tbl.

You can include any column, from any of the tables you have joined together, in the SELECT list to return it in your query results.
select PRODUCT_SUPPLIER_TBL.productID
,PRODUCT_SUPPLIER_TBL.vendorPartID
,PRODUCT_SUPPLIER_TBL.productCost
,PRODUCT_TBL.productDescription --adding a column to the SELECT list
from PRODUCT_SUPPLIER_TBL
inner join PRODUCT_TBL on PRODUCT_SUPPLIER_TBL.productID = PRODUCT_TBL.productID
order by productCost desc

Inner join returns too many rows

I am having an issue joining two tables in order to return just one column from the other.
SELECT om.*, cm.Sales_Stage
FROM dbo.OM_Table1 om
JOIN dbo.Criteria_Matters cm ON cm.clientCorporationID = om.ClientCorporationID
ORDER BY om.ClientCorporationID
I want to include the Sales_Stage from my CM table but the join causes the result set to return 14k+ rows instead of the ~7k that is returned without the join.
Is there any way to just bring in this additional column without blowing up the query?

You can use a subquery. Note that this may not select the Sales_Stage you want, since you have multiple entries in Criteria_Matters per ClientCorporationID; you probably need an ORDER BY in the subquery.
SELECT om.*,
(SELECT TOP 1 cm.Sales_Stage
FROM dbo.Criteria_Matters cm
WHERE cm.clientCorporationID = om.ClientCorporationID) AS Sales_Stage
FROM dbo.OM_Table1 om
ORDER BY om.ClientCorporationID
I'm assuming the om.* was just for the example; it is typically best practice NOT to do that in production.
If you intended to see the differences you may want to do something like this instead...
SELECT om.*, cm.Sales_Stage, cm.Criteria_MatterID
FROM dbo.OM_Table1 om
JOIN dbo.Criteria_Matters cm ON cm.clientCorporationID = om.ClientCorporationID
ORDER BY om.ClientCorporationID

Just guessing since there is no information on the structure of your tables, but the problem is likely that the Criteria_Matters table has many records for a given clientCorporationID. So it will duplicate every record in OM_Table1 based on the number of matching Criteria_Matters.ClientCorporationID records.
There are a few ways to deal with this - one way would be to use an inline view instead of joining to the full Criteria_Matters table.
If you add an inline view and GROUP BY Criteria_Matters.ClientCorporationID - you are guaranteed that there will be only one record per ClientCorporationID in the joined table - and you will not get duplicated records. Of course since you are grouping by clientCorporationID, you need to apply some aggregate function to Sales_Stage. If you just pick MAX(Sales_Stage), you will get whatever the maximum value is. If you know Sales_Stage is the same for every given clientCorporationID - you are all set. The SQL will look something like the following:
SELECT om.*, cm.Sales_Stage
FROM dbo.OM_Table1 om
INNER JOIN
(
SELECT clientCorporationID, MAX(Sales_Stage) AS Sales_Stage
FROM dbo.Criteria_Matters
GROUP BY clientCorporationID
) cm ON cm.clientCorporationID = om.ClientCorporationID
ORDER BY om.ClientCorporationID
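As a rough illustration of the difference, the sketch below compares the plain join with the grouped inline view on a tiny data set. It runs on Python's sqlite3 module rather than SQL Server, and the tables om and cm with their sample rows are hypothetical stand-ins for OM_Table1 and Criteria_Matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for dbo.OM_Table1 and dbo.Criteria_Matters.
    CREATE TABLE om (ClientCorporationID INTEGER, label TEXT);
    CREATE TABLE cm (clientCorporationID INTEGER, Sales_Stage TEXT);
    INSERT INTO om VALUES (1, 'a'), (2, 'b');
    INSERT INTO cm VALUES (1, 'Lead'), (1, 'Won'), (2, 'Lead');
""")

# Plain join: client 1 has two cm rows, so its om row is returned twice.
plain_count = conn.execute("""
    SELECT COUNT(*)
    FROM om
    JOIN cm ON cm.clientCorporationID = om.ClientCorporationID
""").fetchone()[0]

# Inline view: collapse cm to one row per client before joining.
grouped = conn.execute("""
    SELECT om.ClientCorporationID, g.Sales_Stage
    FROM om
    JOIN (
        SELECT clientCorporationID, MAX(Sales_Stage) AS Sales_Stage
        FROM cm
        GROUP BY clientCorporationID
    ) AS g ON g.clientCorporationID = om.ClientCorporationID
    ORDER BY om.ClientCorporationID
""").fetchall()

print(plain_count, grouped)
```

The plain join returns 3 rows for 2 om rows, while the grouped version returns exactly one row per om row. MAX on a text column picks the alphabetically last value, which is why client 1 ends up with 'Won' here.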
However, if there are different Sales_Stage values for a given clientCorporationID in the Criteria_Matters table - you can group by both clientCorporationID and Sales_Stage. When you do this, you will now get OM_Table1 records duplicated - but only for each unique Sales_Stage corresponding to a ClientCorporationID in Criteria_Matters. The SQL would look like this:
SELECT om.*, cm.Sales_Stage
FROM dbo.OM_Table1 om
INNER JOIN
(
SELECT clientCorporationID, Sales_Stage
FROM dbo.Criteria_Matters
GROUP BY clientCorporationID, Sales_Stage
) cm ON cm.clientCorporationID = om.ClientCorporationID
ORDER BY om.ClientCorporationID
Good luck!

How to select all attributes in sql Join query

The following sql query below produces the specified result.
select product.product_no,product_type,salesteam.rep_name,salesteam.SUPERVISOR_NAME
from product
inner join salesteam
on product.product_rep=salesteam.rep_id
ORDER BY product.Product_No;
However, my intention is to produce a more detailed result which will include all the attributes in the PRODUCT table. My approach is to list all the attributes in the first line of the query.
select product.product_no,product.product_date,product.product_colour,product.product_style,
product.product_age,product_type,salesteam.rep_name,salesteam.SUPERVISOR_NAME
from product
inner join salesteam
on product.product_rep=salesteam.rep_id
ORDER BY product.Product_No;
Is there another way it can be done instead of listing all the attributes of the PRODUCT table one by one?

You can use * to select all columns from all tables, or you can use [table/alias].* to select all columns from the specified table. In your case, you can use product.*:
select product.*,salesteam.rep_name,salesteam.SUPERVISOR_NAME
from product
inner join salesteam
on product.product_rep=salesteam.rep_id
ORDER BY product.Product_No;
It is important to note that you should only do this if you are 100% sure you need every single column, and always will. There are performance implications associated with this; if you're selecting 100 columns from a table when you really only need 4 or 5 of them, you're adding a lot of overhead to the query. The DBMS has to work harder, and you're also sending more data across the wire (if your database is not on the same machine as your executing code).
If any columns are later added to the product table, those columns will also be returned by this query in the future.
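The point about later-added columns can be checked with a small experiment. This is a sketch on Python's sqlite3 module, not SQL Server, and the cut-down product and salesteam tables are hypothetical; the same query text is run before and after a column is added to product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down versions of the tables in the question.
    CREATE TABLE product (product_no INTEGER, product_rep INTEGER);
    CREATE TABLE salesteam (rep_id INTEGER, rep_name TEXT);
    INSERT INTO product VALUES (100, 1);
    INSERT INTO salesteam VALUES (1, 'Ann');
""")

query = """
    SELECT product.*, salesteam.rep_name
    FROM product
    INNER JOIN salesteam ON product.product_rep = salesteam.rep_id
"""

# Column names returned by the query before the schema change.
before = [col[0] for col in conn.execute(query).description]

conn.execute("ALTER TABLE product ADD COLUMN product_colour TEXT")

# Same query text, but product.* now expands to one more column.
after = [col[0] for col in conn.execute(query).description]

print(before)
print(after)
```

Any calling code that reads columns by position would see rep_name shift from the third column to the fourth after the ALTER TABLE, which is one more reason to list columns explicitly when the shape of the result matters.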

select
product.*,
salesteam.rep_name,
salesteam.SUPERVISOR_NAME
from product inner join salesteam on
product.product_rep=salesteam.rep_id
ORDER BY
product.Product_No;
This should do.

You can write like this
select P.* --- all Product columns
,S.* --- all salesteam columns
from product P
inner join salesteam S
on P.product_rep=S.rep_id
ORDER BY P.Product_No;

How can I exclude values from a third query (Access)

I have a query that shows me a listing of ALL opportunities in one query
I have a query that shows me a listing of EXCLUSION opportunities, ones we want to eliminate from the results
I need to produce a query that will take everything from the first query minus the second query...
SELECT DISTINCT qryMissedOpportunity_ALL_Clients.*
FROM qryMissedOpportunity_ALL_Clients INNER JOIN qryMissedOpportunity_Exclusions ON
([qryMissedOpportunity_ALL_Clients].[ClientID] <> [qryMissedOpportunity_Exclusions].[ClientID])
AND
([qryMissedOpportunity_Exclusions].[ClientID] <> [qryMissedOpportunity_Exclusions].[BillingCode])
The initial query works as intended and exclusions successfully lists all the hits, but I get the full listing when I query with the above which is obviously wrong. Any tips would be appreciated.
EDIT - Two originating queries
qryMissedOpportunity_ALL_Clients (1)
SELECT MissedOpportunities.MOID, PriceList.BillingCode, Client.ClientID, Client.ClientName, PriceList.WorkDescription, PriceList.UnitOfWork, MissedOpportunities.Qty, PriceList.CostPerUnit AS Our_PriceList_Cost, ([MissedOpportunities].[Qty]*[PriceList].[CostPerUnit]) AS At_Cost, MissedOpportunities.fBegin
FROM PriceList INNER JOIN (Client INNER JOIN MissedOpportunities ON Client.ClientID = MissedOpportunities.ClientID) ON PriceList.BillingCode = MissedOpportunities.BillingCode
WHERE (((MissedOpportunities.fBegin)=#10/1/2009#));
qryMissedOpportunity_Exclusions
SELECT qryMissedOpportunity_ALL_Clients.*, MissedOpportunity_Exclusions.Exclusion, MissedOpportunity_Exclusions.Comments
FROM qryMissedOpportunity_ALL_Clients INNER JOIN MissedOpportunity_Exclusions ON (qryMissedOpportunity_ALL_Clients.BillingCode = MissedOpportunity_Exclusions.BillingCode) AND (qryMissedOpportunity_ALL_Clients.ClientID = MissedOpportunity_Exclusions.ClientID)
WHERE (((MissedOpportunity_Exclusions.Exclusion)=True));
One group needs to see everything; the other needs to see only the things they haven't already deemed a "valid" missed opportunity, as in: we've seen it, verified why it's there, and don't need to bother critiquing it every single month.

Generally you can exclude a table by doing a left join and comparing against null:
SELECT t1.* FROM t1 LEFT JOIN t2 on t1.id = t2.id where t2.id is null;
Should be pretty easy to adapt this to your situation.
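A minimal sketch of that pattern applied to a two-column key, run on Python's sqlite3 module rather than Access. The tables all_opps and exclusions and their rows are hypothetical stand-ins for the two queries in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the ALL_Clients and Exclusions queries.
    CREATE TABLE all_opps (ClientID INTEGER, BillingCode TEXT);
    CREATE TABLE exclusions (ClientID INTEGER, BillingCode TEXT);
    INSERT INTO all_opps VALUES (1, 'A'), (1, 'B'), (2, 'A');
    INSERT INTO exclusions VALUES (1, 'B');
""")

# LEFT JOIN on equality, then keep only the rows that found no partner.
remaining = conn.execute("""
    SELECT a.ClientID, a.BillingCode
    FROM all_opps AS a
    LEFT JOIN exclusions AS e
      ON a.ClientID = e.ClientID
     AND a.BillingCode = e.BillingCode
    WHERE e.ClientID IS NULL
    ORDER BY a.ClientID, a.BillingCode
""").fetchall()

print(remaining)
```

Only the (1, 'B') row has a match in exclusions, so it is the only one filtered out.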

Looking at your query rewritten to use table aliases so I can read it...
SELECT DISTINCT c.*
FROM qryMissedOpportunity_ALL_Clients c
JOIN qryMissedOpportunity_Exclusions e
ON c.ClientID <> e.ClientID
AND e.ClientID <> e.BillingCode
This query will produce a cartesian product of sorts: each and every row in qryMissedOpportunity_ALL_Clients will match and join with every row in qryMissedOpportunity_Exclusions where the ClientIDs do not match. Is this what you want? Generally, join conditions are based on a column in one table being equal to the value of a column in the other table. Joining where they are not equal is unusual.
Second, the second inequality in the join conditions is between columns in the same table (qryMissedOpportunity_Exclusions). Are you sure this is what you want? If it is, it is not a join condition, it is a WHERE clause condition.
Third, your question mentions two queries, but there is only the one query (above) in your question. Where is the second one?
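To see why joining on <> hands back the full listing, here is a small sketch on Python's sqlite3 module (not Access). The tables c and e and their rows are hypothetical, and only the first inequality from the original join is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the ALL_Clients (c) and Exclusions (e) queries.
    CREATE TABLE c (ClientID INTEGER, BillingCode TEXT);
    CREATE TABLE e (ClientID INTEGER, BillingCode TEXT);
    INSERT INTO c VALUES (1, 'A'), (1, 'B'), (2, 'A');
    INSERT INTO e VALUES (1, 'B'), (2, 'Z');
""")

# A row of c survives if ANY exclusion row has a different ClientID,
# which is true for every row as soon as e holds two different clients.
result = conn.execute("""
    SELECT DISTINCT c.ClientID, c.BillingCode
    FROM c
    JOIN e ON c.ClientID <> e.ClientID
    ORDER BY c.ClientID, c.BillingCode
""").fetchall()

print(result)
```

Even the (1, 'B') row, which is explicitly listed in e, is returned, because it pairs up with the exclusion row for client 2.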
