Evaluate Multiple conditions for same row - sql

I have to compare two different sources and identify all the mismatches for every ID.
Source_excel table
+-----+-------------+------+----------+
| id | name | City | flag |
+-----+-------------+------+----------+
| 101 | Plate | NY | Ready |
| 102 | Back washer | NY | Sold |
| 103 | Ring | MC | Planning |
| 104 | Glass | NMC | Ready |
| 107 | Cover | PR | Ready |
+-----+-------------+------+----------+
Source_dw table
+-----+----------+------+----------+
| id | name | City | flag |
+-----+----------+------+----------+
| 101 | Plate | NY | Planning |
| 102 | Nut | TN | Expired |
| 103 | Ring | MC | Planning |
| 104 | Top Wire | NY | Ready |
| 105 | Bolt | MC | Expired |
+-----+----------+------+----------+
Expected result
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| ID | excel_name | dw_name | excel_flag | dw_flag | excel_city | dw_city | RESULT |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| 101 | Plate | Plate | Ready | Planning | NY | NY | FLAG_MISMATCH |
| 102 | Back washer | Nut | Sold | Expired | NY | TN | NAME_MISMATCH |
| 102 | Back washer | Nut | Sold | Expired | NY | TN | FLAG_MISMATCH |
| 102 | Back washer | Nut | Sold | Expired | NY | TN | CITY_MISMATCH |
| 103 | Ring | Ring | Planning | Planning | MC | MC | ALL_MATCH |
| 104 | Glass | Top Wire | Ready | Ready | NMC | NY | NAME_MISMATCH |
| 104 | Glass | Top Wire | Ready | Ready | NMC | NY | CITY_MISMATCH |
| 107 | Cover | | Ready | | PR | | MISSING IN DW |
| 105 | | Bolt | | Expired | | MC | MISSING IN EXCEL |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
I tried the query below, but it reports only one mismatch per ID.
select ISNULL(EXCEL.ID,DW.ID) ID,
excel.name as excel_name,dw.name as dw_name,
excel.flag as excel_flag,dw.flag as dw_flag,
excel.city as excel_city,dw.city as dw_city,
RESULT = CASE WHEN excel.ID IS NULL THEN 'MISSING IN EXCEL'
WHEN dw.ID IS NULL THEN 'MISSING IN DW'
WHEN excel.NAME<>dw.NAME THEN 'NAME_MISMATCH'
WHEN excel.CITY<>dw.CITY THEN 'CITY_MISMATCH'
WHEN excel.FLAG <> dw.FLAG THEN 'FLAG_MISMATCH'
ELSE 'ALL_MATCH' END
from source_excel excel
FULL OUTER JOIN source_dw dw ON excel.id=dw.id
Actual output
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| ID | excel_name | dw_name | excel_flag | dw_flag | excel_city | dw_city | RESULT |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| 101 | Plate | Plate | Ready | Planning | NY | NY | FLAG_MISMATCH |
| 102 | Back washer | Nut | Sold | Expired | NY | TN | NAME_MISMATCH |
| 103 | Ring | Ring | Planning | Planning | MC | MC | ALL_MATCH |
| 104 | Glass | Top Wire | Ready | Ready | NMC | NY | NAME_MISMATCH |
| 107 | Cover | | Ready | | PR | | MISSING IN DW |
| 105 | | Bolt | | Expired | | MC | MISSING IN EXCEL |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
I understand that a CASE expression returns only the first condition that is satisfied. Is there another way to check all the conditions?

If I follow you correctly, you want one row per mismatch, or one row indicating that everything matches.
You can use cross apply to generate the rows, like so:
SELECT
COALESCE(xl.ID, dw.ID) ID,
xl.name as excel_name,dw.name as dw_name,
xl.flag as excel_flag,dw.flag as dw_flag,
xl.city as excel_city,dw.city as dw_city,
x.result
FROM source_excel xl
FULL OUTER JOIN source_dw dw ON xl.id = dw.id
CROSS APPLY (VALUES
(CASE WHEN xl.ID IS NULL THEN 'MISSING IN EXCEL' END),
(CASE WHEN dw.ID IS NULL THEN 'MISSING IN DW' END),
(CASE WHEN xl.NAME <> dw.NAME THEN 'NAME_MISMATCH' END),
(CASE WHEN xl.CITY <> dw.CITY THEN 'CITY_MISMATCH' END),
(CASE WHEN xl.FLAG <> dw.FLAG THEN 'FLAG_MISMATCH' END),
(CASE WHEN
xl.ID = dw.ID
AND xl.NAME = dw.NAME
AND xl.CITY = dw.CITY
AND xl.FLAG = dw.FLAG
THEN 'ALL_MATCH' END)
) x(result)
WHERE x.result IS NOT NULL
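To check the one-row-per-mismatch idea against the sample data without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no CROSS APPLY, so this swaps in a CROSS JOIN against a small list of check names and filters with a CASE; it also emulates the FULL OUTER JOIN with LEFT JOIN plus UNION ALL, since older SQLite versions lack it. The CTE names joined and checks are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_excel (id INTEGER, name TEXT, city TEXT, flag TEXT);
CREATE TABLE source_dw    (id INTEGER, name TEXT, city TEXT, flag TEXT);
INSERT INTO source_excel VALUES
  (101, 'Plate', 'NY', 'Ready'), (102, 'Back washer', 'NY', 'Sold'),
  (103, 'Ring', 'MC', 'Planning'), (104, 'Glass', 'NMC', 'Ready'),
  (107, 'Cover', 'PR', 'Ready');
INSERT INTO source_dw VALUES
  (101, 'Plate', 'NY', 'Planning'), (102, 'Nut', 'TN', 'Expired'),
  (103, 'Ring', 'MC', 'Planning'), (104, 'Top Wire', 'NY', 'Ready'),
  (105, 'Bolt', 'MC', 'Expired');
""")

ONE_ROW_PER_MISMATCH = """
WITH joined AS (
    -- Emulated FULL OUTER JOIN: all Excel rows, plus DW rows with no Excel match.
    SELECT xl.id AS xl_id, dw.id AS dw_id,
           xl.name AS xl_name, dw.name AS dw_name,
           xl.city AS xl_city, dw.city AS dw_city,
           xl.flag AS xl_flag, dw.flag AS dw_flag
    FROM source_excel xl
    LEFT JOIN source_dw dw ON xl.id = dw.id
    UNION ALL
    SELECT NULL, dw.id, NULL, dw.name, NULL, dw.city, NULL, dw.flag
    FROM source_dw dw
    WHERE NOT EXISTS (SELECT 1 FROM source_excel xl WHERE xl.id = dw.id)
),
checks(result) AS (
    -- One candidate row per check; stands in for the VALUES list of CASEs.
    VALUES ('MISSING IN EXCEL'), ('MISSING IN DW'), ('NAME_MISMATCH'),
           ('CITY_MISMATCH'), ('FLAG_MISMATCH'), ('ALL_MATCH')
)
SELECT COALESCE(j.xl_id, j.dw_id) AS id, c.result
FROM joined j
CROSS JOIN checks c
-- Keep a candidate row only when its condition is true (NULL counts as false).
WHERE CASE c.result
        WHEN 'MISSING IN EXCEL' THEN j.xl_id IS NULL
        WHEN 'MISSING IN DW'    THEN j.dw_id IS NULL
        WHEN 'NAME_MISMATCH'    THEN j.xl_name <> j.dw_name
        WHEN 'CITY_MISMATCH'    THEN j.xl_city <> j.dw_city
        WHEN 'FLAG_MISMATCH'    THEN j.xl_flag <> j.dw_flag
        WHEN 'ALL_MATCH'        THEN j.xl_name = j.dw_name
                                 AND j.xl_city = j.dw_city
                                 AND j.xl_flag = j.dw_flag
      END
ORDER BY id, c.result
"""

rows = conn.execute(ONE_ROW_PER_MISMATCH).fetchall()
for row in rows:
    print(row)
```

ID 102 comes back three times and ID 104 twice, which is the shape the question asks for. Rows that exist on only one side never produce a *_MISMATCH row, because comparing against NULL yields NULL rather than true.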

I would keep a single row per ID and concatenate the mismatch reasons together:
select COALESCE(EXCEL.ID, DW.ID) as ID,
excel.name as excel_name,dw.name as dw_name,
excel.flag as excel_flag,dw.flag as dw_flag,
excel.city as excel_city,dw.city as dw_city,
(CASE WHEN excel.ID IS NULL
THEN 'MISSING IN EXCEL'
WHEN dw.ID IS NULL
THEN 'MISSING IN DW'
WHEN excel.NAME = dw.NAME AND excel.CITY = dw.CITY AND excel.FLAG = dw.FLAG
THEN 'ALL_MATCH'
ELSE CONCAT(CASE WHEN excel.NAME <> dw.NAME THEN 'NAME_MISMATCH; ' END,
CASE WHEN excel.CITY <> dw.CITY THEN 'CITY_MISMATCH; ' END,
CASE WHEN excel.FLAG <> dw.FLAG THEN 'FLAG_MISMATCH;' END
)
END) as RESULT
from source_excel excel FULL OUTER JOIN
source_dw dw
ON excel.id = dw.id;
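The concatenated form relies on SQL Server's CONCAT treating a NULL argument as an empty string, so a CASE with no ELSE simply contributes nothing. As a rough sketch of the same idea in Python's sqlite3 (an assumption made only so the example can run anywhere), the || operator needs an explicit COALESCE because NULL || 'x' is NULL there. Only IDs present in both tables are covered; the missing-row branches are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_excel (id INTEGER, name TEXT, city TEXT, flag TEXT);
CREATE TABLE source_dw    (id INTEGER, name TEXT, city TEXT, flag TEXT);
INSERT INTO source_excel VALUES
  (101, 'Plate', 'NY', 'Ready'), (102, 'Back washer', 'NY', 'Sold'),
  (103, 'Ring', 'MC', 'Planning'), (104, 'Glass', 'NMC', 'Ready');
INSERT INTO source_dw VALUES
  (101, 'Plate', 'NY', 'Planning'), (102, 'Nut', 'TN', 'Expired'),
  (103, 'Ring', 'MC', 'Planning'), (104, 'Top Wire', 'NY', 'Ready');
""")

CONCATENATED = """
SELECT xl.id,
       CASE
         WHEN xl.name = dw.name AND xl.city = dw.city AND xl.flag = dw.flag
           THEN 'ALL_MATCH'
         -- Each piece is '' when its check passes; RTRIM drops the trailing '; '.
         ELSE RTRIM(
                COALESCE(CASE WHEN xl.name <> dw.name THEN 'NAME_MISMATCH; ' END, '')
             || COALESCE(CASE WHEN xl.city <> dw.city THEN 'CITY_MISMATCH; ' END, '')
             || COALESCE(CASE WHEN xl.flag <> dw.flag THEN 'FLAG_MISMATCH; ' END, ''),
                '; ')
       END AS result
FROM source_excel xl
JOIN source_dw dw ON xl.id = dw.id
ORDER BY xl.id
"""

rows = conn.execute(CONCATENATED).fetchall()
for row in rows:
    print(row)
```

This gives one row per ID, with 102 listing all three reasons in a single string.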

Related

Compare data between 2 different source

I have two datasets coming from two sources, and I have to compare them and find the mismatches. One comes from Excel and the other from the data warehouse.
From excel Source_Excel
+-----+-------+------------+----------+
| id | name | City_Scope | flag |
+-----+-------+------------+----------+
| 101 | Plate | NY|TN | Ready |
| 102 | Nut | NY|TN | Sold |
| 103 | Ring | TN|MC | Planning |
| 104 | Glass | NY|TN|MC | Ready |
| 105 | Bolt | MC | Expired |
+-----+-------+------------+----------+
From DW Source_DW
+-----+-------+------+----------+
| id | name | City | flag |
+-----+-------+------+----------+
| 101 | Plate | NY | Ready |
| 101 | Plate | TN | Ready |
| 102 | Nut | TN | Expired |
| 103 | Ring | MC | Planning |
| 104 | Glass | MC | Ready |
| 104 | Glass | NY | Ready |
| 105 | Bolt | MC | Expired |
+-----+-------+------+----------+
Unfortunately, the data from Excel comes with a separator in one column, so I have to use the DelimitedSplit8K function to split it into individual rows. After splitting the Excel source data, I got the output below.
+-----+-------+------+----------+
| id | name | item | flag |
+-----+-------+------+----------+
| 101 | Plate | NY | Ready |
| 101 | Plate | TN | Ready |
| 102 | Nut | NY | Sold |
| 102 | Nut | TN | Sold |
| 103 | Ring | TN | Planning |
| 103 | Ring | MC | Planning |
| 104 | Glass | NY | Ready |
| 104 | Glass | TN | Ready |
| 104 | Glass | MC | Ready |
| 105 | Bolt | MC | Expired |
+-----+-------+------+----------+
Now my expected output is something like this.
+-----+----------+---------------+--------------+
| ID | Result | Flag_mismatch | City_Missing |
+-----+----------+---------------+--------------+
| 101 | No_Error | | |
| 102 | Error | Yes | Yes |
| 103 | Error | No | Yes |
| 104 | Error | Yes | No |
| 105 | No_Error | | |
+-----+----------+---------------+--------------+
Logic:
I have to find out whether there are any mismatches in the flag values.
After splitting, if any city is missing, that should be reported.
Assume that there won't be any name or city mismatches.
As an initial step, I'm trying to get the mismatched rows, and I tried the query below. It does not give me any output. Please suggest where I am going wrong. Check Fiddle Here
select a.id,a.name,split.item,a.flag from source_excel a
CROSS APPLY dbo.DelimitedSplit8k(a.city_scope,'|') split
where not exists (
select a.id,split.item
from source_excel a
join source_dw b
on a.id=b.id and a.name=b.name and a.flag=b.flag and split.item=b.city
)
Update
I got close to the answer with the help of temporary tables (Updated Fiddle), but I am not sure how to do it without temp tables.

SQL count duplicates in another column based on one field per row

I am building out a customer retention report. We identify customers by their email. Here is some sample data from our table:
+----------------------------+------------------+-------------------+---------------------+------------+-------------+--------------+---------------+------------------+---------------+----------------+--------------+------------------+--+--+--+--+--+
| Email | BrandNewCustomer | RecurringCustomer | ReactivatedCustomer | OrderCount | TotalOrders | Date_Created | Customer_Name | Customer_Address | Customer_City | Customer_State | Customer_Zip | Customer_Country | | | | | |
+----------------------------+------------------+-------------------+---------------------+------------+-------------+--------------+---------------+------------------+---------------+----------------+--------------+------------------+--+--+--+--+--+
| zyw#marketplace.amazon.com | 1 | 0 | 0 | 1 | 1 | 41:50.0 | Sha | 990 | BRO | NY | 112 | US | | | | | |
| zyu#gmail.com | 1 | 0 | 0 | 1 | 1 | 57:25.0 | Zyu | 181 | Mia | FL | 330 | US | | | | | |
| ZyR#aol.com | 1 | 0 | 0 | 1 | 1 | 10:19.0 | Day | 581 | Myr | SC | 295 | US | | | | | |
| zyr#gmail.com | 1 | 0 | 0 | 1 | 1 | 25:19.0 | Nic | 173 | Was | DC | 200 | US | | | | | |
| zy#gmail.com | 1 | 0 | 0 | 1 | 1 | 19:18.0 | Kim | 675 | MIA | FL | 331 | US | | | | | |
| zyou#gmail.com | 1 | 0 | 0 | 1 | 1 | 40:29.0 | zoe | 160 | Mob | AL | 366 | US | | | | | |
| zyon#yahoo.com | 1 | 0 | 0 | 1 | 1 | 17:21.0 | Zyo | 84 | Sta | CT | 690 | US | | | | | |
| zyo#gmail.com | 1 | 0 | 0 | 2 | 2 | 02:03.0 | Zyo | 432 | Ell | GA | 302 | US | | | | | |
| zyo#gmail.com | 1 | 0 | 0 | 1 | 2 | 12:54.0 | Zyo | 432 | Ell | GA | 302 | US | | | | | |
| zyn#icloud.com | 1 | 0 | 0 | 1 | 1 | 54:56.0 | Zyn | 916 | Nor | CA | 913 | US | | | | | |
| zyl#gmail.com | 0 | 1 | 0 | 3 | 3 | 31:27.0 | Ser | 123 | Mia | FL | 331 | US | | | | | |
| zyk#marketplace.amazon.com | 1 | 0 | 0 | 1 | 1 | 44:00.0 | Myr | 101 | MIA | FL | 331 | US | | | | | |
+----------------------------+------------------+-------------------+---------------------+------------+-------------+--------------+---------------+------------------+---------------+----------------+--------------+------------------+--+--+--+--+--+
We define our customer by email. So all orders with the same email are marked to be under one customer and then we do calculations on top of that.
Now I am trying to find out about customers whose emails have changed. So to do this we will try to line up customers by their address.
So for each row (that is, per email), I want another column called something like Orders_With_Same_Address_Different_Email. How would I do that?
I have tried doing something with Dense Rank but it doesn't seem to work:
SELECT DISTINCT
Email
,BrandNewCustomer
,RecurringCustomer
,ReactivatedCustomer
,OrderCount
,TotalOrders
,Date_Created
,Customer_Name
,Customer_Address
,Customer_City
,Customer_State
,Customer_Zip
,Customer_Country
,(DENSE_RANK() over (partition by Email order by (case when email <> email then Customer_Address end) asc)
+DENSE_RANK() over ( partition by Email order by (case when email <> email then Customer_Address end) desc)
- 1) as Orders_With_Same_Name_Different_Email
--*
FROM Customers
Try counting the email partitioned by address, not by email:
select Email,
-- ...
Orders_With_Same_Name_Different_Email = iif(
count(email) over (partition by Customer_Address) > 1,
1, 0)
from Customers;
But this is a lesson in why you wouldn't use an email as an identifier for a client. Address is a bad idea as well. Use something that won't change. That usually means making an internal identifier, such as something that auto-increments:
alter table #customers
add customerId int identity(1,1) primary key not null
Now customerId = 1 will always refer to that particular customer.
You can group by Customer_Address and check the count. This assumes that each customer has one address.
Select * from Customers where
Customer_Address IN (
Select Customer_Address
From Customers group by Customer_Address
having count(distinct Email)
>1)
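A small sketch of the GROUP BY / HAVING approach, run through Python's sqlite3 so it is easy to try. The four rows and the example.com emails are invented for the illustration; only the Email and Customer_Address columns from the question are kept. It also runs a plain COUNT for comparison, to show why the DISTINCT matters when one email places several orders from the same address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Email TEXT, Customer_Address TEXT);
INSERT INTO Customers VALUES
  ('old@example.com',  '12 Oak St'),  -- same address, two different emails
  ('new@example.com',  '12 Oak St'),
  ('solo@example.com', '9 Elm Rd'),   -- same address, same email twice
  ('solo@example.com', '9 Elm Rd');
""")

# Addresses used by more than one distinct email.
DISTINCT_EMAILS = """
SELECT Email, Customer_Address
FROM Customers
WHERE Customer_Address IN (
    SELECT Customer_Address
    FROM Customers
    GROUP BY Customer_Address
    HAVING COUNT(DISTINCT Email) > 1
)
ORDER BY Email
"""

# Same query with a plain COUNT: repeat orders from one email also qualify.
PLAIN_COUNT = DISTINCT_EMAILS.replace("COUNT(DISTINCT Email)", "COUNT(Email)")

changed = conn.execute(DISTINCT_EMAILS).fetchall()
repeated = conn.execute(PLAIN_COUNT).fetchall()
print(changed)
print(len(repeated))
```

With DISTINCT, only the two '12 Oak St' rows come back. The plain count returns all four rows, because the repeat customer at '9 Elm Rd' is counted as well.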
If I understand what you want to do, this is how I would solve it:
Note, you don't need the having clause in the CTE but depending on your data it could make it faster. (That is, if you have a large dataset.)
WITH email2addr AS
(
select email, count(distinct customer_address) as addr_cnt
from customers
group by email
having count(distinct customer_address) > 1
)
SELECT
Email
,BrandNewCustomer
,RecurringCustomer
,ReactivatedCustomer
,OrderCount
,TotalOrders
,Date_Created
,Customer_Name
,Customer_Address
,Customer_City
,Customer_State
,Customer_Zip
,Customer_Country
,CASE when coalesce(email2addr.addr_cnt,1) > 1 then 'Y' ELSE 'N' END as has_more_than_1_email
from customers
left join email2addr on customers.email = email2addr.email

Moving data to correct record

I have a table where the data needs to be corrected. Below is an example of one record. Basically, the value in the Selling row's close_units needs to move to the Agent to Agent Ref row's close_units. I have tried every approach I can think of, but I can't figure it out. I am sure it is fairly simple; I think I am just looking at it the wrong way. Any help is greatly appreciated!
Current (bad) data:
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
| sale_no | payeeID | ComType | close_units | record_type | ref_agent_type | referring_agentID | ref_side |
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
| 7586 | 1001 | Listing | 1 | Listing | NULL | 0 | |
| 7586 | 2001 | Selling | 1 | Selling | NULL | 0 | |
| 7586 | 3254 | NULL | 0 | Off The Top Ref | NULL | 0 | L |
| 7586 | 4684 | Agent to Agent Ref | 0 | Agent Paid Ref | Selling | 2001 | |
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
Expected result:
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
| sale_no | payeeID | ComType | close_units | record_type | ref_agent_type | referring_agentID | ref_side |
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
| 7586 | 1001 | Listing | 1 | Listing | NULL | 0 | |
| 7586 | 2001 | Selling | 0 | Selling | NULL | 0 | |
| 7586 | 3254 | NULL | 0 | Off The Top Ref | NULL | 0 | L |
| 7586 | 4684 | Agent to Agent Ref | 1 | Agent Paid Ref | Selling | 2001 | |
+---------+---------+--------------------+-------------+-----------------+----------------+-------------------+----------+
The following query will copy the value to the "Agent to Agent Ref" row:
update my_table t1 set close_units = (
select close_units from my_table t2
where t2.sale_no = t1.sale_no and t2.ComType = 'Selling'
)
where ComType = 'Agent to Agent Ref';
And this one will reset the "Selling" value to zero:
update my_table t1
set close_units = 0
where ComType = 'Selling'
and exists (
select close_units from my_table t2
where t2.sale_no = t1.sale_no and t2.ComType = 'Agent to Agent Ref'
)
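The order of the two statements matters: the copy has to run before the reset, otherwise the "Selling" value is already zero when it is read. A minimal sketch with Python's sqlite3 on the sample record, trimmed to the four columns the updates touch, and wrapped in one transaction so the table is never left half corrected. The correlated subqueries refer to my_table directly instead of using the t1 alias; that is a choice made for this sketch, not a change to the logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (sale_no INTEGER, payeeID INTEGER,
                       ComType TEXT, close_units INTEGER);
INSERT INTO my_table VALUES
  (7586, 1001, 'Listing', 1),
  (7586, 2001, 'Selling', 1),
  (7586, 3254, NULL, 0),
  (7586, 4684, 'Agent to Agent Ref', 0);
""")

with conn:  # commits both updates together, or rolls both back on error
    # Step 1: copy the Selling value onto the Agent to Agent Ref row.
    conn.execute("""
        UPDATE my_table
        SET close_units = (
            SELECT t2.close_units FROM my_table t2
            WHERE t2.sale_no = my_table.sale_no AND t2.ComType = 'Selling'
        )
        WHERE ComType = 'Agent to Agent Ref'
    """)
    # Step 2: zero the Selling row, but only for sales that have a referral row.
    conn.execute("""
        UPDATE my_table
        SET close_units = 0
        WHERE ComType = 'Selling'
          AND EXISTS (
            SELECT 1 FROM my_table t2
            WHERE t2.sale_no = my_table.sale_no
              AND t2.ComType = 'Agent to Agent Ref'
          )
    """)

result = conn.execute(
    "SELECT payeeID, close_units FROM my_table ORDER BY payeeID"
).fetchall()
print(result)
```

After both statements, payee 2001 holds 0 and payee 4684 holds 1, matching the expected result table.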

Sum data from two tables with different number of rows

There are 3 Tables (SorMaster, SorDetail, and InvWarehouse):
SorMaster:
+------------+
| SalesOrder |
+------------+
| 100 |
| 101 |
| 102 |
+------------+
SorDetail:
+------------+------------+---------------+
| SalesOrder | MStockCode | MBackOrderQty |
+------------+------------+---------------+
| 100 | PN-1 | 4 |
| 100 | PN-2 | 9 |
| 100 | PN-3 | 1 |
| 100 | PN-4 | 6 |
| 101 | PN-1 | 6 |
| 101 | PN-3 | 2 |
| 102 | PN-2 | 19 |
| 102 | PN-3 | 14 |
| 102 | PN-4 | 6 |
| 102 | PN-5 | 4 |
+------------+------------+---------------+
InvWarehouse:
+------------+-----------+-----------+
| MStockCode | Warehouse | QtyOnHand |
+------------+-----------+-----------+
| PN-1 | A | 1 |
| PN-2 | B | 9 |
| PN-3 | A | 0 |
| PN-4 | B | 1 |
| PN-1 | A | 0 |
| PN-3 | B | 5 |
| PN-2 | A | 9 |
| PN-3 | B | 4 |
| PN-4 | A | 6 |
| PN-5 | B | 0 |
+------------+-----------+-----------+
Desired Results:
+------------+-----------------+--------------+
| MStockCode | SumBackOrderQty | SumQtyOnHand |
+------------+-----------------+--------------+
| PN-1 | 10 | 10 |
| PN-2 | 28 | 1 |
| PN-3 | 17 | 5 |
| PN-4 | 12 | 13 |
| PN-5 | 11 | 6 |
+------------+-----------------+--------------+
I have been going around in circles with no end in sight. It seems like it should be simple, but I just can't wrap my head around it. The SumBackOrderQty is obviously getting counted twice as the SumQtyOnHand is evaluated. Up to this point I have been doing the calculations in PHP instead of in the select statement, but I would like to clean things up a bit where possible.
Current query statement is:
SELECT SorDetail.MStockCode,
SUM(SorDetail.MBackOrderQty) AS 'SumMBackOrderQty',
SUM(InvWarehouse.QtyOnHand) AS 'SumQtyOnHand'
FROM SysproCompanyJ.dbo.SorMaster SorMaster,
SysproCompanyJ.dbo.SorDetail SorDetail LEFT OUTER JOIN SysproCompanyJ.dbo.InvWarehouse InvWarehouse
ON SorDetail.MStockCode = InvWarehouse.StockCode
WHERE SorMaster.SalesOrder = SorDetail.SalesOrder
AND SorMaster.ActiveFlag != 'N'
AND SorDetail.MBackOrderQty > '0'
AND SorDetail.MPrice > '0'
GROUP BY SorDetail.MStockCode
ORDER BY SorDetail.MStockCode ASC
Without the complete picture (your RDBMS, the database schema, a description of the problem you're trying to solve, and sample data that matches them), the following is only an illustration of what a solution based on Barmar's comment could look like:
SELECT SD.MStockCode,
SD.SumBackOrderQty,
IW.SumQtyOnHand
FROM (SELECT MStockCode,
SUM(MBackOrderQty) AS `SumBackOrderQty`
FROM SorDetail
JOIN SorMaster ON SorDetail.SalesOrder=SorMaster.SalesOrder
WHERE SorMaster.ActiveFlag != 'N'
AND SorDetail.MBackOrderQty > 0
AND SorDetail.MPrice > 0
GROUP BY MStockCode) AS SD
LEFT JOIN (SELECT MStockCode,
SUM(QtyOnHand) AS `SumQtyOnHand`
FROM InvWarehouse
GROUP BY MStockCode) AS IW ON SD.MStockCode=IW.MStockCode
ORDER BY SD.MStockCode;
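To see why aggregating each table before joining avoids the double counting, here is a sketch that loads the sample SorDetail and InvWarehouse rows into Python's sqlite3 and runs both a naive join and the pre-aggregated version. The ActiveFlag and MPrice filters are omitted because those columns are not in the sample data, so SorMaster is not needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SorDetail (SalesOrder INTEGER, MStockCode TEXT, MBackOrderQty INTEGER);
CREATE TABLE InvWarehouse (MStockCode TEXT, Warehouse TEXT, QtyOnHand INTEGER);
INSERT INTO SorDetail VALUES
  (100,'PN-1',4), (100,'PN-2',9), (100,'PN-3',1), (100,'PN-4',6),
  (101,'PN-1',6), (101,'PN-3',2),
  (102,'PN-2',19), (102,'PN-3',14), (102,'PN-4',6), (102,'PN-5',4);
INSERT INTO InvWarehouse VALUES
  ('PN-1','A',1), ('PN-2','B',9), ('PN-3','A',0), ('PN-4','B',1),
  ('PN-1','A',0), ('PN-3','B',5), ('PN-2','A',9), ('PN-3','B',4),
  ('PN-4','A',6), ('PN-5','B',0);
""")

# Naive: every detail row pairs with every warehouse row of the same stock
# code, so both sums are inflated by the row count on the other side.
NAIVE = """
SELECT d.MStockCode, SUM(d.MBackOrderQty), SUM(w.QtyOnHand)
FROM SorDetail d
LEFT JOIN InvWarehouse w ON d.MStockCode = w.MStockCode
GROUP BY d.MStockCode
ORDER BY d.MStockCode
"""

# Pre-aggregated: collapse each table to one row per stock code, then join.
PRE_AGGREGATED = """
SELECT sd.MStockCode, sd.SumBackOrderQty, iw.SumQtyOnHand
FROM (SELECT MStockCode, SUM(MBackOrderQty) AS SumBackOrderQty
      FROM SorDetail GROUP BY MStockCode) AS sd
LEFT JOIN (SELECT MStockCode, SUM(QtyOnHand) AS SumQtyOnHand
           FROM InvWarehouse GROUP BY MStockCode) AS iw
       ON sd.MStockCode = iw.MStockCode
ORDER BY sd.MStockCode
"""

naive = conn.execute(NAIVE).fetchall()
fixed = conn.execute(PRE_AGGREGATED).fetchall()
print(naive[0])
print(fixed)
```

For PN-1 the naive join reports a back-order sum of 20 instead of 10, because each of the two detail rows is paired with both warehouse rows. Note that the sums computed from the sample tables differ from the "Desired Results" table in the question.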
Here's one approach:
select MStockCode,
(select sum(MBackOrderQty) from sorDetail as T2
where T2.MStockCode = T1.MStockCode ) as SumBackOrderQty,
(select sum(QtyOnHand) from invWarehouse as T3
where T3.MStockCode = T1.MStockCode ) as SumQtyOnHand
from
(
select mstockcode from sorDetail
union
select mstockcode from invWarehouse
) as T1
In a fiddle here: http://sqlfiddle.com/#!9/fdaca/6
Though my SumQtyOnHand values don't match yours (as @Gordon pointed out).

Union of two tables in a 3rd table with additional flag column using SQL

I have 2 tables:
One is Promotion
| PromoId |Promo Decription|
----------------------
| 101 | abc|
| 102 | pqr|
| 103 | alp|
| 104 | adc|
| 201 | abc|
and the other is PromotionType
| PromoId | PromoType |
----------------------
| 101 | 1 |
| 121 | 2 |
| 188 | 3 |
| 104 | 4 |
| 191 | 4 |
| 102 | 4 |
I want a resultant table
| PromoId | Flag |Promo Decription |PromoType |
----------------------
| 101 | 1 | | 1 |
| 121 | 0 | | 2 |
| 188 | 0 | | 3 |
| 104 | 1 | adc | 4 |
| 191 | 0 | | 4 |
| 102 | 1 | pqr | 4 |
| 103 | 1 | alp | |
| 201 | 0 | abc | |
That is, I want a resultant table that is the union of the two tables. It should not contain duplicate values, and the flag should be set to true for every PromoId that is common to both tables.
I am using SQL Server as our database.
You can use a FULL OUTER JOIN to perform this:
select
coalesce(p.promoid, t.promoid) promoid,
case when p.promoid = t.promoid then 1 else 0 end flag
from promotion p
full outer join promotiontype t
on p.promoid = t.promoid
order by promoid
See SQL Fiddle with Demo
Result:
| PROMOID | FLAG |
------------------
| 101 | 1 |
| 102 | 1 |
| 103 | 0 |
| 104 | 1 |
| 121 | 0 |
| 188 | 0 |
| 191 | 0 |
| 201 | 0 |
Edit: even with your changes to the sample data, the same approach still produces the result:
select
coalesce(p.promoid, t.promoid) promoid,
case when p.promoid = t.promoid then 1 else 0 end flag,
isnull(p.[Promo Decription], '') [Promo Decription],
isnull(t.PromoType, null) PromoType
from promotion p
full outer join promotiontype t
on p.promoid = t.promoid
order by
case when PromoType is not null then 0 else 1 end, promotype, promoid
See SQL Fiddle with Demo
Result is:
| PROMOID | FLAG | PROMO DECRIPTION | PROMOTYPE |
-------------------------------------------------
| 101 | 1 | abc | 1 |
| 121 | 0 | | 2 |
| 188 | 0 | | 3 |
| 102 | 1 | pqr | 4 |
| 104 | 1 | adc | 4 |
| 191 | 0 | | 4 |
| 103 | 0 | alp | (null) |
| 201 | 0 | abc | (null) |
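For a quick check of the flag logic outside SQL Server, here is a sketch in Python's sqlite3. Because FULL OUTER JOIN is missing from older SQLite versions, it swaps in a different but equivalent shape: take the UNION of the ids from both tables, then LEFT JOIN each table back and set the flag when both sides are present. Only the PromoId column is loaded; the derived table name ids is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Promotion (PromoId INTEGER);
CREATE TABLE PromotionType (PromoId INTEGER);
INSERT INTO Promotion VALUES (101), (102), (103), (104), (201);
INSERT INTO PromotionType VALUES (101), (121), (188), (104), (191), (102);
""")

FLAGGED_UNION = """
SELECT ids.PromoId,
       -- 1 only when the id was found in both tables
       CASE WHEN p.PromoId IS NOT NULL AND t.PromoId IS NOT NULL
            THEN 1 ELSE 0 END AS Flag
FROM (SELECT PromoId FROM Promotion
      UNION                      -- UNION removes duplicate ids
      SELECT PromoId FROM PromotionType) AS ids
LEFT JOIN Promotion p     ON p.PromoId = ids.PromoId
LEFT JOIN PromotionType t ON t.PromoId = ids.PromoId
ORDER BY ids.PromoId
"""

flags = conn.execute(FLAGGED_UNION).fetchall()
for row in flags:
    print(row)
```

The output agrees with the first result table above: 101, 102 and 104 are flagged, and every other id gets 0.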
You can use the following script:
select a.PromoID,
coalesce((case when b.promoID=a.promoID then '1'
when b.promoID<>a.promoID then '0'
end),'0') flag
from hr.promotion_type a
LEFT OUTER join hr.promotion b
on(a.promoID= b.promoid)
Here, hr is the schema I used; replace it with your own schema.