Compare data between two different sources - SQL

I have two datasets coming from two sources, and I have to compare them and find the mismatches. One comes from Excel and the other from the data warehouse.
From Excel (Source_Excel):
+-----+-------+------------+----------+
| id  | name  | City_Scope | flag     |
+-----+-------+------------+----------+
| 101 | Plate | NY|TN      | Ready    |
| 102 | Nut   | NY|TN      | Sold     |
| 103 | Ring  | TN|MC      | Planning |
| 104 | Glass | NY|TN|MC   | Ready    |
| 105 | Bolt  | MC         | Expired  |
+-----+-------+------------+----------+
From the DW (Source_DW):
+-----+-------+------+----------+
| id  | name  | City | flag     |
+-----+-------+------+----------+
| 101 | Plate | NY   | Ready    |
| 101 | Plate | TN   | Ready    |
| 102 | Nut   | TN   | Expired  |
| 103 | Ring  | MC   | Planning |
| 104 | Glass | MC   | Ready    |
| 104 | Glass | NY   | Ready    |
| 105 | Bolt  | MC   | Expired  |
+-----+-------+------+----------+
Unfortunately, the data from Excel comes with a separator in one column, so I have to use the DelimitedSplit8K function to split it into individual rows. After splitting the Excel source data, I get the output below.
+-----+-------+------+----------+
| id  | name  | item | flag     |
+-----+-------+------+----------+
| 101 | Plate | NY   | Ready    |
| 101 | Plate | TN   | Ready    |
| 102 | Nut   | NY   | Sold     |
| 102 | Nut   | TN   | Sold     |
| 103 | Ring  | TN   | Planning |
| 103 | Ring  | MC   | Planning |
| 104 | Glass | NY   | Ready    |
| 104 | Glass | TN   | Ready    |
| 104 | Glass | MC   | Ready    |
| 105 | Bolt  | MC   | Expired  |
+-----+-------+------+----------+
My expected output is something like this:
+-----+----------+---------------+--------------+
| ID  | Result   | Flag_mismatch | City_Missing |
+-----+----------+---------------+--------------+
| 101 | No_Error |               |              |
| 102 | Error    | Yes           | Yes          |
| 103 | Error    | No            | Yes          |
| 104 | Error    | No            | Yes          |
| 105 | No_Error |               |              |
+-----+----------+---------------+--------------+
Logic:
I have to find any mismatches in the flag values.
After splitting, any city that is missing should be reported.
Assume that there won't be any name or city mismatches.
As an initial step, I'm trying to get the mismatched rows with the query below, but it returns no output. Please suggest where I'm going wrong.
select a.id,a.name,split.item,a.flag from source_excel a
CROSS APPLY dbo.DelimitedSplit8k(a.city_scope,'|') split
where not exists (
select a.id,split.item
from source_excel a
join source_dw b
on a.id=b.id and a.name=b.name and a.flag=b.flag and split.item=b.city
)
Update
I tried again and got close to the answer with the help of temporary tables, but I'm not sure how to do it without temp tables.
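One likely reason the NOT EXISTS query returns nothing: the subquery declares its own `source_excel a`, which hides the outer `a`, so only `split.item` is correlated and every city finds a match somewhere in the data. The sketch below correlates the subquery to the outer row instead, and then builds a per-id summary in one statement without temp tables. It runs on SQLite through Python only so that it can be executed as written. The table `excel_split` is a hypothetical stand-in for the `CROSS APPLY dbo.DelimitedSplit8k(...)` result, which in SQL Server could be a CTE or derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- excel_split is an assumed name: it holds the already-split Excel rows.
CREATE TABLE excel_split (id INTEGER, name TEXT, item TEXT, flag TEXT);
CREATE TABLE source_dw   (id INTEGER, name TEXT, city TEXT, flag TEXT);
INSERT INTO excel_split VALUES
  (101,'Plate','NY','Ready'), (101,'Plate','TN','Ready'),
  (102,'Nut','NY','Sold'),    (102,'Nut','TN','Sold'),
  (103,'Ring','TN','Planning'), (103,'Ring','MC','Planning'),
  (104,'Glass','NY','Ready'), (104,'Glass','TN','Ready'),
  (104,'Glass','MC','Ready'), (105,'Bolt','MC','Expired');
INSERT INTO source_dw VALUES
  (101,'Plate','NY','Ready'), (101,'Plate','TN','Ready'),
  (102,'Nut','TN','Expired'), (103,'Ring','MC','Planning'),
  (104,'Glass','MC','Ready'), (104,'Glass','NY','Ready'),
  (105,'Bolt','MC','Expired');
""")

# Step 1: mismatched rows. The subquery reads only source_dw and refers
# to the outer row e directly, so the correlation is no longer lost.
MISMATCH_ROWS = """
SELECT e.id, e.name, e.item, e.flag
FROM excel_split AS e
WHERE NOT EXISTS (
    SELECT 1 FROM source_dw AS b
    WHERE b.id = e.id AND b.name = e.name
      AND b.flag = e.flag AND b.city = e.item
)
ORDER BY e.id, e.item
"""

# Step 2: one row per id. A split row with no DW partner counts as a
# missing city; a partner with a different flag counts as a flag mismatch.
SUMMARY = """
SELECT id,
       CASE WHEN flag_bad + city_bad > 0 THEN 'Error' ELSE 'No_Error' END,
       CASE WHEN flag_bad + city_bad = 0 THEN ''
            WHEN flag_bad > 0 THEN 'Yes' ELSE 'No' END,
       CASE WHEN flag_bad + city_bad = 0 THEN ''
            WHEN city_bad > 0 THEN 'Yes' ELSE 'No' END
FROM (
    SELECT e.id,
           SUM(CASE WHEN d.flag <> e.flag THEN 1 ELSE 0 END) AS flag_bad,
           SUM(CASE WHEN d.id IS NULL THEN 1 ELSE 0 END)     AS city_bad
    FROM excel_split AS e
    LEFT JOIN source_dw AS d ON d.id = e.id AND d.city = e.item
    GROUP BY e.id
) AS per_id
ORDER BY id
"""

mismatch_rows = conn.execute(MISMATCH_ROWS).fetchall()
summary = conn.execute(SUMMARY).fetchall()
for row in summary:
    print(row)
```

With the sample data, ids 101 and 105 come out clean, 102 has both a flag mismatch and a missing city (NY), and 103 and 104 each have a missing city (TN) with matching flags. The comparison only runs from Excel to the DW; cities present in the DW but absent from Excel would need the same check in the other direction.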

Related

Two foreign keys referring to one primary key on SQL SERVER query select items

I'd like to create a database that records transactions between two users. A user can transfer points (think of it as money) to another user. The user table looks like:
| userID | name | email | balance |
| ------------- |---------------|------------------|------------|
| 101 | alpha | alpha#mail.com | 1000 |
| 102 | bravo | bravo#mail.com | 500 |
| 103 | charlie | charlie#mail.com | 2000 |
And the transaction table should look like:
| transactionID | from_user | to_user | transfer_amount |
| ------------- |---------------|------------------|------------------|
| 1 | 101 | 103 | 100 |
| 2 | 102 | 101 | 150 |
| 3 | 102 | 103 | 200 |
I just need this result:
| row | from_user | to_user | transfer_amount |
| ------------- |---------------|------------------|------------------|
| 1 | alpha | charlie | 100 |
| 2 | bravo | alpha | 150 |
| 3 | bravo | charlie | 200 |
Could someone give hints to provide SQL Server code?
Select from_user, to_user, name, transfer_amount from transaction iner join users on trans.id==user.id;
SELECT T.transactionID, T.from_user, U_FROM.name AS from_name,
T.to_user, U_TO.name AS to_name, T.transfer_amount
FROM transactions AS T
JOIN users AS U_FROM ON T.from_user = U_FROM.userID
JOIN users AS U_TO ON T.to_user = U_TO.userID
Something like this, I guess
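To check the idea of joining the user table twice (once per foreign key), here is a small sketch on an in-memory SQLite database driven from Python. The table names `users` and `transactions` are assumptions, since the question does not give them; the column names follow the question's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Table names are assumed; columns follow the question.
CREATE TABLE users (userID INTEGER PRIMARY KEY, name TEXT,
                    email TEXT, balance INTEGER);
CREATE TABLE transactions (
    transactionID   INTEGER PRIMARY KEY,
    from_user       INTEGER REFERENCES users(userID),
    to_user         INTEGER REFERENCES users(userID),
    transfer_amount INTEGER
);
INSERT INTO users VALUES
  (101,'alpha','alpha#mail.com',1000),
  (102,'bravo','bravo#mail.com',500),
  (103,'charlie','charlie#mail.com',2000);
INSERT INTO transactions VALUES (1,101,103,100), (2,102,101,150), (3,102,103,200);
""")

# Each foreign key gets its own alias of users, so the sender's and the
# receiver's names are looked up independently.
QUERY = """
SELECT t.transactionID,
       u_from.name AS from_user,
       u_to.name   AS to_user,
       t.transfer_amount
FROM transactions AS t
JOIN users AS u_from ON t.from_user = u_from.userID
JOIN users AS u_to   ON t.to_user   = u_to.userID
ORDER BY t.transactionID
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

The same SELECT should carry over to SQL Server unchanged, since it uses only inner joins and column aliases.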

Evaluate Multiple conditions for same row

I have to compare 2 different sources and identify all the mismatches for all IDs
Source_excel table
+-----+-------------+------+----------+
| id | name | City | flag |
+-----+-------------+------+----------+
| 101 | Plate | NY | Ready |
| 102 | Back washer | NY | Sold |
| 103 | Ring | MC | Planning |
| 104 | Glass | NMC | Ready |
| 107 | Cover | PR | Ready |
+-----+-------------+------+----------+
Source_dw table
+-----+----------+------+----------+
| id | name | City | flag |
+-----+----------+------+----------+
| 101 | Plate | NY | Planning |
| 102 | Nut | TN | Expired |
| 103 | Ring | MC | Planning |
| 104 | Top Wire | NY | Ready |
| 105 | Bolt | MC | Expired |
+-----+----------+------+----------+
Expected result
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| ID | excel_name | dw_name | excel_flag | dw_flag | excel_city | dw_city | RESULT |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| 101 | Plate | Plate | Ready | Planning | NY | NY | FLAG_MISMATCH |
| 102 | Back washer | Nut | Sold | Expired | NY | TN | NAME_MISMATCH |
| 102 | Back washer | Nut | Sold | Expired | NY | TN | FLAG_MISMATCH |
| 102 | Back washer | Nut | Sold | Expired | NY | TN | CITY_MISMATCH |
| 103 | Ring | Ring | Planning | Planning | MC | MC | ALL_MATCH |
| 104 | Glass | Top Wire | Ready | Ready | NMC | NY | NAME_MISMATCH |
| 104 | Glass | Top Wire | Ready | Ready | NMC | NY | CITY_MISMATCH |
| 107 | Cover | | Ready | | PR | | MISSING IN DW |
| 105 | | Bolt | | Expired | | MC | MISSING IN EXCEL |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
I have tried the query below, but it reports only one mismatch per ID.
select ISNULL(EXCEL.ID,DW.ID) ID,
excel.name as excel_name,dw.name as dw_name,
excel.flag as excel_flag,dw.flag as dw_flag,
excel.city as excel_city,dw.city as dw_city,
RESULT = CASE WHEN excel.ID IS NULL THEN 'MISSING IN EXCEL'
WHEN dw.ID IS NULL THEN 'MISSING IN DW'
WHEN excel.NAME<>dw.NAME THEN 'NAME_MISMATCH'
WHEN excel.CITY<>dw.CITY THEN 'CITY_MISMATCH'
WHEN excel.FLAG <> dw.FLAG THEN 'FLAG_MISMATCH'
ELSE 'ALL_MATCH' END
from source_excel excel
FULL OUTER JOIN source_dw dw ON excel.id=dw.id
Actual output
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| ID | excel_name | dw_name | excel_flag | dw_flag | excel_city | dw_city | RESULT |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| 101 | Plate | Plate | Ready | Planning | NY | NY | FLAG_MISMATCH |
| 102 | Back washer | Nut | Sold | Expired | NY | TN | NAME_MISMATCH |
| 103 | Ring | Ring | Planning | Planning | MC | MC | ALL_MATCH |
| 104 | Glass | Top Wire | Ready | Ready | NMC | NY | NAME_MISMATCH |
| 107 | Cover | | Ready | | PR | | MISSING IN DW |
| 105 | | Bolt | | Expired | | MC | MISSING IN EXCEL |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
I understand that a CASE expression returns only the first condition that is satisfied. Is there any other way I can check all the conditions?
If I follow you correctly, you want one row per mismatch, or one row indicating that everything matches.
You can use cross apply to generate the rows, like so:
SELECT
COALESCE(xl.ID, dw.ID) ID,
xl.name as excel_name,dw.name as dw_name,
xl.flag as excel_flag,dw.flag as dw_flag,
xl.city as excel_city,dw.city as dw_city,
x.result
FROM source_excel xl
FULL OUTER JOIN source_dw dw ON xl.id = dw.id
CROSS APPLY (VALUES
(CASE WHEN xl.ID IS NULL THEN 'MISSING IN EXCEL' END),
(CASE WHEN dw.ID IS NULL THEN 'MISSING IN DW' END),
(CASE WHEN xl.NAME <> dw.NAME THEN 'NAME_MISMATCH' END),
(CASE WHEN xl.CITY <> dw.CITY THEN 'CITY_MISMATCH' END),
(CASE WHEN xl.FLAG <> dw.FLAG THEN 'FLAG_MISMATCH' END),
(CASE WHEN
xl.ID = dw.ID
AND xl.NAME = dw.NAME
AND xl.CITY = dw.CITY
AND xl.FLAG = dw.FLAG
THEN 'ALL_MATCH' END)
) x(result)
WHERE x.result IS NOT NULL
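The "one row per failed check" idea can be tried out on the sample data with the sketch below. It is not the CROSS APPLY query itself: SQLite has no CROSS APPLY and older versions lack FULL OUTER JOIN, so this version emulates the full join with a LEFT JOIN plus an anti-join and emits each check as its own UNION ALL branch. The CTE name `joined` and its column names are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_excel (id INTEGER, name TEXT, city TEXT, flag TEXT);
CREATE TABLE source_dw    (id INTEGER, name TEXT, city TEXT, flag TEXT);
INSERT INTO source_excel VALUES
  (101,'Plate','NY','Ready'), (102,'Back washer','NY','Sold'),
  (103,'Ring','MC','Planning'), (104,'Glass','NMC','Ready'),
  (107,'Cover','PR','Ready');
INSERT INTO source_dw VALUES
  (101,'Plate','NY','Planning'), (102,'Nut','TN','Expired'),
  (103,'Ring','MC','Planning'), (104,'Top Wire','NY','Ready'),
  (105,'Bolt','MC','Expired');
""")

QUERY = """
WITH joined AS (
    -- Emulated FULL OUTER JOIN: all Excel rows, then DW-only rows.
    SELECT x.id AS id, x.id AS xl_id, d.id AS dw_id,
           x.name AS xl_name, d.name AS dw_name,
           x.city AS xl_city, d.city AS dw_city,
           x.flag AS xl_flag, d.flag AS dw_flag
    FROM source_excel AS x
    LEFT JOIN source_dw AS d ON d.id = x.id
    UNION ALL
    SELECT d.id, NULL, d.id, NULL, d.name, NULL, d.city, NULL, d.flag
    FROM source_dw AS d
    WHERE NOT EXISTS (SELECT 1 FROM source_excel AS x WHERE x.id = d.id)
)
-- Every check is evaluated independently, so one id can yield several rows.
SELECT id, 'MISSING IN EXCEL' AS result FROM joined WHERE xl_id IS NULL
UNION ALL SELECT id, 'MISSING IN DW' FROM joined WHERE dw_id IS NULL
UNION ALL SELECT id, 'NAME_MISMATCH' FROM joined WHERE xl_name <> dw_name
UNION ALL SELECT id, 'CITY_MISMATCH' FROM joined WHERE xl_city <> dw_city
UNION ALL SELECT id, 'FLAG_MISMATCH' FROM joined WHERE xl_flag <> dw_flag
UNION ALL SELECT id, 'ALL_MATCH' FROM joined
          WHERE xl_name = dw_name AND xl_city = dw_city AND xl_flag = dw_flag
ORDER BY 1, 2
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Comparisons against NULL are never true, which is why rows missing on one side trigger only their MISSING check and none of the mismatch checks.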
I would report the mismatches in a single row per ID, concatenating the reasons together:
select COALESCE(EXCEL.ID, DW.ID) as ID,
excel.name as excel_name,dw.name as dw_name,
excel.flag as excel_flag,dw.flag as dw_flag,
excel.city as excel_city,dw.city as dw_city,
(CASE WHEN excel.ID IS NULL
THEN 'MISSING IN EXCEL'
WHEN dw.ID IS NULL
THEN 'MISSING IN DW'
WHEN excel.NAME = dw.NAME AND excel.CITY = dw.CITY AND excel.FLAG = dw.FLAG
THEN 'ALL_MATCH'
ELSE CONCAT(CASE WHEN excel.NAME <> dw.NAME THEN 'NAME_MISMATCH; ' END,
CASE WHEN excel.CITY <> dw.CITY THEN 'CITY_MISMATCH; ' END,
CASE WHEN excel.FLAG <> dw.FLAG THEN 'FLAG_MISMATCH;' END
)
END) AS RESULT
from source_excel excel FULL OUTER JOIN
source_dw dw
ON excel.id = dw.id;

How to update data without removing old data

I would like to develop a hotel database. I have created two tables: Room and Customer.
Can I create a relationship between them such that the Room table is updated every time a new customer arrives, while the Customer table keeps storing customer data without removing old customers? For example:
Room Table:
Room_Id | available | Customer_Id |
101 | Yes |0 |
102 | Yes |0 |
103 | No | 1236 |
104 | No | 1237 |
105 | No | 1235 |
201 | No | 1234 |
202 | No | 1233 |
After updating room table:
Room_Id | available | Customer_Id |
101 | No | 1111 |
102 | No | 2222 |
103 | Yes | 0 |
104 | Yes | 0 |
105 | Yes | 0 |
201 | Yes | 0 |
202 | Yes | 0 |
Customer Table should store both data:
Customer_Id |
1111 |
2222 |
1236 |
1237 |
1235 |
1234 |
1233 |
I want something like this because when I try to update the room table, it also updates the customer table.
I hope my question is clear.
Thanks in advance.
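One possible layout, sketched here on SQLite through Python, keeps Customer as an independent table and lets Room hold only a nullable reference to it. This is an assumption about the intended design, not a diagnosis of the existing schema: the snake_case names are invented, and NULL replaces the 0 used above for "no customer" so that the foreign key stays valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
CREATE TABLE room (
    room_id     INTEGER PRIMARY KEY,
    available   TEXT NOT NULL,
    -- Nullable reference: an empty room points at no customer.
    customer_id INTEGER REFERENCES customer(customer_id)
);
INSERT INTO customer VALUES (1233), (1234), (1235), (1236), (1237);
INSERT INTO room VALUES
  (101,'Yes',NULL), (102,'Yes',NULL), (103,'No',1236), (104,'No',1237),
  (105,'No',1235),  (201,'No',1234),  (202,'No',1233);
""")

# New guests are added to customer first, then the rooms are updated.
conn.executemany("INSERT INTO customer VALUES (?)", [(1111,), (2222,)])
conn.execute("UPDATE room SET available = 'No', customer_id = 1111 WHERE room_id = 101")
conn.execute("UPDATE room SET available = 'No', customer_id = 2222 WHERE room_id = 102")
# Checking guests out only clears the reference; customer rows are untouched.
conn.execute("UPDATE room SET available = 'Yes', customer_id = NULL "
             "WHERE room_id NOT IN (101, 102)")

customers = [r[0] for r in conn.execute(
    "SELECT customer_id FROM customer ORDER BY customer_id")]
rooms = conn.execute(
    "SELECT room_id, available, customer_id FROM room ORDER BY room_id").fetchall()
print(customers)
print(rooms)
```

Because the updates touch only the room table, all seven customers remain stored afterwards. If the history of who stayed in which room matters, a separate stay or booking table would be the usual next step.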

Joining data from two result rows on a numerical range

I am trying to create a custom interface for a system that tracks tickets.
I have got tickets in a table of the form:
+----------------------+
| Section | Row | Seat |
+----------------------+
| 15 | A | 100 |
| 15 | A | 102 |
| 15 | A | 103 |
| 15 | A | 110 |
| 15 | A | 111 |
| 15 | B | 102 |
| 15 | B | 103 |
| 15 | B | 104 |
| 15 | C | 99 |
| 15 | C | 100 |
| 15 | C | 101 |
| 15 | C | 102 |
| 15 | C | 103 |
| 15 | C | 104 |
+----------------------+
I am trying to display the ticket 'blocks' and to mark each block whose seats sit behind those of the previous row. That is, I'd like to be able to display:
+------------------------------------------------+
| Section | Row | Seat Range | Overlaps Previous |
+------------------------------------------------+
| 15 | A | 100 - 103 | No |
| 15 | B | 102 - 104 | Yes |
| 15 | C | 99 - 104 | Yes |
| 15 | A | 110 - 111 | No |
+------------------------------------------------+
Any thoughts?
You could add a relation that assigns all neighbouring seats to a given seat. This would also work better than any solely numerical scheme for any sort of separation of your seats, and it would allow for neighbourhoods across rows. From there, you could iteratively define any block of free seats.
If this is about supporting a cashier, I would not address it solely in the database. I would look to integrate with the GUI and identify the blocks via some backtracking when a first free seat is clicked.
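For comparison, the purely numerical scheme can be sketched as follows: a block starts at a seat whose left neighbour (seat - 1) is absent and ends at the first seat at or after it whose right neighbour (seat + 1) is absent. The example runs on SQLite through Python; the table name `tickets` and the column name `seat_row` are assumptions. It treats only consecutive seat numbers within one row as neighbours and does not compute the overlap with the previous row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed names: tickets, seat_row (the question calls the column Row).
CREATE TABLE tickets (section INTEGER, seat_row TEXT, seat INTEGER);
INSERT INTO tickets VALUES
  (15,'A',100), (15,'A',102), (15,'A',103), (15,'A',110), (15,'A',111),
  (15,'B',102), (15,'B',103), (15,'B',104),
  (15,'C',99),  (15,'C',100), (15,'C',101), (15,'C',102),
  (15,'C',103), (15,'C',104);
""")

BLOCKS = """
SELECT s.section, s.seat_row, s.seat AS first_seat,
       -- End of the block: nearest seat at or after the start that has
       -- no right-hand neighbour in the same section and row.
       (SELECT MIN(e.seat) FROM tickets AS e
        WHERE e.section = s.section AND e.seat_row = s.seat_row
          AND e.seat >= s.seat
          AND NOT EXISTS (SELECT 1 FROM tickets AS n
                          WHERE n.section = e.section
                            AND n.seat_row = e.seat_row
                            AND n.seat = e.seat + 1)) AS last_seat
FROM tickets AS s
-- Start of a block: no left-hand neighbour in the same section and row.
WHERE NOT EXISTS (SELECT 1 FROM tickets AS p
                  WHERE p.section = s.section
                    AND p.seat_row = s.seat_row
                    AND p.seat = s.seat - 1)
ORDER BY s.section, s.seat_row, s.seat
"""

blocks = conn.execute(BLOCKS).fetchall()
for block in blocks:
    print(block)
```

Note that seat 101 is absent from row A in the sample data, so strict adjacency gives 100 on its own followed by 102 - 103 rather than a single 100 - 103 block. Tolerating such gaps would need a looser neighbour rule, which is where an explicit neighbour relation becomes attractive.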

SQL Join with Group By

OK, so I'm trying to write a complex query (at least complex to me) and need some pro help. This is my database setup:
Table: MakeList
| MakeListId | Make |
| 1 | Acura |
| 2 | Chevy |
| 3 | Pontiac |
| 4 | Scion |
| 5 | Toyota |
Table: CustomerMake
| CustomerMakeId | CustomerId | _Descriptor |
| 1 | 123 | Acura |
| 2 | 124 | Chevy |
| 3 | 125 | Pontiac |
| 4 | 126 | Scion |
| 5 | 127 | Toyota |
| 6 | 128 | Acura |
| 7 | 129 | Chevy |
| 8 | 130 | Pontiac |
| 9 | 131 | Scion |
| 10 | 132 | Toyota |
Table: Customer
| CustomerId | StatusId |
| 123 | 1 |
| 124 | 1 |
| 125 | 1 |
| 126 | 2 |
| 127 | 1 |
| 128 | 1 |
| 129 | 2 |
| 130 | 1 |
| 131 | 1 |
| 132 | 1 |
What I am trying to end up with is this:
Desired Result Set:
| Make | CustomerId|
| Acura | 123 |
| Chevy | 124 |
| Pontiac | 125 |
| Scion | 131 |
| Toyota | 127 |
I want a list of unique Makes, each with one active (StatusId = 1) CustomerId to go with it. I'm assuming I'll have to do some GROUP BYs and JOINs, but I haven't been able to figure it out. Any help would be greatly appreciated. Let me know if I haven't given enough info for my question. Thanks!
UPDATE: The script doesn't have to be performant - it will be used one time for testing purposes.
Something like this:
select cm._Descriptor,
min(cu.customerid)
from CustomerMake cm
join Customer cu on cu.CustomerId = cm.CustomerId and cu.StatusId = 1
group by cm._Descriptor
I left out the MakeList table as it seems unnecessary, because you are storing the full make name as _Descriptor in the CustomerMake table anyway (so the question is what the MakeList table is for. Why don't you store a FK to it in the CustomerMake table?)
You want to
(a) join the customer and customermake tables
(b) filter on customer.statusid
(c) group by customermake._descriptor
Depending on your RDBMS, you may need to explicitly apply a group function to customer.customerid to include it in the select list. Since you don't care which particular customerid is displayed, you could use MIN or MAX to just pick an essentially arbitrary value.
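Steps (a) to (c) can be checked against the sample data with the sketch below, which runs on an in-memory SQLite database through Python. MIN is used as the group function; it happens to pick the lowest active CustomerId per make.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerMake (CustomerMakeId INTEGER, CustomerId INTEGER,
                           _Descriptor TEXT);
CREATE TABLE Customer (CustomerId INTEGER, StatusId INTEGER);
INSERT INTO CustomerMake VALUES
  (1,123,'Acura'), (2,124,'Chevy'), (3,125,'Pontiac'), (4,126,'Scion'),
  (5,127,'Toyota'), (6,128,'Acura'), (7,129,'Chevy'), (8,130,'Pontiac'),
  (9,131,'Scion'), (10,132,'Toyota');
INSERT INTO Customer VALUES
  (123,1), (124,1), (125,1), (126,2), (127,1),
  (128,1), (129,2), (130,1), (131,1), (132,1);
""")

QUERY = """
SELECT cm._Descriptor AS Make,
       MIN(cu.CustomerId) AS CustomerId   -- any aggregate would do
FROM CustomerMake AS cm
JOIN Customer AS cu                        -- (a) join the two tables
  ON cu.CustomerId = cm.CustomerId
WHERE cu.StatusId = 1                      -- (b) keep active customers only
GROUP BY cm._Descriptor                    -- (c) one row per make
ORDER BY cm._Descriptor
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

For Scion the inactive customer 126 is filtered out before grouping, so 131 is returned, which matches the desired result set.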