Why does my insert not detect existing rows in my target table?

I'm trying to implement SCD2 with INSERT and UPDATE instead of MERGE. I need to insert a new row into my target table when a matching id has a different hash value. The table contains id, name, hash value, and an enabled flag, where 1 means the row is the most current version.
At the moment, I'm not getting the expected output. For example, if id “1” exists in both the target and source tables but the hash value differs, the row is inserted into my target table again every time I rerun the query, leaving id “1” with many duplicate hash values.
Query:
INSERT INTO target
SELECT s.ID, s.namn, s.hashh, 1 AS enablee
FROM source s
JOIN target t ON s.id = t.id
WHERE s.hashh <> t.hashh
Output:
1 demo 222 0
1 demo 22220
1 demo 222 1
2 demo2 666 1
2 demo2 666 1
2 demo2 888 1
Expected output:
1 demo 222 1
1 demo 22220
2 demo2 666 1
2 demo2 888 0
Ideally, the insert should do nothing and report (0 rows affected) if the hash value already exists in the target table.

To understand the behavior, consider that WHERE applies to the SELECT statement, not to the INSERT.
You can just run
SELECT s.ID, s.namn, s.hashh, 1 AS enablee
FROM source s
JOIN target t ON s.id = t.id
WHERE s.hashh <> t.hashh
to see what is inserted. The join finds all rows with the same id and a mismatched hash. If all hashes match, it produces no result. But if there are some rows with a mismatched hash (for example, the old version of the row), you get results even if there is also a matching row, so every rerun inserts the new version again.
What you need is the opposite: join on the matching hash only and check whether a match was found. Something like
SELECT s.ID, s.namn, s.hashh, 1 AS enablee
FROM source s
LEFT JOIN target t ON s.id = t.id
AND s.hashh = t.hashh
WHERE t.hashh IS NULL
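To see the difference end to end, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the asker's database (the question doesn't name one). Table and column names follow the question; the sample rows, the hash values 222 and 333, and the helper names are hypothetical. Each INSERT is run three times against a fresh copy of the data, and the number of rows added by each run is printed.

```python
import sqlite3

def make_db():
    """Fresh in-memory copy of the (hypothetical) sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source (id INTEGER, namn TEXT, hashh INTEGER);
        CREATE TABLE target (id INTEGER, namn TEXT, hashh INTEGER, enablee INTEGER);
        INSERT INTO source VALUES (1, 'demo', 333);     -- changed hash
        INSERT INTO target VALUES (1, 'demo', 222, 1);  -- current version
    """)
    return conn

# The query from the question: any target row with a different hash qualifies.
MISMATCH_JOIN = """
    INSERT INTO target
    SELECT s.id, s.namn, s.hashh, 1 AS enablee
    FROM source s
    JOIN target t ON s.id = t.id
    WHERE s.hashh <> t.hashh
"""

# The answer's anti-join: insert only if no target row has the same id AND hash.
ANTI_JOIN = """
    INSERT INTO target
    SELECT s.id, s.namn, s.hashh, 1 AS enablee
    FROM source s
    LEFT JOIN target t ON s.id = t.id AND s.hashh = t.hashh
    WHERE t.hashh IS NULL
"""

def rows_inserted_per_run(sql, runs=3):
    """Run the same INSERT several times and report how many rows each run added."""
    conn = make_db()
    return [conn.execute(sql).rowcount for _ in range(runs)]

print(rows_inserted_per_run(MISMATCH_JOIN))  # [1, 1, 1]
print(rows_inserted_per_run(ANTI_JOIN))      # [1, 0, 0]
```

With the anti-join, the second and later runs insert nothing, which is the "(0 rows affected)" behavior the question asks for. Setting the previous version's enablee to 0 would still be the job of the separate UPDATE mentioned in the question.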

Related

How to insert values to table based on column value

I have a table called BranchServices with the following structure:
FkBranchId FkServiceId SortValue
6 1 1
7 1 1
8 1 NULL
6 2 2
7 2 2
8 2 NULL
6 3 3
Each branch has its own sort order (SortValue). Some branches have no sort order, and their SortValue is NULL. When I create a new service, I insert rows into this table with a cross join as follows:
INSERT INTO BranchServices
SELECT b.BranchId,
s.ServiceID,
NULL
FROM #insertedServiceID s
CROSS JOIN Branch b WHERE b.IsActive = 1
But I need to change the above query. While inserting the newly created service id for each branch, I need to check that branch's current sort order. If the branch's SortValue is NULL, I need to insert NULL as the SortValue; if the branch already has a sort order, I need to insert 0. How can I check whether a branch has a sort order before inserting the newly created service? Can I check it inside the cross join, and if so, how?
Use a subquery to get the SortValue for the FkBranchId that you insert:
INSERT INTO BranchServices
SELECT b.BranchId,
s.ServiceID,
CASE WHEN(SELECT MIN(SortValue) FROM BranchServices WHERE FkBranchId = b.BranchId) IS NOT NULL THEN 0 END
FROM #insertedServiceID s
CROSS JOIN Branch b WHERE b.IsActive = 1
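I used MIN() to make sure that the subquery returns only one value (MAX() would also work). Because MIN() ignores NULLs, the subquery returns NULL only when the branch has no non-NULL SortValue (or no rows at all), so those branches get NULL from the CASE and every other branch gets 0.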
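A quick way to check the CASE/subquery logic against the sample data is the sketch below, again using Python's sqlite3 as a stand-in for SQL Server. The Branch rows (6, 7 and 8, all active), the new service id 4, and the plain table insertedServiceID, which replaces the T-SQL temp table #insertedServiceID, are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Branch (BranchId INTEGER, IsActive INTEGER);
    CREATE TABLE BranchServices (FkBranchId INTEGER, FkServiceId INTEGER, SortValue INTEGER);
    -- hypothetical stand-in for the temp table #insertedServiceID
    CREATE TABLE insertedServiceID (ServiceID INTEGER);

    INSERT INTO Branch VALUES (6, 1), (7, 1), (8, 1);
    INSERT INTO BranchServices VALUES
        (6, 1, 1), (7, 1, 1), (8, 1, NULL),
        (6, 2, 2), (7, 2, 2), (8, 2, NULL),
        (6, 3, 3);
    INSERT INTO insertedServiceID VALUES (4);
""")

# Same shape as the answer: 0 if the branch already has a sort order, else NULL.
conn.execute("""
    INSERT INTO BranchServices
    SELECT b.BranchId,
           s.ServiceID,
           CASE WHEN (SELECT MIN(SortValue) FROM BranchServices
                      WHERE FkBranchId = b.BranchId) IS NOT NULL
                THEN 0 END
    FROM insertedServiceID s
    CROSS JOIN Branch b
    WHERE b.IsActive = 1
""")

new_rows = conn.execute("""
    SELECT FkBranchId, SortValue FROM BranchServices
    WHERE FkServiceId = 4 ORDER BY FkBranchId
""").fetchall()
print(new_rows)  # [(6, 0), (7, 0), (8, None)]
```

Branches 6 and 7 already have sort values, so they get 0; branch 8 only has NULLs, so the new row keeps NULL.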

unable to use LIMIT when using correlated query

I have two tables in Postgres. I want to get the data for the latest 3 records.
Below is the query:
select two.sid as sid,
two.sidname as sidname,
two.myPercent as mypercent,
two.saccur as saccur,
one.totalSid as totalSid
from table1 one,table2 two
where one.sid = two.sid;
The above query displays all records that satisfy the condition one.sid = two.sid; I want to get only the data for the 3 most recent records (4, 5, 6) from table2.
I know that in Postgres we can use LIMIT to limit the rows retrieved, but table2 has multiple rows for each ID, so I guess I cannot use LIMIT on table2 and should use it on table1 instead. Any suggestions?
table1:
sid totalSid
1 10
2 20
3 30
4 40
5 50
6 60
table2:
sid sidname myPercent saccur
1 aaaa 11 11t
1 bbb 13 13g
1 ccc 11 11g
1 qw 88 88k
//more data for 2,3,4,5....
6 xyz 89 895W
6 xyz1 90 90k
6 xyz2 91 91p
6 xyz3 92 92q
Given a changed understanding of the question, a simple subquery and join should suffice.
We select everything from table1, limited to 3 records in descending sid order. This gives us the 3 most recent sids, which we then join to table2 to get the rest of the data for those sids. The assumption here is that sid is unique in table1 and that "most recent" means the records with the highest sid.
SELECT two.sid as sid
, two.sidname as sidname
, two.myPercent as mypercent
, two.saccur as saccur
, one.totalSid as totalSid
FROM (SELECT * FROM table1 ORDER BY SID DESC LIMIT 3) one
INNER JOIN table2 two
ON one.sid = two.sid;
*Note: I removed a comma after the one alias above.
Below is the same query in the older ANSI-89 join syntax, using comma notation.
SELECT two.sid as sid
, two.sidname as sidname
, two.myPercent as mypercent
, two.saccur as saccur
, one.totalSid as totalSid
FROM (SELECT * FROM table1 ORDER BY SID DESC LIMIT 3) one
, table2 two
WHERE one.sid = two.sid;
Logically, this syntax says: take the 3 most recent sids from table1 (one), cross join them with all records in table2 (two), pairing each record in one with every record in two, and then return only the pairs that have the same sid on both sides. In practice, cost-based query optimizers, PostgreSQL's included, treat the WHERE condition as a join condition and do not need to build the entire cross join, so both versions normally perform the same. Taken literally, though, the logical order of operations describes a cross join, and if one and two were both large tables, that cross join would be a very large intermediate dataset.
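The sketch below runs both versions of the query against a small copy of the question's tables using Python's sqlite3 (which also accepts ORDER BY ... LIMIT inside a subquery) to show that they return the same rows. The sid 1 and 6 rows in table2 come from the question; the sid 3, 4 and 5 rows are made up, since the question elides them. The outer ORDER BY is added only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (sid INTEGER PRIMARY KEY, totalSid INTEGER);
    CREATE TABLE table2 (sid INTEGER, sidname TEXT, myPercent INTEGER, saccur TEXT);
    INSERT INTO table1 VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (6, 60);
    -- sid 1 and 6 rows are from the question; sid 3-5 rows are hypothetical
    INSERT INTO table2 VALUES
        (1, 'aaaa', 11, '11t'), (1, 'bbb', 13, '13g'),
        (3, 'c3', 30, '30c'), (4, 'd4', 40, '40d'), (5, 'e5', 50, '50e'),
        (6, 'xyz', 89, '895W'), (6, 'xyz1', 90, '90k'),
        (6, 'xyz2', 91, '91p'), (6, 'xyz3', 92, '92q');
""")

# Explicit INNER JOIN against the 3 highest sids.
EXPLICIT_JOIN = """
    SELECT two.sid, two.sidname, two.myPercent, two.saccur, one.totalSid
    FROM (SELECT * FROM table1 ORDER BY sid DESC LIMIT 3) one
    INNER JOIN table2 two ON one.sid = two.sid
    ORDER BY two.sid, two.sidname
"""

# Comma (ANSI-89) join with the condition in WHERE.
COMMA_JOIN = """
    SELECT two.sid, two.sidname, two.myPercent, two.saccur, one.totalSid
    FROM (SELECT * FROM table1 ORDER BY sid DESC LIMIT 3) one, table2 two
    WHERE one.sid = two.sid
    ORDER BY two.sid, two.sidname
"""

explicit_rows = conn.execute(EXPLICIT_JOIN).fetchall()
comma_rows = conn.execute(COMMA_JOIN).fetchall()
print(explicit_rows == comma_rows)              # True
print(sorted({row[0] for row in explicit_rows}))  # [4, 5, 6]
```

Only sids 4, 5 and 6 survive, and sid 6 contributes all four of its table2 rows, which is why the LIMIT belongs on table1 rather than on the joined result.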

Checking for (and Deleting) Complex Object Duplicates in SQL Server

I need to check a complex object for duplicates and then cascade-delete the duplicates from all associated tables, and I'm wondering whether I can do this efficiently in SQL Server or whether I should handle it in my code. Structurally, I have the following tables:
Claim
ClaimCaseSubTypes (mapping table for many to many relationship)
ClaimDiagnosticCodes (ditto)
ClaimTreatmentCodes (ditto)
A Claim is a duplicate only if it matches on 8 of its own fields AND has the same relationships in all the mapping tables.
For example, the following records would be flagged as duplicates:
Claim
Id CreateDate Other Fields
1 1/1/2015 matched
2 6/1/2015 matched
ClaimCaseSubTypes
ClaimId SubTypeId
1 34
1 64
2 34
2 64
ClaimDiagnosticCodes
ClaimId DiagnosticCodeId
1 1
2 1
ClaimTreatmentCodes
ClaimId TreatmentCodeId
1 5
1 6
2 6
2 5
In this case, I would want to keep claim 1 and delete claim 2 from the Claim table, along with any rows in the mapping tables with a ClaimId of 2.
This is the kind of problem that window functions are for:
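;WITH cte AS (
SELECT c.ID,
-- PARTITION BY the 8 matching Claim fields plus the mapping-table values
ROW_NUMBER() OVER (PARTITION BY field1, field2, field3, ... ORDER BY c.CreateDate) AS ClaimOrder
FROM Claim c
INNER JOIN other tables...
)
-- assumes Claim has an IsDuplicate column; IIF requires SQL Server 2012 or later
UPDATE c
SET IsDuplicate = IIF(cte.ClaimOrder = 1, 0, 1)
FROM Claim c
INNER JOIN cte ON c.ID = cte.ID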
The fields that you include in the PARTITION BY determine which fields must be identical for two claims to be considered matches. The ORDER BY tells SQL Server to give the earliest claim in each group the number 1. Every claim numbered higher than 1 is a duplicate of an earlier one.
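Joining the mapping tables row by row makes it awkward to compare whole sets of mapping rows (claim 1's treatment codes 5, 6 versus claim 2's 6, 5). One way around this, sketched below under stated assumptions, is to first build an order-independent signature per claim and then apply the same ROW_NUMBER() partitioning to the claim fields plus that signature. The sketch uses Python's sqlite3 (window functions need SQLite 3.25 or later) and builds the signature in Python; OtherField is a hypothetical stand-in for the 8 matching fields, ClaimSignature is a hypothetical helper table, and the dates from the question are stored in ISO format so that text order equals date order. In SQL Server 2017 or later, the signature could instead be built in SQL with STRING_AGG(...) WITHIN GROUP (ORDER BY ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- OtherField is a hypothetical stand-in for the 8 Claim fields that must match
    CREATE TABLE Claim (Id INTEGER PRIMARY KEY, CreateDate TEXT, OtherField TEXT);
    CREATE TABLE ClaimCaseSubTypes (ClaimId INTEGER, SubTypeId INTEGER);
    CREATE TABLE ClaimDiagnosticCodes (ClaimId INTEGER, DiagnosticCodeId INTEGER);
    CREATE TABLE ClaimTreatmentCodes (ClaimId INTEGER, TreatmentCodeId INTEGER);
    INSERT INTO Claim VALUES (1, '2015-01-01', 'matched'), (2, '2015-06-01', 'matched');
    INSERT INTO ClaimCaseSubTypes VALUES (1, 34), (1, 64), (2, 34), (2, 64);
    INSERT INTO ClaimDiagnosticCodes VALUES (1, 1), (2, 1);
    INSERT INTO ClaimTreatmentCodes VALUES (1, 5), (1, 6), (2, 6), (2, 5);
    -- hypothetical helper table: one mapping-table signature per claim
    CREATE TEMP TABLE ClaimSignature (ClaimId INTEGER PRIMARY KEY, Sig TEXT);
""")

MAPPINGS = [
    ("ClaimCaseSubTypes", "SubTypeId"),
    ("ClaimDiagnosticCodes", "DiagnosticCodeId"),
    ("ClaimTreatmentCodes", "TreatmentCodeId"),
]

# Sorting the ids makes the signature independent of row order (5,6 == 6,5).
for (claim_id,) in conn.execute("SELECT Id FROM Claim").fetchall():
    parts = []
    for table, column in MAPPINGS:
        ids = sorted(v for (v,) in conn.execute(
            f"SELECT {column} FROM {table} WHERE ClaimId = ?", (claim_id,)))
        parts.append(",".join(map(str, ids)))
    conn.execute("INSERT INTO ClaimSignature VALUES (?, ?)",
                 (claim_id, "|".join(parts)))

# Same ROW_NUMBER() idea as the answer: number claims within each group of
# identical fields + signature, earliest CreateDate first.
duplicates = [claim_id for (claim_id,) in conn.execute("""
    WITH cte AS (
        SELECT c.Id,
               ROW_NUMBER() OVER (PARTITION BY c.OtherField, s.Sig
                                  ORDER BY c.CreateDate) AS ClaimOrder
        FROM Claim c
        JOIN ClaimSignature s ON s.ClaimId = c.Id
    )
    SELECT Id FROM cte WHERE ClaimOrder > 1 ORDER BY Id
""")]

# Delete the mapping rows first, then the duplicate claims themselves.
params = [(claim_id,) for claim_id in duplicates]
for table, _ in MAPPINGS:
    conn.executemany(f"DELETE FROM {table} WHERE ClaimId = ?", params)
conn.executemany("DELETE FROM Claim WHERE Id = ?", params)
conn.commit()

print(duplicates)  # [2]
```

Deleting from the mapping tables before the Claim table means the sketch would also work if those tables had foreign keys to Claim without ON DELETE CASCADE.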

select distinct records where multiple rows exist for one ID based on values in another column

So I'm not exactly sure if my title is correct or misleading. It sounds simple, but I can't figure it out and haven't found a good example.
I want to select the distinct IDs from a table where the ID never matches a certain code. For instance, I have tableA as below:
tableA
ID Code
==== ====
1 AAA
1 BBB
1 CCC
2 AAA
2 DDD
2 EEE
3 BBB
3 GGG
3 HHH
The only result I would like to return is ID 3, since IDs 1 and 2 both match the code 'AAA'.
I've tried:
SELECT distinct(ID) from tableA where code <> 'AAA'
but this returns IDs 1, 2, and 3. I'm not sure a GROUP BY would work either, because I don't want IDs 1 and 2 returned at all.
Try using NOT IN:
SELECT DISTINCT ID
FROM TableA
WHERE ID NOT IN(SELECT ID
FROM TableA
WHERE CODE='AAA')
IN determines whether a specified value matches any value in a subquery or a list. Be aware that if the subquery can return a NULL ID, NOT IN returns no rows at all.
Explanation:
The inner query selects all IDs that have CODE = 'AAA'. The outer query selects all IDs that are not in the result returned by the inner query.
I.e., with the given data, the inner query returns (1, 2), so the outer query selects the IDs that are not in (1, 2), which is of course 3.
This returns all rows for each id that has no 'AAA' code:
select *
from tab as t1
where not exists
(select * from tab as t2
where t1.id = t2.id
and code = 'AAA')
And this returns just the ids without 'AAA':
select id
from tab
group by id
having count(case when code = 'AAA' then 1 end) = 0
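The approaches above can be checked side by side with Python's sqlite3 and the question's sample data. DISTINCT is used in the NOT IN and NOT EXISTS versions so that each returns one row per ID rather than one per tableA row; the asker's original attempt is included for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (ID INTEGER, Code TEXT)")
conn.executemany("INSERT INTO tableA VALUES (?, ?)", [
    (1, "AAA"), (1, "BBB"), (1, "CCC"),
    (2, "AAA"), (2, "DDD"), (2, "EEE"),
    (3, "BBB"), (3, "GGG"), (3, "HHH"),
])

QUERIES = {
    # The asker's attempt: filters rows, not IDs, so every ID keeps some row.
    "naive": "SELECT DISTINCT ID FROM tableA WHERE Code <> 'AAA' ORDER BY ID",
    "not_in": """SELECT DISTINCT ID FROM tableA
                 WHERE ID NOT IN (SELECT ID FROM tableA WHERE Code = 'AAA')""",
    "not_exists": """SELECT DISTINCT t1.ID FROM tableA AS t1
                     WHERE NOT EXISTS (SELECT * FROM tableA AS t2
                                       WHERE t1.ID = t2.ID AND t2.Code = 'AAA')""",
    "group_by": """SELECT ID FROM tableA GROUP BY ID
                   HAVING COUNT(CASE WHEN Code = 'AAA' THEN 1 END) = 0""",
}

for name, sql in QUERIES.items():
    print(name, [row[0] for row in conn.execute(sql)])
# naive [1, 2, 3]
# not_in [3]
# not_exists [3]
# group_by [3]
```

The naive query fails because the WHERE clause is evaluated per row: IDs 1 and 2 still have rows whose code is not 'AAA'. The other three queries all decide per ID.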

Select data if conditions on two separate rows are met

Consider the following dataset:
id dataid data
1 3095 5
1 3096 9
1 3097 8
2 3095 4
2 3096 9
2 3097 15
In this dataset, the dataid column identifies what the data column holds, so if I see 3095, I know what the data column represents (name, address, etc.). I need to check, for each id group (i.e. 1 and 2), that the row with dataid=3095 has data=5 AND the row with dataid=3096 has data=9; if this is true, the id group is selected and operations are done on it.
Edit: Now I use the following SQL query to do the above:
SELECT *
FROM table s0
JOIN table s1 USING (dataid)
JOIN table s2 USING (dataid)
WHERE s1.dataid=359 AND s1.data=5
AND s2.dataid=360 AND s2.data=6;
But how can I turn the output from rows into columns? The property values I need are still key/value pairs in rows, and I would like them as columns.
So the desired output for the above would be:
id 3095 3096 3097
1 5 9 8
whereas the query above currently returns:
id dataid data dataid_1 data_1 dataid_2 data_2
1 3095 5 Unnecessary stuff because of JOIN
1 3096 9
1 3097 8
Thanks and sorry if this is confusing.
SELECT id
FROM (SELECT DISTINCT id
FROM table) as ids
WHERE EXISTS (SELECT '1'
FROM table t2
WHERE t2.id = ids.id AND dataid = 3095 AND data = 5)
AND EXISTS (SELECT '1'
FROM table t2
WHERE t2.id = ids.id AND dataid = 3096 AND data = 9)
Also, if another table has id as a unique key, consider selecting the ids from that table instead, which removes the need to use DISTINCT.
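The answer above filters the ids but does not cover the second part of the question, turning rows into columns. One common way to do that is conditional aggregation (MAX(CASE ...) with GROUP BY), a technique swapped in here rather than taken from the answer. The sketch keeps the answer's two EXISTS filters and runs on Python's sqlite3; the table name props is a hypothetical stand-in for the question's placeholder name table, and the column list (3095, 3096, 3097) has to be written out by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "props" is a hypothetical name for the question's placeholder table
conn.execute("CREATE TABLE props (id INTEGER, dataid INTEGER, data INTEGER)")
conn.executemany("INSERT INTO props VALUES (?, ?, ?)", [
    (1, 3095, 5), (1, 3096, 9), (1, 3097, 8),
    (2, 3095, 4), (2, 3096, 9), (2, 3097, 15),
])

PIVOT = """
    SELECT p.id AS id,
           MAX(CASE WHEN p.dataid = 3095 THEN p.data END) AS "3095",
           MAX(CASE WHEN p.dataid = 3096 THEN p.data END) AS "3096",
           MAX(CASE WHEN p.dataid = 3097 THEN p.data END) AS "3097"
    FROM props p
    WHERE EXISTS (SELECT 1 FROM props t2
                  WHERE t2.id = p.id AND t2.dataid = 3095 AND t2.data = 5)
      AND EXISTS (SELECT 1 FROM props t2
                  WHERE t2.id = p.id AND t2.dataid = 3096 AND t2.data = 9)
    GROUP BY p.id
    ORDER BY p.id
"""

cur = conn.execute(PIVOT)
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['id', '3095', '3096', '3097']
print(rows)     # [(1, 5, 9, 8)]
```

Each MAX(CASE ...) picks the single data value for one dataid within an id group and leaves NULL if that dataid is missing. Id 2 is excluded because its 3095 row has data 4.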