Merge with multiple matching conditions - sql

I have to write a T-SQL MERGE statement in which several conditions have to be met for a row to match.
Table column names:
ID,
emailaddress,
firstname,
surname,
title,
mobile,
dob,
accountnumber,
address,
postcode
The main problem is that the database I am working with has no mandatory fields and no primary keys to compare, and the source table can contain duplicate records as well. As a result, there are many combinations to check when comparing the source table's duplicates against the target table. My manager has come up with the following scenarios:
Two people could be using the same email address, so a match on emailaddress, firstname and surname is a 100% match (assuming all other columns are empty).
A match on mobile and accountnumber is a 100% match (assuming all other columns are empty).
A match on title, surname, postcode and dob is a 100% match (assuming all other columns are empty).
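A minimal sketch of how the three rules above could be checked in one statement, written in Python against an in-memory SQLite database so that it runs on its own. In T-SQL the same OR-ed rule conditions could form a MERGE's ON clause, or sit inside NOT EXISTS in a plain INSERT ... SELECT, as here. The table names source and target and the column subset are assumptions. Because SQL's = is never true when either side is NULL, a rule only fires when all of its columns are filled in on both rows. This does not remove duplicates within the source itself.

```python
import sqlite3

# Hypothetical column subset taken from the question's list (ID, address omitted).
COLUMNS = ("emailaddress", "firstname", "surname", "title",
           "mobile", "dob", "accountnumber", "postcode")

# The manager's three "100% match" rules. "=" is never true when either side
# is NULL, so empty columns can never produce a match.
RULES = [
    ("emailaddress", "firstname", "surname"),
    ("mobile", "accountnumber"),
    ("title", "surname", "postcode", "dob"),
]


def match_condition(rules):
    """Build '(t.a = s.a AND t.b = s.b) OR (...)' from the rule list."""
    return " OR ".join(
        "(" + " AND ".join(f"t.{c} = s.{c}" for c in rule) + ")"
        for rule in rules
    )


def make_db():
    """Create empty hypothetical 'source' and 'target' tables in memory."""
    conn = sqlite3.connect(":memory:")
    for table in ("source", "target"):
        conn.execute(f"CREATE TABLE {table} ({', '.join(COLUMNS)})")
    return conn


def insert_unmatched(conn):
    """Copy source rows into target when no target row matches any rule."""
    cols = ", ".join(COLUMNS)
    src_cols = ", ".join(f"s.{c}" for c in COLUMNS)
    sql = f"""
        INSERT INTO target ({cols})
        SELECT {src_cols}
        FROM source AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM target AS t
            WHERE {match_condition(RULES)}
        )
    """
    conn.execute(sql)
    conn.commit()
```

Because the rules live in a list, adding a fourth scenario later would only mean appending another tuple.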
I was given this task without access to the data, because I am a new recruit and my employer does not want me to see it for the moment. So I am working largely from my imagination.
The solution
Rather than checking the source records against the target database, I am thinking of cleansing the source data with stored procedure statements: if the data meets one duplicate condition, the procedure skips the remaining duplicate-removal statements and then inserts the data into the target table.
-- Rule 1: keep one row per emailaddress + sname, delete the rest
with cte_duplicate1 AS
(
select emailaddress, sname, ROW_NUMBER() over(partition by emailaddress, sname order by emailaddress) as dup1
from DuplicateRecordTable1
)
delete from cte_duplicate1
where dup1>1;

-- If the first DELETE removed any rows, skip cte_duplicate2
IF @@ROWCOUNT = 0
BEGIN
    with cte_duplicate2 AS
    (
    select emailaddress, fname, ROW_NUMBER() over(partition by emailaddress, fname order by emailaddress) as dup2
    from DuplicateRecordTable1
    )
    delete from cte_duplicate2
    where dup2>1;
END;
That is the rough plan at the moment; I do not know yet whether it is achievable.
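The plan above can be sketched as follows, again in Python with SQLite so it runs standalone. SQLite cannot DELETE through a CTE the way T-SQL can, so the sketch deletes by rowid instead, and cursor.rowcount plays the role that @@ROWCOUNT plays in T-SQL when deciding whether to skip the remaining rules. The rule list comes from the manager's scenarios; the table name source and the choice to leave alone rows with NULLs in a rule's columns are assumptions. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Duplicate rules in the order they are tried (from the manager's scenarios).
RULES = [
    ("emailaddress", "firstname", "surname"),
    ("mobile", "accountnumber"),
    ("title", "surname", "postcode", "dob"),
]


def delete_duplicates(conn, table, cols):
    """Delete all but the lowest-rowid row in each group sharing `cols`.

    Assumption: rows with a NULL in any of `cols` are left alone, since a
    rule only applies when its columns are filled in. Returns rows deleted.
    """
    partition = ", ".join(cols)
    filled = " AND ".join(f"{c} IS NOT NULL" for c in cols)
    sql = f"""
        DELETE FROM {table}
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY {partition}
                                          ORDER BY rowid) AS rn
                FROM {table}
                WHERE {filled}
            )
            WHERE rn > 1
        )
    """
    return conn.execute(sql).rowcount


def cleanse(conn, table="source"):
    """Try each rule in order; stop after the first one that deletes rows.

    This mirrors the plan in the question, where a rule that removes rows
    causes the remaining duplicate-removal statements to be skipped.
    """
    for cols in RULES:
        deleted = delete_duplicates(conn, table, cols)
        if deleted > 0:
            return cols, deleted
    return None, 0
```

Whether stopping after the first successful rule is the right policy is a business decision; removing the early return would apply every rule in turn.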

Related

query duplicates from multiple matching JSON fields in a row

JSON is new to me, and I am trying to understand how to search within JSON strings.
In this situation I have a players table with two critical columns. The first, Name, is a standard text column. The second, charinfo, is a JSON string that holds various player information; for this question I need two values from it, $.firstname and $.lastname. I need a query that lists any rows where all three values match, in order to detect and list players that have duplicate characters.
To list all players, I normally use:
SELECT
Name,
json_extract(charinfo, '$."firstname"') AS Firstname,
json_extract(charinfo, '$."lastname"') AS Lastname
FROM players
ORDER BY Firstname;
What I have currently is:
SELECT
Name,
json_extract(charinfo, '$."firstname"') AS Firstname,
json_extract(charinfo, '$."lastname"') AS Lastname,
COUNT(*) AS qty, license
FROM players
GROUP BY NAME, Firstname, Lastname HAVING COUNT(*)> 1
ORDER BY Firstname;
While this works, I would like it to display the duplicate rows themselves, not just count them, and I'm not sure how to make that adjustment.
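One way to list the duplicate rows themselves is a windowed COUNT(*) OVER (PARTITION BY ...), which attaches the group size to every row instead of collapsing the group. The sketch below runs it on SQLite via Python; the players layout and the license column come from the question, and the sample data is made up. In MySQL this form needs 8.0 or later, and JSON_EXTRACT there returns a JSON value, so JSON_UNQUOTE (or the ->> operator) may be wanted for display. On older versions, joining players back to the GROUP BY ... HAVING COUNT(*) > 1 query should achieve the same result.

```python
import json
import sqlite3

# Every row whose (Name, firstname, lastname) combination occurs more than once.
DUPLICATE_ROWS_SQL = """
    SELECT Name, Firstname, Lastname, license, qty
    FROM (
        SELECT Name,
               json_extract(charinfo, '$.firstname') AS Firstname,
               json_extract(charinfo, '$.lastname')  AS Lastname,
               license,
               COUNT(*) OVER (
                   PARTITION BY Name,
                                json_extract(charinfo, '$.firstname'),
                                json_extract(charinfo, '$.lastname')
               ) AS qty
        FROM players
    ) AS d
    WHERE qty > 1
    ORDER BY Firstname, Lastname, Name
"""


def find_duplicate_rows(rows):
    """Load (name, charinfo dict, license) tuples and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE players (Name TEXT, charinfo TEXT, license TEXT)")
    conn.executemany(
        "INSERT INTO players VALUES (?, ?, ?)",
        [(name, json.dumps(info), lic) for name, info, lic in rows],
    )
    return conn.execute(DUPLICATE_ROWS_SQL).fetchall()
```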

Appending Data to SQL Server from Access Query Results in Error

I am appending data from an Access query into an existing table in SQL Server 2019, and sometimes NULL values cause a "Record is deleted" message box (with no error number).
For instance, I have three columns (Text1, Text2, Text3), all nvarchar(255). Text1 accepts NULL values, but sometimes Text2 doesn't, even though they are literally the same kind of field holding the same data. There is absolutely nothing different about the columns in SQL Server or the fields in the query, so this shouldn't be happening.
The other thing is that I made a make-table query from the query, and appending from that new table instead of the query caused no problems at all. Why is this, and how do I get the query to append data consistently?
I have tried append queries as well as plain SQL via DoCmd.RunSQL.
The SQL Server table is linked via a custom ODBC string in the Linked Table Manager.
Appending from the query gives errors:
INSERT INTO tmakContact ( DataAsOf, ContactId, FullName, LoanNum, LoanId, Name, JobTitle, Email, Relationship, Company, Address, CityStateZip, [Number], PhoneNumType )
SELECT DataAsOf, ContactId, FullName, LoanNum, LoanId, Name, JobTitle, Email, Relationship, Company, Address, CityStateZip, [Number], PhoneNumType
FROM qryContact;
When I remove the Relationship and PhoneNumType fields, the INSERT from this query works fine. These two fields come from outer-joined tables, which live on another SQL Server and database that I link to from Access via a custom ODBC string in the Linked Table Manager.
Appending from a table that I created with a make-table query based on qryContact gives no errors:
INSERT INTO tmakContact ( DataAsOf, ContactId, FullName, LoanNum, LoanId, Name, JobTitle, Email, Relationship, Company, Address, CityStateZip, [Number], PhoneNumType )
SELECT DataAsOf, ContactId, FullName, LoanNum, LoanId, Name, JobTitle, Email, Relationship, Company, Address, CityStateZip, [Number], PhoneNumType
FROM tmptblContact;
Originally I just ran DoCmd.OpenQuery "apdContact", a saved append query containing the same SQL as above, and it doesn't work either.
SQL for qryContact:
SELECT Now() AS DataAsOf, dbo_Contact.Id AS ContactId, dbo_UserInfo.FullName, dbo_Loaninfo.LoanNum, dbo_ContactLoanLink.LoanId, dbo_Contact.Name, dbo_Contact.JobTitle, dbo_Email.Addr AS Email, Trim([dbo_ContactRelationship]![Descr]) AS Relationship, dbo_Contact.Company, [Add1] & " " & [Add2] AS Address, StrConv([City],3) & ", " & [StateCode] & " " & [Zip] AS CityStateZip, qryPhone.Number, qryPhone.PhoneNumType
FROM ((((dbo_Loaninfo INNER JOIN ((dbo_ContactLoanLink INNER JOIN dbo_Contact ON dbo_ContactLoanLink.ContactId = dbo_Contact.Id) LEFT JOIN dbo_Address ON dbo_Contact.BusAddrAId = dbo_Address.Id) ON dbo_Loaninfo.Id = dbo_ContactLoanLink.LoanId) LEFT JOIN dbo_ContactRelationship ON dbo_ContactLoanLink.ContactRelationshipId = dbo_ContactRelationship.Id) LEFT JOIN dbo_Email ON dbo_Contact.Id = dbo_Email.ContactId) LEFT JOIN qryPhone ON dbo_Contact.Id = qryPhone.ContactId) LEFT JOIN dbo_UserInfo ON dbo_Loaninfo.AssignedUserId = dbo_UserInfo.Id
WHERE (((dbo_Contact.InactiveFlag)="N") AND ((dbo_Loaninfo.LoanStatusId)<>1105) AND ((dbo_Loaninfo.InactiveFlag)="N") AND ((dbo_Loaninfo.PaidOffFlag)="N"));
Hmm, does the target table have any true/false columns (bit columns in SQL Server), even ones not used in your query?
Double- and triple-check whether the target table in SQL Server (no doubt a linked table in Access) has any bit columns, and if it does, MAKE SURE each such column has a default value of 0 (false).
Next up:
You don't mention whether the target table has an autonumber (identity) PK column. I suspect it must have one, but do check, and make sure the table has a PK.
Next up:
Are there any real or float (single/double) columns in the target table? Again, EVEN IF THEY ARE NOT part of your query, make sure such columns have a default value (0) in SQL Server.
Last up:
Add a row version column to the SQL Server target table. In SQL Server that "row version" column type is named timestamp. (This is the world's WORST name, since a timestamp column has ZERO to do with time or date; it is an ACTUAL row-versioning system, and Access supports this feature.)
It also means that Access will not do a column-by-column comparison of the record when doing updates or inserts. So try adding a timestamp (aka rowversion) column to the target table, and re-link the table from Access.

Retrieving duplicate and original rows from a table using sql query

Say I have a student table with the following fields: student id, student name, age, gender, marks, class. Assume that, due to some error, there are multiple entries for each student. My requirement is to identify the duplicate rows in the table, where the filter criteria are the student name and the class. But in the query result, in addition to identifying the duplicate records, I also need to find the original student record that was duplicated. Is there any way to do this? I went through this answer: SQL: How to find duplicates based on two fields?, but it only specifies how to find the duplicate rows, not how to identify the actual row that was duplicated. Kindly throw some light on a possible solution. Thanks.
First of all: if the columns you've listed are all in the same table, it looks like your database structure could use some normalization.
In terms of your question: I'm assuming your StudentID field is a database-generated primary key and so has not been duplicated. (If this is not the case, I think you have bigger problems than just duplicates.)
I'm also assuming the duplicate row has a higher value for StudentID than the original row.
I think the following should work. (Note: I haven't created a table to verify it, so it might not be perfect straight away; if it isn't, it should be fairly close.)
select dup.StudentID as DuplicateStudentID,
dup.StudentName, dup.Age, dup.Gender, dup.Marks, dup.Class,
orig.StudentID as OriginalStudentId
from StudentTable dup
inner join (
-- Find first student record for each unique combination
select Min(StudentId) as StudentID, StudentName, Age, Gender, Marks, Class
from StudentTable t
group by StudentName, Age, Gender, Marks, Class
) orig on dup.StudentName = orig.StudentName
and dup.Age = orig.Age
and dup.Gender = orig.Gender
and dup.Marks = orig.Marks
and dup.Class = orig.Class
and dup.StudentID > orig.StudentID -- Don't identify the original record as a duplicate
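The query above groups on every column, while the question's stated criterion is student name and class only. A sketch of that narrower variant, run on SQLite via Python, keeps the same idea (the lowest StudentID in each group is treated as the original) but groups only on StudentName and Class, so rows that differ in, say, Marks are still reported. The table layout follows the question; the sample data is invented.

```python
import sqlite3

# Pair each duplicate with the original (lowest StudentID) row that shares
# its StudentName and Class.
DUPLICATES_SQL = """
    SELECT dup.StudentID AS DuplicateStudentID,
           dup.StudentName, dup.Class,
           orig.StudentID AS OriginalStudentID
    FROM StudentTable AS dup
    INNER JOIN (
        SELECT MIN(StudentID) AS StudentID, StudentName, Class
        FROM StudentTable
        GROUP BY StudentName, Class
    ) AS orig
        ON dup.StudentName = orig.StudentName
       AND dup.Class = orig.Class
       AND dup.StudentID > orig.StudentID   -- exclude the original itself
    ORDER BY orig.StudentID, dup.StudentID
"""


def find_duplicates(rows):
    """Load (id, name, age, gender, marks, class) rows and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE StudentTable (
        StudentID INTEGER PRIMARY KEY, StudentName TEXT, Age INTEGER,
        Gender TEXT, Marks INTEGER, Class TEXT)""")
    conn.executemany("INSERT INTO StudentTable VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn.execute(DUPLICATES_SQL).fetchall()
```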

SQL: check one table and enter values into another

I have a very simple MS Access User Table (USER_TABLE) consisting of 3 fields: Customer_Number, User_Name, and Email_Address. I have another table (NEW_USERS) that consist of new requests for Users. It has a User_Status field that is blank by default, and also has the Customer_Number, User_Name, and Email_Address fields.
Half of the new requests that come through are for users who already exist, so I want to set up a query that checks USER_TABLE to determine whether a new request already exists, by checking the Email_Address field against the Customer_Number field. Complicating this is the fact that 1) Customer_Number is not unique (many users exist for a single customer number), and 2) users can have multiple accounts under different customer numbers. This results in four scenarios in the NEW_USERS table when checking against USER_TABLE:
Email_Address does not exist for Customer Number in USER_TABLE (New)
Email_Address exists for Customer Number in USER_TABLE (Existing)
Email_Address does not exist for Customer Number in USER_TABLE, but exists for other Customer Numbers (New-Multi)
Email_Address does exist for Customer Number in USER_TABLE, and also exists for other Customer Numbers (Existing-Multi)
What I would like to do is run these checks and enter the corresponding result (New, Existing, New-Multi or Existing-Multi) into the User_Status field.
This seems like it should be possible. Is it possible to run four separate queries to make the updates to NEW_USERS.User_Status?
When you're working in Access, you really need a field that uniquely identifies each record, or at the very least some combination of fields, like customerid and email.
Beyond that, since you have several criteria to satisfy, the easiest way is probably to write a single select statement that compares the results of multiple select statements. Look into outer joins for picking the rows from one table that are not found in another. Something like:
insert into user_table select customerid, email_address from
(select customerid, email_address from new_users inner join user_table on ...) as expr1,
(select customerid, email_address from new_users outer join user_table on ...) as expr2
where expr1.customerid = expr2.customerid and expr1.new_users = expr2.new_users
I recommend trying the free Stanford course on SQL; there's a handy lesson on nesting select statements, which is a good way to get results that satisfy many criteria. http://class2go.stanford.edu/
As an aside, the course often uses the syntax of 'specifying joins in the where clause', which is increasingly frowned upon but much easier to understand.
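Rather than four separate queries, the four statuses can be derived from two EXISTS tests: one for the same email under the same customer number, one for the same email under a different customer number. The sketch below is a single UPDATE run on SQLite via Python, using the table and field names from the question with invented sample rows. Access SQL has no CASE expression, so there the same logic would need Switch() or nested IIf(); treat this as an outline of the logic rather than Access-ready SQL.

```python
import sqlite3

# Same email already registered under the same customer number?
SAME_CUSTOMER = """EXISTS (
    SELECT 1 FROM USER_TABLE AS u
    WHERE u.Email_Address = NEW_USERS.Email_Address
      AND u.Customer_Number = NEW_USERS.Customer_Number)"""

# Same email registered under some other customer number?
OTHER_CUSTOMER = """EXISTS (
    SELECT 1 FROM USER_TABLE AS u
    WHERE u.Email_Address = NEW_USERS.Email_Address
      AND u.Customer_Number <> NEW_USERS.Customer_Number)"""

# The four scenarios from the question, most specific first.
STATUS_SQL = f"""
    UPDATE NEW_USERS
    SET User_Status = CASE
        WHEN {SAME_CUSTOMER} AND {OTHER_CUSTOMER} THEN 'Existing-Multi'
        WHEN {SAME_CUSTOMER} THEN 'Existing'
        WHEN {OTHER_CUSTOMER} THEN 'New-Multi'
        ELSE 'New'
    END
"""


def classify(users, requests):
    """Load sample rows, run the UPDATE, return (email, customer, status)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE USER_TABLE "
                 "(Customer_Number, User_Name, Email_Address)")
    conn.execute("CREATE TABLE NEW_USERS "
                 "(Customer_Number, User_Name, Email_Address, User_Status)")
    conn.executemany("INSERT INTO USER_TABLE VALUES (?, ?, ?)", users)
    conn.executemany("INSERT INTO NEW_USERS VALUES (?, ?, ?, NULL)", requests)
    conn.execute(STATUS_SQL)
    return conn.execute("SELECT Email_Address, Customer_Number, User_Status "
                        "FROM NEW_USERS ORDER BY rowid").fetchall()
```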

how to use distinct in ms access

I have two tables. Task and Categories.
TaskID is not a primary key, because there are duplicate values: when multiple contacts are selected for a specific task, the TaskID and other details are duplicated. I wrote this query:
SELECT Priority, Subject, Status, DueDate, Completed, Category
FROM Task, Categories
WHERE Categories.CategoryID=Task.CategoryID;
Because multiple contacts are selected for that task, there are two records for taskid=T4 (highlighted in gray). I have tried using DISTINCT in MS Access 2003, but it's not working. I want to display distinct records (there's no requirement to show TaskID here). If I write:
select priority, distinct(subject), .......
with the rest of the query unchanged, it gives me an error. I have tried DISTINCTROW as well, without success. How do I get distinct values in MS Access?
Okay, it works this way:
SELECT DISTINCT Task.Priority, Task.Subject, Task.Status, Task.DueDate,
Task.Completed, Categories.Category
FROM Task, Categories
WHERE (((Categories.CategoryID)=[Task].[CategoryID]));
I don't like using SELECT DISTINCT; I have found that it makes my code take longer to compile. The other way I do it is by using GROUP BY, listing every selected column:
SELECT Priority, Subject, Status, DueDate, Completed, Category
FROM Task, Categories
WHERE Categories.CategoryID=Task.CategoryID
GROUP BY Priority, Subject, Status, DueDate, Completed, Category;
I don't have VBA open at the moment, but this should work as well.
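For GROUP BY to stand in for DISTINCT, every non-aggregated column in the SELECT list has to appear in the GROUP BY clause. The sketch below, on SQLite via Python, runs both forms over a hypothetical Task/Categories layout in which task T4 appears once per assigned contact, and checks that they return the same rows. The column types and the ContactID column are assumptions, since the question does not show the Task table's definition.

```python
import sqlite3

DISTINCT_SQL = """
    SELECT DISTINCT Task.Priority, Task.Subject, Task.Status, Task.DueDate,
                    Task.Completed, Categories.Category
    FROM Task INNER JOIN Categories ON Categories.CategoryID = Task.CategoryID
"""

# Equivalent to DISTINCT because every selected column is grouped on.
GROUP_BY_SQL = """
    SELECT Task.Priority, Task.Subject, Task.Status, Task.DueDate,
           Task.Completed, Categories.Category
    FROM Task INNER JOIN Categories ON Categories.CategoryID = Task.CategoryID
    GROUP BY Task.Priority, Task.Subject, Task.Status, Task.DueDate,
             Task.Completed, Categories.Category
"""


def run_both(tasks, categories):
    """Load hypothetical rows and return (DISTINCT rows, GROUP BY rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Categories (CategoryID INTEGER, Category TEXT)")
    conn.execute("""CREATE TABLE Task (TaskID TEXT, Priority TEXT, Subject TEXT,
        Status TEXT, DueDate TEXT, Completed INTEGER, CategoryID INTEGER,
        ContactID TEXT)""")
    conn.executemany("INSERT INTO Categories VALUES (?, ?)", categories)
    conn.executemany("INSERT INTO Task VALUES (?, ?, ?, ?, ?, ?, ?, ?)", tasks)
    return (conn.execute(DISTINCT_SQL).fetchall(),
            conn.execute(GROUP_BY_SQL).fetchall())
```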
Using SELECT DISTINCT will work for you, but a better solution here would be to change your database design.
Duplicate records may lead to inconsistent data. For example, imagine two records with the same TaskID but different statuses. Which one would be right?
A better design would include something like a Task table, a Contact table and an Assignment table, as follows (the fields in brackets are the PK):
Tasks: [TaskID], TaskPriority, Subject, Status, DueDate, Completed, StartDate, Owner, CategoryID, ContactID, ...
Contact: [ID], Name, Surname, Address, PhoneNumber, ...
Assignment: [TaskID, ContactID]
Then you can retrieve the tasks with a simple SELECT from the Tasks table.
And whenever you need to know the contacts assigned to a task, you would use a JOIN, like this (Access requires the parentheses around the first join):
SELECT T.*, C.*
FROM (Tasks AS T
INNER JOIN Assignment AS A
ON T.TaskID = A.TaskID)
INNER JOIN Contact AS C
ON A.ContactID = C.ID;
Or similar. You can filter, sort or group the results using all of SQL's query power.
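A sketch of the suggested Tasks/Contact/Assignment design, on SQLite via Python, with each table trimmed to a few fields for brevity. One assumption differs from the listing above: Tasks carries no ContactID here, since the Assignment table holds that link. SQLite does not need the parentheses that Access requires around chained joins, so the join is written without them.

```python
import sqlite3

# Trimmed, hypothetical version of the suggested schema.
SCHEMA = """
    CREATE TABLE Tasks (TaskID TEXT PRIMARY KEY, TaskPriority TEXT,
                        Subject TEXT, Status TEXT);
    CREATE TABLE Contact (ID TEXT PRIMARY KEY, Name TEXT, Surname TEXT);
    CREATE TABLE Assignment (
        TaskID TEXT REFERENCES Tasks(TaskID),
        ContactID TEXT REFERENCES Contact(ID),
        PRIMARY KEY (TaskID, ContactID)
    );
"""

# One row per (task, assigned contact); each task itself is stored once.
CONTACTS_PER_TASK_SQL = """
    SELECT T.TaskID, T.Subject, C.Name, C.Surname
    FROM Tasks AS T
    INNER JOIN Assignment AS A ON T.TaskID = A.TaskID
    INNER JOIN Contact AS C ON A.ContactID = C.ID
    ORDER BY T.TaskID, C.ID
"""


def build(tasks, contacts, assignments):
    """Create the schema in memory and load the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Tasks VALUES (?, ?, ?, ?)", tasks)
    conn.executemany("INSERT INTO Contact VALUES (?, ?, ?)", contacts)
    conn.executemany("INSERT INTO Assignment VALUES (?, ?)", assignments)
    return conn
```

Listing tasks is then a plain SELECT from Tasks, with no DISTINCT needed.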