dynamically select column name that changed - sql

I have a table as shown below.
ID  NAME  ADDRESS    CITY    ROLE       Date_Modified
1   Tom   something  austin  manager    X
2   Tom   nothing    austin  principal  Y
3   Tom   anything   dallas  VP         Z
How do I write a query that selects the names of the columns that changed between entries 1, 2 and 3? I am building a report that needs to identify changes. This is what I have so far, and I need to work with it.
I need to detect the changes in a stored procedure and produce the output below:
Id  ColumnName  DateChanged
2   Address     Y
2   Role        Y
3   Address     Z
3   Role        Z

If I understood your question correctly, what you need is to detect changes from one row to the next and then unpivot the data. LAG requires SQL Server 2012 or later.
;with cte as (
-- LAG for id is used to skip first row from selection
select id, LAG(id, 1) OVER (ORDER BY id) AS OldId,
address, LAG(address, 1) OVER (ORDER BY id) AS OldAddress,
role, LAG(role, 1) OVER (ORDER BY id) AS OldRole,
Date_Modified
from audit_data
)
SELECT id, ColName, data_col, Date_Modified
FROM
(
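select id,
-- keep a value only if it differs from the previous row; UNPIVOT
-- discards NULLs, so unchanged columns are not reported
CASE WHEN OldAddress IS NULL OR address <> OldAddress THEN address END AS address,
CASE WHEN OldRole IS NULL OR role <> OldRole THEN role END AS role,
Date_Modified
from cte
-- detect any change in monitored data
where ((OldAddress IS NULL OR address <> OldAddress)
OR (OldRole IS NULL OR role <> OldRole))
AND OldId IS NOT NULL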
) AS cp
-- unpivot address and role into data_col column
UNPIVOT
(
data_col FOR ColName IN (address, role)
) AS up;
Data used for setup:
-- drop table audit_data
create table audit_data (
id int,
name VARCHAR(100),
address VARCHAR(100),
city varchar(100),
role VARCHAR(100),
Date_Modified DATETIME2
)
insert into audit_data values (1, 'Tom', 'something', 'austin', 'manager', '20150103'),
(2, 'Tom', 'nothing', 'austin', 'principal', '20150205'),
(3, 'Tom', 'anything', 'dallas', 'VP', '20150314')
go
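To try the LAG idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module (window functions such as LAG need SQLite 3.25 or later). SQLite has no UNPIVOT, so a UNION ALL with one branch per monitored column stands in for it, and each branch keeps a row only when the value differs from the previous row of the same name. The table and columns follow the setup above; monitoring only address and role (to match the expected output) and partitioning by name are assumptions, and 'X', 'Y', 'Z' are kept as placeholder dates.

```python
import sqlite3

# In-memory copy of the question's rows; 'X', 'Y', 'Z' are placeholder dates.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_data (id INTEGER, name TEXT, address TEXT,"
    " city TEXT, role TEXT, Date_Modified TEXT)"
)
conn.executemany(
    "INSERT INTO audit_data VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Tom", "something", "austin", "manager", "X"),
        (2, "Tom", "nothing", "austin", "principal", "Y"),
        (3, "Tom", "anything", "dallas", "VP", "Z"),
    ],
)

QUERY = """
WITH cte AS (
    SELECT id, Date_Modified, address, role,
           LAG(id)      OVER (PARTITION BY name ORDER BY id) AS old_id,
           LAG(address) OVER (PARTITION BY name ORDER BY id) AS old_address,
           LAG(role)    OVER (PARTITION BY name ORDER BY id) AS old_role
    FROM audit_data
)
-- one UNION ALL branch per monitored column replaces UNPIVOT;
-- IS NOT is SQLite's NULL-safe "differs from"
SELECT id, 'Address' AS ColumnName, Date_Modified AS DateChanged
FROM cte WHERE old_id IS NOT NULL AND address IS NOT old_address
UNION ALL
SELECT id, 'Role', Date_Modified
FROM cte WHERE old_id IS NOT NULL AND role IS NOT old_role
ORDER BY id, ColumnName
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # (id, ColumnName, DateChanged)
```

Reporting City as well would be one more UNION ALL branch of the same shape.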
[Edit] SQL Server 2008 R2 version (without LAG):
;with ad_cte as (
select id, address, role, Date_Modified, ROW_NUMBER() OVER (ORDER BY id) RowNo
from audit_data
),
cte as (
select ad.id,
ad.address, ad_old.address AS OldAddress,
ad.role, ad_old.role AS OldRole,
ad.Date_Modified
from ad_cte ad
join ad_cte ad_old on ad_old.RowNo + 1 = ad.RowNo
)
SELECT id, ColName, data_col, Date_Modified
FROM
(
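select id,
-- keep a value only if it differs from the previous row; UNPIVOT
-- discards NULLs, so unchanged columns are not reported
CASE WHEN OldAddress IS NULL OR address <> OldAddress THEN address END AS address,
CASE WHEN OldRole IS NULL OR role <> OldRole THEN role END AS role,
Date_Modified
from cte
-- detect any change in monitored data
-- (the self-join above already skips the first row, so no id filter is needed)
where (OldAddress IS NULL OR address <> OldAddress)
OR (OldRole IS NULL OR role <> OldRole)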
) AS cp
-- unpivot address and role into data_col column
UNPIVOT
(
data_col FOR ColName IN (address, role)
) AS up;

This is very similar to Alexei's answer:
CREATE TABLE #temp( ID INT IDENTITY(1, 1),
NAME VARCHAR(30),
ADDRESS VARCHAR(30),
CITY VARCHAR(30),
ROLE VARCHAR(30),
Date_Modified DATETIME );
INSERT INTO #temp
SELECT 'Tom',
'something',
'austin',
'manager',
DATEADD(day, -3, GETDATE())
UNION
SELECT 'Tom',
'nothing',
'austin',
'principal',
DATEADD(day, -2, GETDATE())
UNION
SELECT 'Tom',
'anything',
'dallas',
'VP',
DATEADD(day, -1, GETDATE())
UNION
SELECT 'Jon',
'something',
'san antonio',
'assistant manager',
DATEADD(day, -3, GETDATE())
UNION
SELECT 'Jon',
'something',
'austin',
'assistant manager',
DATEADD(day, -2, GETDATE())
UNION
SELECT 'Jon',
'anything',
'dallas',
'manager',
DATEADD(day, -1, GETDATE());
SELECT id,
ColName,
Date_Modified
FROM(
SELECT DISTINCT B.ID,
B.Name,
CASE
WHEN A.ADDRESS <> B.ADDRESS
THEN B.ADDRESS
END AS ADDRESS,
CASE
WHEN A.CITY <> B.CITY
THEN B.CITY
END AS CITY,
CASE
WHEN A.ROLE <> B.ROLE
THEN B.ROLE
END AS ROLE,
B.Date_Modified
FROM(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY NAME ORDER BY Date_Modified ASC) AS ROWNUM -- ascending: B is the newer row of each pair
FROM #temp ) AS A
INNER JOIN(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY NAME ORDER BY Date_Modified ASC) AS ROWNUM
FROM #temp ) AS B ON A.NAME = B.NAME
AND CHECKSUM(A.NAME, A.ADDRESS, A.CITY, A.ROLE) <> CHECKSUM(B.NAME, B.ADDRESS, B.CITY, B.ROLE)
AND A.ROWNUM = B.ROWNUM - 1 ) AS cp
UNPIVOT( data FOR ColName IN( address,
role )) AS up;
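The self-join variant can be sketched the same way with Python's sqlite3 module (SQLite 3.25+). This is a minimal sketch assuming one history per NAME and fixed dates in place of GETDATE(); the table name emp_history is hypothetical. ROW_NUMBER numbers each person's rows in ascending date order, every row is paired with its predecessor, unchanged columns become NULL, and a UNION ALL filtered on IS NOT NULL emulates UNPIVOT, which drops NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp_history (ID INTEGER, NAME TEXT, ADDRESS TEXT,"
    " CITY TEXT, ROLE TEXT, Date_Modified TEXT)"
)
conn.executemany(
    "INSERT INTO emp_history VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Tom", "something", "austin", "manager", "2015-01-01"),
        (2, "Tom", "nothing", "austin", "principal", "2015-01-02"),
        (3, "Tom", "anything", "dallas", "VP", "2015-01-03"),
        (4, "Jon", "something", "san antonio", "assistant manager", "2015-01-01"),
        (5, "Jon", "something", "austin", "assistant manager", "2015-01-02"),
        (6, "Jon", "anything", "dallas", "manager", "2015-01-03"),
    ],
)

QUERY = """
WITH numbered AS (
    -- ascending order: each person's oldest row gets rownum = 1
    SELECT *, ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY Date_Modified) AS rownum
    FROM emp_history
),
changes AS (
    -- b is a row, a is the same person's previous row; a column keeps
    -- b's value only when it differs from a, otherwise it becomes NULL
    SELECT b.ID,
           CASE WHEN a.ADDRESS <> b.ADDRESS THEN b.ADDRESS END AS ADDRESS,
           CASE WHEN a.CITY <> b.CITY THEN b.CITY END AS CITY,
           CASE WHEN a.ROLE <> b.ROLE THEN b.ROLE END AS ROLE
    FROM numbered a
    JOIN numbered b ON a.NAME = b.NAME AND a.rownum = b.rownum - 1
)
-- UNION ALL + IS NOT NULL mimics UNPIVOT, which drops NULL values
SELECT ID, 'Address' AS ColName, ADDRESS AS NewValue FROM changes WHERE ADDRESS IS NOT NULL
UNION ALL
SELECT ID, 'City', CITY FROM changes WHERE CITY IS NOT NULL
UNION ALL
SELECT ID, 'Role', ROLE FROM changes WHERE ROLE IS NOT NULL
ORDER BY ID, ColName
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # (ID, ColName, NewValue)
```

Because the CASE expressions already null out unchanged columns, no separate CHECKSUM filter is needed in this sketch.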

SQL Server SELECT first occurrence OR if no occurrence SELECT other criteria

I am having trouble forming the proper SQL query for this job. I have two tables: one is called CUSTOMER and the other CUSTOMER_CONTACT. To simplify things, I will only include the relevant column names.
CUSTOMER columns: ID, CUSTOMERNAME
CUSTOMER_CONTACT columns: ID, CUSTOMER_ID, CONTACT_VC, EMAIL
CUSTOMER_ID is the foreign key to link to the CUSTOMER table from CUSTOMER_CONTACT. CONTACT_VC is just the entry number for their contact information. There could be multiple CUSTOMER_CONTACT records for each customer, but they will have a unique CONTACT_VC.
EMAIL can also be NULL or blank on some or all of the records.
I need to select the first CUSTOMER_CONTACT entry where EMAIL is not NULL/blank, but if none of the CUSTOMER_CONTACT entries has an email address, then select the CUSTOMER_CONTACT entry WHERE CONTACT_VC = 1.
Any suggestions on how to accomplish this?
The following approach uses ROW_NUMBER to retrieve a number based on your ordering logic within each CUSTOMER_ID group, then filters by the first record retrieved.
You may try the following:
SELECT
*
FROM (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY CUSTOMER_ID
ORDER BY (CASE WHEN EMAIL IS NOT NULL AND EMAIL <> '' THEN 0 ELSE 1 END),CONTACT_VC
) as rn
FROM
CUSTOMER_CONTACT
) t
WHERE rn=1
If you would like to join this to the customer table, you can use the above query as a subquery, e.g.:
SELECT
c.*,
contact.*
FROM
CUSTOMER c
INNER JOIN (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY CUSTOMER_ID
ORDER BY (CASE WHEN EMAIL IS NOT NULL AND EMAIL <> '' THEN 0 ELSE 1 END),CONTACT_VC
) as rn
FROM
CUSTOMER_CONTACT
) contact ON c.ID = contact.CUSTOMER_ID and contact.rn=1
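Here is a minimal, runnable sketch of this ROW_NUMBER approach using Python's sqlite3 module (SQLite 3.25+), with the test data from the next answer plus a hypothetical customer Dan whose first contact has a blank e-mail. Contacts with a non-blank e-mail sort first within each CUSTOMER_ID, so Bob gets his work address, and Cathy, who has no e-mail at all, falls back to CONTACT_VC = 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (ID INTEGER, CUSTOMERNAME TEXT);
CREATE TABLE CUSTOMER_CONTACT (ID INTEGER, CUSTOMER_ID INTEGER,
                               CONTACT_VC INTEGER, EMAIL TEXT);
INSERT INTO CUSTOMER VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Cathy'), (4, 'Dan');
INSERT INTO CUSTOMER_CONTACT VALUES
  (1, 1, 1, 'alice@email.com'), (2, 1, 2, 'alice@gmail.com'),
  (3, 2, 1, NULL),              (4, 2, 2, 'bob@work.com'),
  (5, 3, 1, NULL), (6, 3, 2, NULL), (7, 3, 3, NULL),
  (8, 4, 1, ''),                (9, 4, 2, 'dan@home.com');  -- Dan is hypothetical
""")

rows = conn.execute("""
SELECT c.CUSTOMERNAME, cc.CONTACT_VC, cc.EMAIL
FROM CUSTOMER c
JOIN (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY CUSTOMER_ID
               -- usable e-mail first, then the lowest entry number
               ORDER BY CASE WHEN EMAIL IS NOT NULL AND EMAIL <> '' THEN 0 ELSE 1 END,
                        CONTACT_VC
           ) AS rn
    FROM CUSTOMER_CONTACT
) cc ON c.ID = cc.CUSTOMER_ID AND cc.rn = 1
ORDER BY c.ID
""").fetchall()

for row in rows:
    print(row)  # (CUSTOMERNAME, CONTACT_VC, EMAIL)
```

The fallback to CONTACT_VC = 1 relies on every customer having an entry numbered 1; if that is not guaranteed, the sketch returns the lowest-numbered entry instead.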
Here is almost the same answer as ggordon's, but I used a common table expression, and I think the ordering in the subquery should go by CONTACT_VC first, then by non-NULL email addresses. I created some very simple test data to run this:
DECLARE @CUSTOMER AS TABLE
(
[ID] INT NOT NULL,
[CUSTOMERNAME] VARCHAR(10) NOT NULL
);
INSERT INTO @CUSTOMER
(
[ID],
[CUSTOMERNAME]
)
VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Cathy');
DECLARE @CUSTOMER_CONTACT AS TABLE
(
[ID] INT NOT NULL,
[CUSTOMER_ID] INT NOT NULL,
[CONTACT_VC] INT NOT NULL,
[EMAIL] VARCHAR(40) NULL
);
INSERT INTO @CUSTOMER_CONTACT
(
[ID],
[CUSTOMER_ID],
[CONTACT_VC],
[EMAIL]
)
VALUES
(1, 1, 1, 'alice@email.com'),
(2, 1, 2, 'alice@gmail.com'),
(3, 2, 1, NULL),
(4, 2, 2, 'bob@work.com'),
(5, 3, 1, NULL),
(6, 3, 2, NULL),
(7, 3, 3, NULL);
;WITH [cc]
AS (SELECT [ID],
[CUSTOMER_ID],
[CONTACT_VC],
[EMAIL],
ROW_NUMBER() OVER (PARTITION BY [CUSTOMER_ID]
ORDER BY [CONTACT_VC],
(CASE WHEN [EMAIL] IS NOT NULL THEN
0
ELSE
1
END
)
) AS [rn]
FROM @CUSTOMER_CONTACT)
SELECT [c].[ID], [c].[CUSTOMERNAME], [cc].[ID], [cc].[CUSTOMER_ID], [cc].[CONTACT_VC], [cc].[EMAIL]
FROM @CUSTOMER AS [c]
INNER JOIN [cc]
ON [c].[ID] = [cc].[CUSTOMER_ID]
AND [cc].[rn] = 1;
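-- customers with at least one email: their lowest-numbered entry that has one
select cc.* from CUSTOMER_CONTACT cc
where cc.CONTACT_VC = (select min(x.CONTACT_VC) from CUSTOMER_CONTACT x
where x.CUSTOMER_ID = cc.CUSTOMER_ID and x.EMAIL IS NOT NULL)
union all
-- customers without any email: entry number 1
select cc.* from CUSTOMER_CONTACT cc
where cc.CONTACT_VC = 1
and NOT EXISTS (select 1 from CUSTOMER_CONTACT x
where x.CUSTOMER_ID = cc.CUSTOMER_ID and x.EMAIL IS NOT NULL)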

Insert multiple row different column value

This is my existing table
In this table, each user has their own data for each Status. Every user is guaranteed to have Status 1.
Now, three statuses need to be stored for every user.
I am trying to give every user all three statuses by inserting new rows that copy the user's Status 1 data, such that:
User Ali currently only has Status 1 and its data, so I need to insert a new row for Ali with Status 2 that copies the data from Status 1, and another new row for Ali with Status 3 that also copies the data from Status 1.
User John currently only has Statuses 1 and 2, so I need to insert a new row for John with Status 3 that copies the data from Status 1.
The same pattern continues for the other users.
Expected result:
I would use CROSS JOIN and NOT EXISTS:
with data as
(
select name,
column1,
column2
from your_table
where status = 1
), cross_join_data as
(
select d1.name, t.status, d1.column1, d1.column2
from data d1
cross join
(
select 1 status
union
select 2 status
union
select 3 status
) t
where not exists (
select 1
from your_table d2
where d2.name = d1.name and
d2.status = t.status
)
)
select *
from your_table
union all
select *
from cross_join_data
dbfiddle demo
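A minimal sketch of the same CROSS JOIN + NOT EXISTS idea written as an actual INSERT, using Python's sqlite3 module and the sample rows from a later answer; your_table and its columns are the placeholder names used above. Only rows with status = 1 feed the cross join, so every missing status is filled with the user's Status 1 data, and existing rows are left untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (name TEXT, status INTEGER, column1 TEXT, column2 TEXT);
INSERT INTO your_table VALUES
  ('Ali', 1, '100', '90'),
  ('John', 1, '20', '200'), ('John', 2, '80', '90'),
  ('Ming', 1, '54', '345'),
  ('Mei', 1, '421', '123'), ('Mei', 3, '24', '344');
""")

conn.execute("""
INSERT INTO your_table (name, status, column1, column2)
SELECT d1.name, t.status, d1.column1, d1.column2
FROM your_table d1
CROSS JOIN (SELECT 1 AS status UNION ALL SELECT 2 UNION ALL SELECT 3) t
WHERE d1.status = 1          -- copy the Status 1 data ...
  AND NOT EXISTS (           -- ... only into statuses the user is missing
      SELECT 1 FROM your_table d2
      WHERE d2.name = d1.name AND d2.status = t.status)
""")

rows = conn.execute("SELECT * FROM your_table ORDER BY name, status").fetchall()
for row in rows:
    print(row)
```

Mei keeps her own Status 3 data and only receives a copied Status 2 row.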
This should work
with cte as (
select
[Name], coalesce(max(iif([Status]=1, [Column1], null)), max(iif([Status]=2, [Column1], null)), max(iif([Status]=3, [Column1], null))) col1
, coalesce(max(iif([Status]=1, [Column2], null)), max(iif([Status]=2, [Column2], null)), max(iif([Status]=3, [Column2], null))) col2
from
MyTable
group by [Name]
)
--insert into MyTable
select
cte.[Name], nums.n, cte.col1, cte.col2
from
cte
cross join (values (1),(2),(3)) nums(n)
left join MyTable on cte.[Name]=MyTable.[Name] and n=MyTable.[Status]
where
MyTable.[Status] is null
This works as long as the data column is not nullable:
declare @table table (name varchar(10), status int, data int);
insert into @table values
('a', 1, 2)
, ('a', 2, 5)
, ('a', 3, 7)
, ('b', 1, 5)
, ('b', 2, 6)
, ('c', 1, 3)
select stats.status as statusStats
, tn.name as nameTN
, t.status as statusData, t.name, t.data
, ISNULL(t.data, t1.data) as 'fillInData'
from (values (1),(2),(3)) as stats(status)
cross join (select distinct name from @table) tn
left join @table t
on t.status = stats.status
and t.name = tn.name
join @table t1
on t1.name = tn.name
and t1.status = 1
order by tn.name, stats.status
Here is what I would do:
CREATE TABLE #existingtable (Name VARCHAR(50), Status INT, Column1 VARCHAR (10), Column2 VARCHAR(10));
INSERT INTO #existingtable (Name,Status,Column1,Column2) Values('Ali',1,'100','90');
INSERT INTO #existingtable (Name,Status,Column1,Column2) Values('John',1,'20','200');
INSERT INTO #existingtable (Name,Status,Column1,Column2) Values('John',2,'80','90');
INSERT INTO #existingtable (Name,Status,Column1,Column2) Values('Ming',1,'54','345');
INSERT INTO #existingtable (Name,Status,Column1,Column2) Values('Mei',1,'421','123');
INSERT INTO #existingtable (Name,Status,Column1,Column2) Values('Mei',3,'24','344');
SELECT * FROM #existingtable;
WITH CTE (Name,Column1,Column2)
AS
(
SELECT DISTINCT NAME,COLUMN1,COLUMN2
FROM #existingtable
WHERE Status = 1
)
, CTE2 (NAME,Status,Column1,Column2)
AS
(
SELECT NAME,1 AS STATUS,COLUMN1,COLUMN2
FROM CTE
UNION
SELECT NAME,2 AS STATUS,COLUMN1,COLUMN2
FROM CTE
UNION
SELECT NAME,3 AS STATUS,COLUMN1,COLUMN2
FROM CTE
)
INSERT INTO #existingtable (Name,Status,Column1,Column2)
SELECT C.Name,C.Status,C.Column1,C.Column2
FROM CTE2 AS C
LEFT JOIN #existingtable AS E
ON C.NAME = E.Name
AND C.Status = E.Status
WHERE E.Status IS NULL
SELECT * FROM #existingtable
ORDER BY Name, status
This answer has been edited twice: the first edit added a WHERE clause to the CTE, and the second added the values supplied by the OP.

Displaying multiple select in a single cell

Please help me solve the following issue.
Consider that I have two tables in a database:
1. employee 2. Details
The employee table contains:
eid ename level
1 x 9th
2 y 10th
The Address table contains:
AId eid location Address_type
1 1 india permananet
2 1 US Temporary
3 2 Japan permananet
4 2 China Temporary
I need the output in the following format:
eid ename fulllocation
1 X INDIA -US
2 y Japan-CHINA
Try this:
SELECT
e.eid,
e.ename,
GROUP_CONCAT(a.location SEPARATOR '-') AS fulllocation
FROM
employee as e
INNER JOIN address as a
ON e.eid = a.eid
GROUP BY
e.eid, e.ename
select employee.eid, employee.ename, t.fulllocation
from employee
inner join (select eid, group_concat(location SEPARATOR '-') as fulllocation from Address group by eid) t
on employee.eid = t.eid
Note that GROUP_CONCAT has some limitations (for example, its result is truncated to group_concat_max_len, which defaults to 1024); please check the documentation for what they are and how to change them if needed.
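For a quick local test, SQLite also has group_concat, but it takes the separator as a second argument instead of MySQL's SEPARATOR keyword. Below is a minimal sketch with Python's sqlite3 module and the question's data (with address_type spelled 'permanent'). SQLite does not guarantee the order of the concatenated values, so 'india-US' may come back as 'US-india'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (eid INTEGER, ename TEXT, level TEXT);
CREATE TABLE address (aid INTEGER, eid INTEGER, location TEXT, address_type TEXT);
INSERT INTO employee VALUES (1, 'x', '9th'), (2, 'y', '10th');
INSERT INTO address VALUES
  (1, 1, 'india', 'permanent'), (2, 1, 'US', 'temporary'),
  (3, 2, 'Japan', 'permanent'), (4, 2, 'China', 'temporary');
""")

rows = conn.execute("""
SELECT e.eid, e.ename,
       group_concat(a.location, '-') AS fulllocation  -- separator is the 2nd argument
FROM employee e
JOIN address a ON a.eid = e.eid
GROUP BY e.eid, e.ename
ORDER BY e.eid
""").fetchall()

for row in rows:
    print(row)
```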
DECLARE @t1 TABLE
(
eid int NOT NULL,
ename varchar(50),
level varchar(50)
)
DECLARE @t2 TABLE
(
aid int NOT NULL,
eid int,
location varchar(50),
address_type varchar(50)
)
INSERT INTO @t1 SELECT 1, 'x', '9th'
INSERT INTO @t1 SELECT 2, 'y', '10th'
INSERT INTO @t2 SELECT 1, 1, 'india', 'permanent'
INSERT INTO @t2 SELECT 2, 1, 'US', 'temporary'
INSERT INTO @t2 SELECT 3, 2, 'Japan', 'permanent'
INSERT INTO @t2 SELECT 4, 2, 'China', 'temporary'
SELECT * FROM @t1
SELECT * FROM @t2
SELECT t1.eid, t1.ename, t2.fullLocation
FROM @t1 AS t1
INNER JOIN (
SELECT eid, COUNT(*) AS noofrecs
, fullLocation = LTRIM(RTRIM(ISNULL(STUFF(
(
SELECT DISTINCT '-' + CAST(t2.location as nvarchar(max))
FROM @t2 t2
WHERE t1.eid = t2.eid
FOR XML PATH (''), TYPE).value('.', 'nvarchar(max)'
), 1, 1, ''), '')))
FROM @t2 as t1
GROUP BY eid
) AS t2
ON t1.eid = t2.eid
DECLARE @t1 TABLE
(
eid int NOT NULL,
ename varchar(50),
level varchar(50)
)
DECLARE @t2 TABLE
(
aid int NOT NULL,
eid int,
location varchar(50),
address_type varchar(50)
)
INSERT INTO @t1 SELECT 1, 'x', '9th'
INSERT INTO @t1 SELECT 2, 'y', '10th'
INSERT INTO @t2 SELECT 1, 1, 'india', 'permanent'
INSERT INTO @t2 SELECT 2, 1, 'US', 'temporary'
INSERT INTO @t2 SELECT 3, 2, 'Japan', 'permanent'
INSERT INTO @t2 SELECT 4, 2, 'China', 'temporary'
SELECT * FROM @t1
SELECT * FROM @t2
-- '-' is one character long, so STUFF(..., 1, 1, '') strips exactly the leading separator
SELECT b.eid,b.ename
, STUFF((SELECT '-' + a.location FROM @t2 A
Where A.eid=B.eid FOR XML PATH('')),1,1,'') As fulllocation
From @t1 B
Group By b.eid,b.ename
To keep the locations in a fixed order (permanent first, then temporary), you could work along these lines:
SELECT
e.eid
, e.ename
, CONCAT_WS('-', p.location, t.location) AS fulllocation
FROM Employee e
JOIN Address p
ON e.eid = p.eid
AND p.address_type = 'permananet'
JOIN Address t
ON e.eid = t.eid
AND t.address_type = 'Temporary'
;
See it in action: SQL Fiddle.
Please comment if this requires adjustment or further detail.

SQL getting status of a period

I'm looking for a SQL solution for the following problem.
I want a list of employees who are sick for more than 14 days in a row.
I have an SQL table with the following columns:
First_name, Last_Name, INDIRECT_ID, SHIFT_DATE
John, Doe, Sick, 2016-01-01
John, Doe, Sick, 2016-01-02
John, Doe, working, 2016-01-03
John, Doe, Sick, 2016-01-04
John, Doe, Sick, 2016-01-05
etc.
I thought I could do this by checking whether they were sick 10 times (2 x 5 working days) within two weeks, but maybe there is a much simpler solution. Right now I'm also getting duplicate results:
select FIRST_NAME, LAST_NAME
from (select t.*
,(select count(*)
from LABOR_TICKET t2
where t2.EMPLOYEE_ID = t.EMPLOYEE_ID and
t2.INDIRECT_ID = t.INDIRECT_ID and
t2.SHIFT_DATE >= t.SHIFT_DATE and
t2.SHIFT_DATE < DATEADD(day, 14, t.SHIFT_DATE)) NumWithin14Days
from LABOR_TICKET t
where SHIFT_DATE between '2016-01-01' and '2016-04-01'
) LABOR_TICKET
INNER JOIN
EMPLOYEE ON LABOR_TICKET.EMPLOYEE_ID = EMPLOYEE.ID
where NumWithin14Days >= 10 AND INDIRECT_ID = 'SICK'
Try this.
First, create all the 14-day intervals between the from date and the to date.
Then, for every employee, check whether the count of 'Sick' days in an interval is 14.
DECLARE @ST_DATE DATE='2016-01-01'
,@ED_DATE DATE='2016-04-01'
;WITH CTE_DATE AS (
SELECT @ST_DATE AS ST_DATE,DATEADD(DAY,13,@ST_DATE) AS ED_DATE
UNION ALL
SELECT DATEADD(DAY,1,ED_DATE),DATEADD(DAY,14,ED_DATE)
FROM CTE_DATE
WHERE DATEADD(DAY,14,ED_DATE) <= @ED_DATE
)
SELECT DISTINCT FIRST_NAME, LAST_NAME
FROM CTE_DATE
INNER JOIN LABOR_TICKET ON SHIFT_DATE BETWEEN ST_DATE AND ED_DATE
WHERE INDIRECT_ID = 'Sick'
-- count per interval, not across all intervals together
GROUP BY FIRST_NAME, LAST_NAME, ST_DATE
HAVING COUNT(*) >= 14
Here is pseudo-code to give you the idea for all employees, assuming you have a calendar table like the one below:
create table dates
(
[date] date
)
insert into dates
select '2016-01-01'
union all
select '2016-01-02'
Now you can join it with your main table so that every calendar date starts a 14-day window, and keep the windows that contain 14 sick days:
select
mt.firstname, dt.[date] as window_start, count(mt.indirect_id) as sick_days
from
dates dt
inner join
maintable mt
on mt.[date] >= dt.[date]
and mt.[date] < dateadd(day, 14, dt.[date])
and mt.indirect_id = 'sick'
group by mt.firstname, dt.[date]
having count(mt.indirect_id) >= 14
order by dt.[date]
You should have provided more sample data.
Try this (I am sure it will work with other sample data too); a few parts are only there to filter the data.
declare @t table(First_name varchar(50), Last_Name varchar(50), INDIRECT_ID varchar(50), SHIFT_DATE date)
insert into @t values
('John', 'Doe', 'Sick', '2016-01-01')
,('John', 'Doe', 'Sick', '2016-01-02')
,('John','Doe','working','2016-01-03')
,('John', 'Doe', 'Sick', '2016-01-04')
,('John', 'Doe', 'Sick', '2016-01-05')
declare @name varchar(50)='John'
declare @month int=1
;With CTE as
(
-- anchor: the first sick day; TOP ... ORDER BY must sit in a derived table,
-- because ORDER BY is not allowed directly before UNION ALL
select First_name,Last_Name,SHIFT_DATE,1 rn
from (select top 1 First_name,Last_Name,SHIFT_DATE from @t where First_name=@name
and INDIRECT_ID='Sick' order by SHIFT_DATE) first_sick
union all
select t.First_name,t.Last_Name,t.SHIFT_DATE, rn+1 from @t t
inner join cte c on t.First_name=c.First_name
where INDIRECT_ID='Sick'
and t.SHIFT_DATE=DATEADD(day,1,c.SHIFT_DATE)
and t.SHIFT_DATE<='2016-01-31'
)
select * from CTE where rn>=14
declare @t table(First_name varchar(50), Last_Name varchar(50), INDIRECT_ID varchar(50), SHIFT_DATE date)
insert into @t values
('John', 'Doe', 'Sick', '2016-01-01')
,('John', 'Doe', 'Sick', '2016-01-02')
,('John','Doe','working','2016-01-03')
,('John', 'Doe', 'Sick', '2016-04-04')
,('John', 'Doe', 'Sick', '2016-05-05')
select s.*
,u.*
,Sickdays =
case
when s.indirect_id = 'Sick' and u.indirect_id = 'Sick' then datediff(dd,u.shift_date,s.shift_date)
else 0
end
from
(
select t.*,
row_number() over(partition by last_name,first_name order by shift_date desc) rn
from @t t
) s
join
(select t.*,
row_number() over(partition by last_name,first_name order by shift_date desc) rn
from @t t
) u on s.last_name = u.last_name and s.first_name = u.first_name and s.rn = u.rn - 1
where
case
when s.indirect_id = 'Sick' and u.indirect_id = 'Sick' then datediff(dd,u.shift_date,s.shift_date)
else 0
end > 13
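A commonly used alternative for "N days in a row" is the gaps-and-islands technique: within each employee's sick days, the date minus a running ROW_NUMBER stays constant across a run of consecutive dates, so grouping by that difference yields one group per streak. Below is a minimal sketch with Python's sqlite3 module; it assumes at most one ticket per calendar day per employee (days without a ticket, such as weekends, break a streak), and the table name labor_ticket and the sample rows are hypothetical. Use > 14 instead of >= 14 if "more than 14 days" is meant literally.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE labor_ticket (first_name TEXT, last_name TEXT,"
             " indirect_id TEXT, shift_date TEXT)")

def add_days(first, last, status, start, n):
    """Insert one ticket per day for n consecutive days (hypothetical data)."""
    for i in range(n):
        day = (start + timedelta(days=i)).isoformat()
        conn.execute("INSERT INTO labor_ticket VALUES (?, ?, ?, ?)",
                     (first, last, status, day))

add_days("John", "Doe", "Sick", date(2016, 1, 1), 15)     # one 15-day streak
add_days("Jane", "Roe", "Sick", date(2016, 1, 1), 10)     # 10 sick days,
add_days("Jane", "Roe", "working", date(2016, 1, 11), 1)  # one working day,
add_days("Jane", "Roe", "Sick", date(2016, 1, 12), 10)    # 10 more sick days

QUERY = """
WITH sick AS (
    SELECT first_name, last_name, shift_date,
           -- consecutive dates minus a running row number give the same
           -- value, so every streak of consecutive sick days shares one grp
           julianday(shift_date) - ROW_NUMBER() OVER (
               PARTITION BY first_name, last_name ORDER BY shift_date) AS grp
    FROM labor_ticket
    WHERE indirect_id = 'Sick'
)
SELECT first_name, last_name, MIN(shift_date) AS first_day,
       MAX(shift_date) AS last_day, COUNT(*) AS days_in_row
FROM sick
GROUP BY first_name, last_name, grp
HAVING COUNT(*) >= 14
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Each employee appears once per qualifying streak, which avoids the duplicate rows the question's query produces.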

Remove duplicates with less null values

I have a table of employees which contains about 25 columns. Right now there are a lot of duplicates and I would like to try and get rid of some of these duplicates.
First, I want to find the duplicates by looking for multiple records that have the same values in first name, last name, employee number, company number and status.
SELECT
firstname,lastname,employeenumber, companynumber, statusflag
FROM
employeemaster
GROUP BY
firstname,lastname,employeenumber,companynumber, statusflag
HAVING
(COUNT(*) > 1)
This gives me duplicates but my goal is to find and keep the best single record and delete the other records. The "best single record" is defined by the record with the least amount of NULL values in all of the other columns. How can I do this?
I am using Microsoft SQL Server 2012 MGMT Studio.
EXAMPLE: rows marked red should be deleted, rows marked green kept.
NOTE: There are many more columns in the table than this example shows.
You can use the sys.columns table to get a list of columns and build a dynamic query. This query returns a KeepThese value of 1 for every record you want to keep, based on your criteria.
-- insert test data
create table EmployeeMaster
(
Record int identity(1,1),
FirstName varchar(50),
LastName varchar(50),
EmployeeNumber int,
CompanyNumber int,
StatusFlag int,
UserName varchar(50),
Branch varchar(50)
);
insert into EmployeeMaster
(
FirstName,
LastName,
EmployeeNumber,
CompanyNumber,
StatusFlag,
UserName,
Branch
)
values
('Jake','Jones',1234,1,1,'JJONES','PHX'),
('Jake','Jones',1234,1,1,NULL,'PHX'),
('Jake','Jones',1234,1,1,NULL,NULL),
('Jane','Jones',5678,1,1,'JJONES2',NULL);
-- get records with most non-null values with dynamic sys.column query
declare @sql varchar(max)
select @sql = '
select e.*,
row_number() over(partition by
e.FirstName,
e.LastName,
e.EmployeeNumber,
e.CompanyNumber,
e.StatusFlag
order by n.NonNullCnt desc) as KeepThese
from EmployeeMaster e
cross apply (select count(n.value) as NonNullCnt from (select ' +
replace((
select 'cast(' + c.name + ' as varchar(50)) as value union all select '
from sys.columns c
where c.object_id = t.object_id
for xml path('')
) + '#',' union all select #','') + ')n)n'
from sys.tables t
where t.name = 'EmployeeMaster'
exec(@sql)
Try this.
;WITH cte
AS (SELECT Row_number()
OVER(
partition BY firstname, lastname, employeenumber, companynumber, statusflag
ORDER BY (SELECT NULL)) rn,
firstname,
lastname,
employeenumber,
companynumber,
statusflag,
username,
branch
FROM employeemaster),
cte1
AS (SELECT a.firstname,
a.lastname,
a.employeenumber,
a.companynumber,
a.statusflag,
Row_number()
OVER(
partition BY a.firstname, a.lastname, a.employeenumber, a.companynumber, a.statusflag
ORDER BY (CASE WHEN a.username IS NULL THEN 1 ELSE 0 END +CASE WHEN a.branch IS NULL THEN 1 ELSE 0 END) )rn
-- add the remaining columns in case statement
FROM cte a
JOIN employeemaster b
ON a.firstname = b.firstname
AND a.lastname = b.lastname
AND a.employeenumber = b.employeenumber
AND a.companynumber = b.companynumber
AND a.statusflag = b.statusflag)
SELECT *
FROM cte1
WHERE rn = 1
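The same ranking can drive the actual cleanup with a DELETE. Below is a minimal sketch with Python's sqlite3 module and the test data from the first answer: rows are ranked within each duplicate group by their NULL count (ties broken by Record), and every row ranked after the first is deleted. Only UserName and Branch are scored here; with the real table, the CASE sum would need one term per remaining column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeMaster (
    Record INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,
    EmployeeNumber INTEGER, CompanyNumber INTEGER, StatusFlag INTEGER,
    UserName TEXT, Branch TEXT);
INSERT INTO EmployeeMaster
    (FirstName, LastName, EmployeeNumber, CompanyNumber, StatusFlag, UserName, Branch)
VALUES ('Jake', 'Jones', 1234, 1, 1, 'JJONES', 'PHX'),
       ('Jake', 'Jones', 1234, 1, 1, NULL, 'PHX'),
       ('Jake', 'Jones', 1234, 1, 1, NULL, NULL),
       ('Jane', 'Jones', 5678, 1, 1, 'JJONES2', NULL);
""")

conn.execute("""
DELETE FROM EmployeeMaster
WHERE Record IN (
    SELECT Record FROM (
        SELECT Record,
               ROW_NUMBER() OVER (
                   PARTITION BY FirstName, LastName, EmployeeNumber,
                                CompanyNumber, StatusFlag
                   -- fewest NULLs first; add a CASE term per extra column
                   ORDER BY CASE WHEN UserName IS NULL THEN 1 ELSE 0 END
                          + CASE WHEN Branch IS NULL THEN 1 ELSE 0 END,
                            Record
               ) AS rn
        FROM EmployeeMaster
    ) ranked
    WHERE rn > 1
)
""")

remaining = conn.execute("SELECT Record FROM EmployeeMaster ORDER BY Record").fetchall()
print(remaining)
```

Running the ranking as a SELECT first, before switching to DELETE, is a cheap way to review which rows would go.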
I tested this with MySQL and used NULL string concatenation to find the best record: CONCAT() returns NULL if any argument is NULL, so LENGTH(CONCAT(username, branch)) is NULL unless all of those columns are non-NULL. (In MySQL, || is a logical OR unless the PIPES_AS_CONCAT SQL mode is enabled, so CONCAT() is used instead.) Maybe this is not perfect.
create table EmployeeMaster
(
Record int auto_increment,
FirstName varchar(50),
LastName varchar(50),
EmployeeNumber int,
CompanyNumber int,
StatusFlag int,
UserName varchar(50),
Branch varchar(50),
PRIMARY KEY(record)
);
INSERT INTO EmployeeMaster
(
FirstName, LastName, EmployeeNumber, CompanyNumber, StatusFlag, UserName, Branch
) VALUES ('Jake', 'Jones', 1234, 1, 1, 'JJONES', 'PHX'), ('Jake', 'Jones', 1234, 1, 1, NULL, 'PHX'), ('Jake', 'Jones', 1234, 1, 1, NULL, NULL), ('Jane', 'Jones', 5678, 1, 1, 'JJONES2', NULL);
My query idea looks like this:
SELECT e.*
FROM employeemaster e
JOIN ( SELECT firstname,
lastname,
employeenumber,
companynumber,
statusflag,
MAX( LENGTH( CONCAT(username, branch) ) ) data_quality
FROM employeemaster
GROUP BY firstname, lastname, employeenumber, companynumber, statusflag
HAVING count(*) > 1
) g
-- match each row to its own duplicate group, not just on the length
ON e.firstname = g.firstname
AND e.lastname = g.lastname
AND e.employeenumber = g.employeenumber
AND e.companynumber = g.companynumber
AND e.statusflag = g.statusflag
AND LENGTH( CONCAT(e.username, e.branch) ) = g.data_quality