Displaying the difference between rows in the same table - sql

I have a table named Employee_audit with following schema,
emp_audit_id  eid  name    salary
1             1    Daniel  1000
2             1    Dani    1000
3             1    Danny   3000
My goal is to write a SQL query that returns the changes in the following format, treating the first row as a change from null:
columnName  oldValue  newValue
name        null      Daniel
salary      null      1000
name        Daniel    Dani
name        Dani      Danny
salary      1000      3000
I have written the below SQL query,
WITH cte AS
(
SELECT empid,
name,
salary,
rn=ROW_NUMBER()OVER(PARTITION BY empid ORDER BY emp_audit_id)
FROM Employee_audit
)
SELECT oldname=CASE WHEN c1.Name=c2.Name THEN '' ELSE C1.Name END,
newname=CASE WHEN c1.Name=c2.Name THEN '' ELSE C2.Name END,
oldsalary=CASE WHEN c1.salary=c2.salary THEN NULL ELSE C1.salary END,
newsalary=CASE WHEN c1.salary=c2.salary THEN NULL ELSE C2.salary END
FROM cte c1 INNER JOIN cte c2
ON c1.empid=c2.empid AND c2.RN=c1.RN + 1
But it gives the result in following format
oldname  newname  oldsalary  newsalary
Daniel   Dani     null       null
Dani     Danny    1000       3000
Could you please tell me how I can get the required result?

The LEAD and LAG window functions can help you out here.
The "diffs" CTE pairs each value with its previous value, for each column whose changes you need to find:
with diffs as (
select 'name' colName, emp_audit_id, eid, lag(name, 1, null) over (partition by eid order by emp_audit_id) oldValue, name newValue
from some_table
union all
select 'salary', emp_audit_id, eid, cast(lag(salary, 1, null) over (partition by eid order by emp_audit_id) as varchar), cast(salary as varchar) newValue
from some_table
)
select *
from diffs
where oldValue <> newValue or oldValue is null
order by emp_audit_id, eid
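As a rough way to try the LAG approach above without a database server, here is a minimal sketch that runs an equivalent query in SQLite through Python's sqlite3 module. It assumes a SQLite build with window functions (3.25 or later), uses the table and column names from the question, and casts to TEXT because SQLite has no varchar cast; it is an illustration, not the answerer's exact code.

```python
import sqlite3

# In-memory table mirroring the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_audit (emp_audit_id INTEGER, eid INTEGER, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee_audit VALUES (?, ?, ?, ?)",
    [(1, 1, "Daniel", 1000), (2, 1, "Dani", 1000), (3, 1, "Danny", 3000)],
)

# One UNION ALL branch per audited column; LAG fetches the previous value
# for the same eid, and both values are cast to text so the branches line up.
query = """
WITH diffs AS (
    SELECT 'name' AS columnName, emp_audit_id,
           LAG(name) OVER (PARTITION BY eid ORDER BY emp_audit_id) AS oldValue,
           name AS newValue
    FROM Employee_audit
    UNION ALL
    SELECT 'salary', emp_audit_id,
           CAST(LAG(salary) OVER (PARTITION BY eid ORDER BY emp_audit_id) AS TEXT),
           CAST(salary AS TEXT)
    FROM Employee_audit
)
SELECT columnName, oldValue, newValue
FROM diffs
WHERE oldValue <> newValue OR oldValue IS NULL
ORDER BY emp_audit_id, columnName
"""
diff_rows = conn.execute(query).fetchall()
for row in diff_rows:
    print(row)
```

Note that the filter "oldValue is null" keeps the first audit row of each employee, but it would also keep a row whose previous value was a genuine NULL even if the new value is NULL too; whether that matters depends on the data.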

If you give each row a row number in a CTE and then join the CTE to itself on the next row, you can compare the old and the new values. Unioning the two column names is a bit clunky, however; if you need a more robust solution you might look at pivoting the data.
You also have to convert all values to a common datatype, e.g. a string.
declare @Test table (emp_audit_id int, eid int, [name] varchar(32), salary money);
insert into @Test (emp_audit_id, eid, [name], salary)
values
(1, 1, 'Daniel', 1000),
(2, 1, 'Dani', 1000),
(3, 1, 'Danny', 3000);
with cte as (
select emp_audit_id, eid, [name], salary
, row_number() over (partition by eid order by emp_audit_id) rn
from @Test
)
select C.emp_audit_id, 'name' columnName, P.[Name] oldValue, C.[name] newValue
from cte C
left join cte P on P.eid = C.eid and P.rn + 1 = C.rn
where coalesce(C.[name],'') != coalesce(P.[Name],'')
union all
select C.emp_audit_id, 'salary' columnName, convert(varchar(21),P.salary), convert(varchar(21),C.salary)
from cte C
left join cte P on P.eid = C.eid and P.rn + 1 = C.rn
where coalesce(C.salary,0) != coalesce(P.salary,0)
order by C.emp_audit_id, columnName;
Returns:
emp_audit_id  columnName  oldValue  newValue
1             name        NULL      Daniel
1             salary      NULL      1000.00
2             name        Daniel    Dani
3             name        Dani      Danny
3             salary      1000.00   3000.00
I highly encourage you to add DDL+DML (as shown above) to all your future questions, as it makes it much easier for people to assist.

Related

Select minimum Seq Num by ID

Just trying to write a simple query to get the row (per VENDOR_ID) with the minimum CNTCT_SEQ_NUM where the CONTACT_NAME is not blank.
Here is what I have written:
SELECT VENDOR_ID, MIN(CNTCT_SEQ_NUM) AS CNTCT_SEQ_NUM , CONTACT_NAME
FROM PS_VENDOR_CNTCT
WHERE VENDOR_ID IN ('ERSUT', 'MOOREA')
AND CONTACT_NAME <> ''
GROUP BY CONTACT_NAME, VENDOR_ID
Current Results:
VENDOR_ID CNTCT_SEQ_NUM CONTACT_NAME
ERSUT 19 V Smith
ERSUT 4 T Peterman
ERSUT 2 I GANCE
ERSUT 8 R FISHER
MOOREA 2 S DALY
MOOREA 4 B SLAUTEN
MOOREA 1 N BLAKELY
Expected results would be:
VENDOR_ID CNTCT_SEQ_NUM CONTACT_NAME
ERSUT 2 I GANCE
MOOREA 1 N BLAKELY
Try this:
SELECT A.* FROM PS_VENDOR_CNTCT A
INNER JOIN
(
SELECT VENDOR_ID,MIN(CNTCT_SEQ_NUM) CNTCT_SEQ_NUM
FROM PS_VENDOR_CNTCT
WHERE CONTACT_NAME <> '' -- skip blank names, as the question requires
GROUP BY VENDOR_ID
)B ON A.VENDOR_ID = B.VENDOR_ID
AND A.CNTCT_SEQ_NUM = B.CNTCT_SEQ_NUM
A correlated subquery solves this:
select vc.*
from PS_VENDOR_CNTCT vc
where vc.CNTCT_SEQ_NUM = (select min(vc2.CNTCT_SEQ_NUM)
from PS_VENDOR_CNTCT vc2
where vc2.VENDOR_ID = vc.VENDOR_ID and
vc2.CONTACT_NAME <> ''
);
For performance, you can try an index on (VENDOR_ID, CONTACT_NAME, CNTCT_SEQ_NUM). This covers the subquery, although all the index records will still need to be scanned.
It seems you do not need MIN() but a window function such as ROW_NUMBER():
SELECT DISTINCT Q.VENDOR_ID, Q.CONTACT_NAME, Q.CNTCT_SEQ_NUM
FROM
(
SELECT P.*,
ROW_NUMBER() OVER
(PARTITION BY VENDOR_ID ORDER BY CNTCT_SEQ_NUM) AS RN
FROM PS_VENDOR_CNTCT P
WHERE VENDOR_ID IN ('ERSUT', 'MOOREA')
AND CONTACT_NAME <> ''
) Q
WHERE Q.RN = 1
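To check the ROW_NUMBER() approach against the question's data, here is a small sketch that runs it in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later). The rows come from the question's "Current Results"; the extra ERSUT row with a blank CONTACT_NAME is hypothetical and is only there to show that the blank-name filter has to be applied before the rows are numbered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PS_VENDOR_CNTCT (VENDOR_ID TEXT, CNTCT_SEQ_NUM INTEGER, CONTACT_NAME TEXT)"
)
conn.executemany(
    "INSERT INTO PS_VENDOR_CNTCT VALUES (?, ?, ?)",
    [
        ("ERSUT", 1, ""),  # hypothetical blank-name row, must be ignored
        ("ERSUT", 19, "V Smith"),
        ("ERSUT", 4, "T Peterman"),
        ("ERSUT", 2, "I GANCE"),
        ("ERSUT", 8, "R FISHER"),
        ("MOOREA", 2, "S DALY"),
        ("MOOREA", 4, "B SLAUTEN"),
        ("MOOREA", 1, "N BLAKELY"),
    ],
)

# Number the non-blank contacts per vendor by sequence number, keep the first.
query = """
SELECT VENDOR_ID, CNTCT_SEQ_NUM, CONTACT_NAME
FROM (
    SELECT P.*,
           ROW_NUMBER() OVER (PARTITION BY VENDOR_ID ORDER BY CNTCT_SEQ_NUM) AS RN
    FROM PS_VENDOR_CNTCT P
    WHERE VENDOR_ID IN ('ERSUT', 'MOOREA')
      AND CONTACT_NAME <> ''
) Q
WHERE Q.RN = 1
ORDER BY VENDOR_ID
"""
first_contacts = conn.execute(query).fetchall()
for row in first_contacts:
    print(row)
```

Since RN = 1 already yields one row per vendor, the DISTINCT in the answer's query is not needed in this sketch.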

Convert rows to constant columns in SQL Server?

I have the following query with its result:
SELECT * FROM dbo.DeviceView AS dv
WHERE DeviceId = 5
Result:
Id  Name                   AttachId  ColorId  Date
--  ---------------------  --------  -------  -----------------------
5   Apple iPhone 5s A1533  NULL      1        2013-09-10 00:00:00.000
5   Apple iPhone 5s A1533  NULL      8        2013-09-10 00:00:00.000
5   Apple iPhone 5s A1533  NULL      19       2013-09-10 00:00:00.000
ColorId takes different values, and there can be more or fewer than 3 of them.
I want to convert ColorId into 3 columns, with the first value in ColorId1, the second value in ColorId2 and the third value in ColorId3.
e.g.:
Id  Name                   AttachId  ColorId1  ColorId2  ColorId3  Date
--  ---------------------  --------  --------  --------  --------  -----------------------
5   Apple iPhone 5s A1533  NULL      1         8         19        2013-09-10 00:00:00.000
How can I produce this result?
Edit:
All other fields except ColorId are the same.
You can use this generic query based on PIVOT and CTE's that can easily be extended for any number of colors and that performs very well:
-- First, we assign unique numbers to each of the ColorId's. These will become column names
;WITH NumberedColors (ColorId,ColorNumber) AS (
SELECT ColorId,'Color'+CAST((ROW_NUMBER() OVER (ORDER BY ColorId)) AS VARCHAR) AS ColorNumber
FROM dbo.DeviceView
GROUP BY ColorId
),
-- Here we return the dbo.DeviceView extended with ColorNumber column name
DeviceViewWithNumberedColors (Id,Name,AttachId,[Date],ColorNumber,ColorId) AS (
SELECT Id,Name,AttachId,Date,NC.ColorNumber,NC.ColorId
FROM dbo.DeviceView DV
INNER JOIN NumberedColors NC ON DV.ColorId=NC.ColorId
)
-- Finally, we use the PIVOT to assign color's to the appropriate columns
SELECT *
FROM (
SELECT Id,Name,AttachId,[Date],ColorId,ColorNumber
FROM DeviceViewWithNumberedColors D
) AS Source
PIVOT (
SUM(ColorId) FOR ColorNumber IN ([Color1],[Color2],[Color3],[Color4],[Color5],[Color6],[Color7],[Color8],[Color9],[Color10])
) Piv
In the PIVOT clause, make sure you have enough Color columns. If this can not be hard-coded, i.e. the number of colors could grow beyond a fixed number, then use dynamic SQL to generate this query.
If you want exactly three colors, you can use conditional aggregation or pivot:
select id, name, attachid,
max(case when seqnum = 1 then color end) as color1,
max(case when seqnum = 2 then color end) as color2,
max(case when seqnum = 3 then color end) as color3,
date
from (select t.*,
row_number() over (partition by id order by (select null)) as seqnum
from t
) t
group by id, name, attachid, date;
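The conditional aggregation above can be tried on the question's sample rows with the following sketch, which uses SQLite through Python's sqlite3 module (SQLite 3.25 or later for window functions). It assumes the view's columns are named as in the question, stores the date as text, and orders by ColorId instead of (select null) so that the column assignment is deterministic; those choices are mine, not the answerer's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE DeviceView (Id INTEGER, Name TEXT, AttachId INTEGER, ColorId INTEGER, "Date" TEXT)'
)
conn.executemany(
    "INSERT INTO DeviceView VALUES (?, ?, ?, ?, ?)",
    [
        (5, "Apple iPhone 5s A1533", None, 1, "2013-09-10"),
        (5, "Apple iPhone 5s A1533", None, 8, "2013-09-10"),
        (5, "Apple iPhone 5s A1533", None, 19, "2013-09-10"),
    ],
)

# Number the colors per device, then pick the n-th color into the n-th column.
query = """
SELECT Id, Name, AttachId,
       MAX(CASE WHEN seqnum = 1 THEN ColorId END) AS ColorId1,
       MAX(CASE WHEN seqnum = 2 THEN ColorId END) AS ColorId2,
       MAX(CASE WHEN seqnum = 3 THEN ColorId END) AS ColorId3,
       "Date"
FROM (
    SELECT dv.*,
           ROW_NUMBER() OVER (PARTITION BY Id ORDER BY ColorId) AS seqnum
    FROM DeviceView dv
) t
GROUP BY Id, Name, AttachId, "Date"
"""
pivoted = conn.execute(query).fetchall()
print(pivoted)
```

A device with fewer than three colors gets NULL in the unused columns, and a fourth color would simply be dropped by this fixed-width version.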
If count of ColorId values is not a fixed number you must use dynamic sql.
DECLARE @sql nvarchar(MAX) = 'SELECT Id, Name, AttachId'
SELECT @sql = @sql
+ ', MAX(IIF(ValueNum = ' + LTRIM(STR(ColumnNum)) + ', ColorId, NULL)) AS ColorId' + LTRIM(STR(ColumnNum))
FROM (
SELECT DISTINCT
DENSE_RANK() OVER (ORDER BY ColorId) AS ColumnNum
FROM dbo.DeviceView
WHERE Id IN (
SELECT TOP 1 Id
FROM dbo.DeviceView
GROUP BY Id
ORDER BY COUNT(DISTINCT ColorId) DESC)
) AS c
EXEC (@sql
+ ', [Date] FROM ('
+ 'SELECT Id, Name, AttachId, ColorId, [Date], DENSE_RANK() OVER (PARTITION BY Id ORDER BY ColorId) AS ValueNum FROM dbo.DeviceView'
+ ') AS it '
+ 'GROUP BY Id, Name, AttachId, [Date]');
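The same idea, generating one output column per color position and then executing the generated statement, can be sketched outside T-SQL. The example below builds the query text in Python and runs it on SQLite (3.25 or later); the device with Id 6 is hypothetical sample data, and the string building stands in for the dynamic SQL above rather than reproducing it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE DeviceView (Id INTEGER, Name TEXT, AttachId INTEGER, ColorId INTEGER, "Date" TEXT)'
)
conn.executemany(
    "INSERT INTO DeviceView VALUES (?, ?, ?, ?, ?)",
    [
        (5, "Apple iPhone 5s A1533", None, 1, "2013-09-10"),
        (5, "Apple iPhone 5s A1533", None, 8, "2013-09-10"),
        (5, "Apple iPhone 5s A1533", None, 19, "2013-09-10"),
        (6, "Hypothetical device", None, 2, "2013-09-11"),  # made-up row
        (6, "Hypothetical device", None, 3, "2013-09-11"),  # made-up row
    ],
)

# The widest device decides how many ColorIdN columns are generated.
max_colors = conn.execute(
    "SELECT MAX(n) FROM (SELECT COUNT(DISTINCT ColorId) AS n FROM DeviceView GROUP BY Id) AS c"
).fetchone()[0]

# Only integers produced by range() are interpolated, never user input.
color_columns = ", ".join(
    f"MAX(CASE WHEN ValueNum = {i} THEN ColorId END) AS ColorId{i}"
    for i in range(1, max_colors + 1)
)
dynamic_sql = f"""
SELECT Id, Name, AttachId, {color_columns}, "Date"
FROM (
    SELECT Id, Name, AttachId, ColorId, "Date",
           DENSE_RANK() OVER (PARTITION BY Id ORDER BY ColorId) AS ValueNum
    FROM DeviceView
) AS it
GROUP BY Id, Name, AttachId, "Date"
ORDER BY Id
"""
pivoted = conn.execute(dynamic_sql).fetchall()
for row in pivoted:
    print(row)
```

Devices with fewer colors than the widest one get NULL in the trailing columns.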

How to make a select statement to return "NULLs" if the value is a repetition in SQL

Let's say we have:
SELECT Name, Surname, Salary, TaxPercentage
FROM Employees
returns:
Name |Surname |Salary |TaxPercentage
--------------------------------------
Moosa | Jacobs | $14000 | 13.5
Temba | Martins | $15000 | 13.5
Jack | Hendricks | $14000 | 13.5
I want it to return:
Name |Surname | Salary |TaxPercentage
-------------------------------------------
Moosa | Jacobs | $14000 | NULL
Temba | Martins | $15000 | NULL
Jack | Hendricks| $14000 | 13.5
Since the TaxPercentage value is repeated, I want it to appear only once, at the end.
In SQL Server 2012 and above you can use the LEAD window function to get the value of the next row. Assuming you have some way to sort the data (like an identity column), you can use this to your advantage:
SELECT Name,
Surname,
Salary,
CASE WHEN TaxPercentage = LEAD(TaxPercentage) OVER (ORDER BY Id) THEN
NULL
ELSE
TaxPercentage
END As TaxPercentage
FROM Employees
ORDER BY Id
See fiddle (thanks to Lasse V. Karlsen)
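For a quick check of the LEAD logic, here is a minimal sketch on SQLite via Python's sqlite3 module (SQLite 3.25 or later). Like the answer, it assumes an Id column that defines the row order; that column is not part of the question's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (Id INTEGER PRIMARY KEY, Name TEXT, Surname TEXT, "
    "Salary TEXT, TaxPercentage REAL)"
)
conn.executemany(
    "INSERT INTO Employees (Name, Surname, Salary, TaxPercentage) VALUES (?, ?, ?, ?)",
    [
        ("Moosa", "Jacobs", "$14000", 13.5),
        ("Temba", "Martins", "$15000", 13.5),
        ("Jack", "Hendricks", "$14000", 13.5),
    ],
)

# A value is blanked out when the next row (by Id) carries the same value,
# so only the last row of each run of repeats keeps it.
query = """
SELECT Name, Surname, Salary,
       CASE WHEN TaxPercentage = LEAD(TaxPercentage) OVER (ORDER BY Id)
            THEN NULL
            ELSE TaxPercentage
       END AS TaxPercentage
FROM Employees
ORDER BY Id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the last row LEAD returns NULL, the comparison is not true, and the ELSE branch keeps the value.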
You need some way to order the data. In my example I am using a simple IDENTITY column; in yours it could be a primary key or a date:
DECLARE @DataSource TABLE
(
[Name] VARCHAR(12)
,[Surname] VARCHAR(12)
,[Salary] VARCHAR(12)
,[TaxPercentage] DECIMAL(9,1)
--
,[RowID] TINYINT IDENTITY(1,1)
);
INSERT INTO @DataSource ([Name], [Surname], [Salary], [TaxPercentage])
VALUES ('Moosa', 'Jacobs', '$14000', '13.5')
,('Temba', 'Martins', '$15000', '13.5')
,('Jack', ' Hendricks', '$14000', '13.5')
,('Temba', 'Martins', '$15000', '1.5')
,('Jack', ' Hendricks', '$14000', '1.5')
,('Temba', 'Martins', '$15000', '23')
,('Jack', ' Hendricks', '$14000', '7')
,('Temba', 'Martins', '$15000', '7')
,('Jack', ' Hendricks', '$14000', '7')
SELECT [Name]
,[Surname]
,[Salary]
,[TaxPercentage]
,NULLIF([TaxPercentage], LEAD([TaxPercentage], 1, NULL) OVER (ORDER BY [RowID])) AS [NewTaxPercentage]
FROM @DataSource;
You need a column to sort the rows by, such as an Id identity column:
;with cte as (
SELECT
Id, Name, Surname, Salary, TaxPercentage,
LEAD(TaxPercentage, 1, NULL) OVER (ORDER BY Id) AS NextValue
FROM Employees
)
select
Id, Name, Surname, Salary,-- TaxPercentage,
TaxPercentage = CASE WHEN TaxPercentage = NextValue THEN NULL ELSE TaxPercentage END
from cte
Please check SQL Lag() and Lead() functions for more detail on these new analytical functions
If for some reason you can't use LEAD() then this should work:
with T as (
SELECT
Name, Surname, Salary, TaxPercentage,
row_number() over (order by TaxPercentage /* ??? */) as rn
FROM Employees
)
select
Name, Surname, Salary,
nullif(
TaxPercentage,
(select t2.TaxPercentage from T as t2 where t2.rn = t.rn + 1)
) as TaxPercentage
from T as t
This works with SQL Server 2008 and later, if needed:
http://sqlfiddle.com/#!3/ec020/1/0
Select o.Name, o.Surname, o.Salary
, TaxPercentage = case when o.id = 1 then o.TaxPercentage else null end
From (
Select Name, Surname, Salary, TaxPercentage
, id = row_number() over(partition by TaxPercentage order by Name, surname, Salary) -- update order...
From Employees as e
) as o
order by o.TaxPercentage, o.id desc

Transpose a table using Oracle SQL

I have some data in a table which I want to transpose using SQL. Here is the sample data.
create table test_pivot(
Name varchar2(100),
DeptA varchar2(50),
DeptB varchar2(50),
DeptC varchar2(50),
DeptD varchar2(50)
);
insert all
into test_pivot(Name,DeptA,DeptB,DeptC,DeptD)
values('Asfakul','Y',NULL,NULL,NULL)
into test_pivot(Name,DeptA,DeptB,DeptC,DeptD)
values('Debmalya',NULL,'Y',NULL,NULL)
into test_pivot(Name,DeptA,DeptB,DeptC,DeptD)
values('Ranjan',NULL,NULL,'Y',NULL)
into test_pivot(Name,DeptA,DeptB,DeptC,DeptD)
values('santanu',NULL,NULL,NULL,'Y')
select 1 from dual;
I want the data to be displayed like below..
I am having a tough time figuring it out. Please let me know how to do it.
Here is a SELECT statement without PIVOT and UNPIVOT. As you can see, it's far more complex:
select dept,
nvl(max(case when name = 'Asfakul' then dept_val end), 'N') as Asfakul,
nvl(max(case when name = 'Debmalya' then dept_val end), 'N') as Debmalya,
nvl(max(case when name = 'Ranjan' then dept_val end), 'N') as Ranjan,
nvl(max(case when name = 'santanu' then dept_val end), 'N') as santanu
from(select name,
dept,
case when dept = 'depta' then depta
when dept = 'deptb' then deptb
when dept = 'deptc' then deptc
when dept = 'deptd' then deptd
end dept_val
from test_pivot
join(select 'depta' as dept from dual union all
select 'deptb' as dept from dual union all
select 'deptc' as dept from dual union all
select 'deptd' as dept from dual
)
on 1 = 1
)
group
by dept
order
by dept
If your DB version supports PIVOT and UNPIVOT, you can use them instead.
See the query below; I think it should help you:
SELECT *
FROM( SELECT *
FROM test_pivot
UNPIVOT (Check_val FOR DEPT IN (DEPTA, DEPTB, DEPTC, DEPTD))
)
PIVOT(MAX(check_val) FOR NAME IN ('Asfakul' AS Asfakul,
'Debmalya' AS Debmalya,
'Ranjan' AS Ranjan,
'santanu' AS santanu))
ORDER BY dept;
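For readers without an Oracle instance, the transpose can be sketched on SQLite through Python's sqlite3 module. SQLite has no PIVOT or UNPIVOT, so this example swaps in UNION ALL for the unpivot step and conditional aggregation for the pivot step, and fills missing flags with 'N' as the first answer does. The department labels (DEPTA and so on) are simply the column names in upper case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_pivot (Name TEXT, DeptA TEXT, DeptB TEXT, DeptC TEXT, DeptD TEXT)"
)
conn.executemany(
    "INSERT INTO test_pivot VALUES (?, ?, ?, ?, ?)",
    [
        ("Asfakul", "Y", None, None, None),
        ("Debmalya", None, "Y", None, None),
        ("Ranjan", None, None, "Y", None),
        ("santanu", None, None, None, "Y"),
    ],
)

# Step 1 (unpivot): one row per (name, department) pair.
# Step 2 (pivot): one output column per name, one row per department.
query = """
WITH unpivoted AS (
    SELECT Name, 'DEPTA' AS dept, DeptA AS val FROM test_pivot
    UNION ALL SELECT Name, 'DEPTB', DeptB FROM test_pivot
    UNION ALL SELECT Name, 'DEPTC', DeptC FROM test_pivot
    UNION ALL SELECT Name, 'DEPTD', DeptD FROM test_pivot
)
SELECT dept,
       COALESCE(MAX(CASE WHEN Name = 'Asfakul'  THEN val END), 'N') AS Asfakul,
       COALESCE(MAX(CASE WHEN Name = 'Debmalya' THEN val END), 'N') AS Debmalya,
       COALESCE(MAX(CASE WHEN Name = 'Ranjan'   THEN val END), 'N') AS Ranjan,
       COALESCE(MAX(CASE WHEN Name = 'santanu'  THEN val END), 'N') AS santanu
FROM unpivoted
GROUP BY dept
ORDER BY dept
"""
transposed = conn.execute(query).fetchall()
for row in transposed:
    print(row)
```

As in both answers, the names become fixed output columns, so a new name in the table needs a new expression in the query.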

Tricky SQL. Consolidating rows

I have a (in my opinion) tricky SQL problem.
I got a table with subscriptions. Each subscription has an ID and a set of attributes which will change over time. When an attribute value changes a new row is created with the subscription key and the new values – but ONLY for the changed attributes. The values for the attributes that weren’t changed are left empty. It looks something like this (I left out the ValidTo and ValidFrom dates that I use to sort the result correctly):
SubID  Att1  Att2
1      J
1            L
1      B
1            H
1      A     H
I need to transform this table so I can get the following result:
SubID  Att1  Att2
1      J
1      J     L
1      B     L
1      B     H
1      A     H
So basically; if an attribute is empty then take the previous value for that attribute.
Any solution goes. It doesn't matter what I have to do to get the result: a view on top of the table, an SSIS package to create a new table, or something else.
You can do this with a correlated subquery:
select t.subid,
(select t2.att1 from t t2 where t2.rowid <= t.rowid and t2.att1 is not null order by rowid desc limit 1) as att1,
(select t2.att2 from t t2 where t2.rowid <= t.rowid and t2.att2 is not null order by rowid desc limit 1) as att2
from t
This assumes that you have a rowid or equivalent (such as date time created) that specifies the ordering of the rows. It also uses limit to limit the results. In other databases, this might use top instead. (And Oracle uses a slightly more complex expression.)
I would write this using ValidTo. However, because there is ValidTo and ValidFrom, the actual expression is much more complicated. I would need for the question to clarify the rules for using these values with respect to imputing values at other times.
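The correlated-subquery idea can be tried on the question's sample rows with the sketch below, which runs on SQLite through Python's sqlite3 module. The table name subs and the ordering column seq are hypothetical stand-ins (seq plays the role of ValidFrom), blanks are stored as NULL, and the subqueries also correlate on SubID so that values never leak between subscriptions, which the answer's query leaves out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "subs" and "seq" are illustrative names; seq stands in for ValidFrom.
conn.execute("CREATE TABLE subs (SubID INTEGER, Att1 TEXT, Att2 TEXT, seq INTEGER)")
conn.executemany(
    "INSERT INTO subs VALUES (?, ?, ?, ?)",
    [
        (1, "J", None, 1),
        (1, None, "L", 2),
        (1, "B", None, 3),
        (1, None, "H", 4),
        (1, "A", "H", 5),
    ],
)

# For every row, take the most recent non-NULL value at or before that row.
query = """
SELECT t.SubID,
       (SELECT t2.Att1 FROM subs t2
        WHERE t2.SubID = t.SubID AND t2.seq <= t.seq AND t2.Att1 IS NOT NULL
        ORDER BY t2.seq DESC LIMIT 1) AS Att1,
       (SELECT t2.Att2 FROM subs t2
        WHERE t2.SubID = t.SubID AND t2.seq <= t.seq AND t2.Att2 IS NOT NULL
        ORDER BY t2.seq DESC LIMIT 1) AS Att2
FROM subs t
ORDER BY t.seq
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

Because each subquery searches all earlier rows, this version also copes with several consecutive blank rows for the same attribute.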
This one works in Oracle 11g:
select SUBID
,NVL(ATT1,LAG(ATT1) over(order by ValidTo)) ATT1
,NVL(ATT2,lag(ATT2) over(order by ValidTo)) ATT2
from table_name
I agree with Gordon Linoff and Jack Douglas: this code has a limitation when multiple consecutive records with nulls are inserted.
The code below handles that:
select SUBID
,NVL(ATT1,LAG(ATT1 ignore nulls) over(order by VALIDTO)) ATT1
,NVL(ATT2,LAG(ATT2 ignore nulls) over(order by VALIDTO)) ATT2
from Table_name
please see sql fiddle
http://sqlfiddle.com/#!4/3b530/4
Assuming you are using SQL Server (based on the fact that you mentioned SSIS), you can use OUTER APPLY to get the previous row:
DECLARE @T TABLE (SubID INT, Att1 CHAR(1), Att2 CHAR(2), ValidFrom DATETIME);
INSERT @T VALUES
(1, 'J', '', '20121201'),
(1, '', 'l', '20121202'),
(1, 'B', '', '20121203'),
(1, '', 'H', '20121204'),
(1, 'A', 'H', '20121205');
SELECT T.SubID,
Att1 = COALESCE(NULLIF(T.att1, ''), prev.Att1, ''),
Att2 = COALESCE(NULLIF(T.att2, ''), prev.Att2, '')
FROM @T T
OUTER APPLY
( SELECT TOP 1 Att1, Att2
FROM @T prev
WHERE prev.SubID = T.SubID
AND prev.ValidFrom < t.ValidFrom
ORDER BY ValidFrom DESC
) prev
ORDER BY T.ValidFrom;
(I've had to add random values for ValidFrom to ensure the order by is correct)
EDIT
The above won't work if you have multiple consecutive rows with blank values - e.g.
DECLARE @T TABLE (SubID INT, Att1 CHAR(1), Att2 CHAR(2), ValidFrom DATETIME);
INSERT @T VALUES
(1, 'J', '', '20121201'),
(1, '', 'l', '20121202'),
(1, 'B', '', '20121203'),
(1, '', 'H', '20121204'),
(1, '', 'J', '20121205'),
(1, 'A', 'H', '20121206');
If this is likely to happen you will need two OUTER APPLYs:
SELECT T.SubID,
Att1 = COALESCE(NULLIF(T.att1, ''), prevAtt1.Att1, ''),
Att2 = COALESCE(NULLIF(T.att2, ''), prevAtt2.Att2, '')
FROM @T T
OUTER APPLY
( SELECT TOP 1 Att1
FROM @T prev
WHERE prev.SubID = T.SubID
AND prev.ValidFrom < t.ValidFrom
AND COALESCE(prev.Att1 , '') != ''
ORDER BY ValidFrom DESC
) prevAtt1
OUTER APPLY
( SELECT TOP 1 Att2
FROM @T prev
WHERE prev.SubID = T.SubID
AND prev.ValidFrom < t.ValidFrom
AND COALESCE(prev.Att2 , '') != ''
ORDER BY ValidFrom DESC
) prevAtt2
ORDER BY T.ValidFrom;
However, since each OUTER APPLY only returns one value, I would change this to correlated subqueries, since the above will evaluate prevAtt1.Att1 and prevAtt2.Att2 for every row whether required or not. If you change this to:
SELECT T.SubID,
Att1 = COALESCE(
NULLIF(T.att1, ''),
( SELECT TOP 1 Att1
FROM @T prev
WHERE prev.SubID = T.SubID
AND prev.ValidFrom < t.ValidFrom
AND COALESCE(prev.Att1 , '') != ''
ORDER BY ValidFrom DESC
), ''),
Att2 = COALESCE(
NULLIF(T.att2, ''),
( SELECT TOP 1 Att2
FROM @T prev
WHERE prev.SubID = T.SubID
AND prev.ValidFrom < t.ValidFrom
AND COALESCE(prev.Att2 , '') != ''
ORDER BY ValidFrom DESC
), '')
FROM @T T
ORDER BY T.ValidFrom;
The subquery will only be evaluated when required (i.e. when Att1 or Att2 is blank) rather than for every row. The execution plan does not show this, and although the "Actual Execution Plan" of the latter appears more intensive, it almost certainly won't be. But as always, the key is testing: run both on your data, see which performs best, and check the IO statistics for reads etc.
I have never used SQL Server, but I have read that it supports analytic functions just like Oracle.
> select * from MYTABLE order by ValidFrom;
     SUBID A A VALIDFROM
---------- - - -------------------
         1 J   2012-12-06 15:14:51
         2 j   2012-12-06 15:15:20
         1   L 2012-12-06 15:15:31
         2   l 2012-12-06 15:15:39
         1 B   2012-12-06 15:15:48
         2 b   2012-12-06 15:15:55
         1   H 2012-12-06 15:16:03
         2   h 2012-12-06 15:16:09
         1 A H 2012-12-06 15:16:20
         2 a h 2012-12-06 15:16:29
select
t.SubID
,last_value(t.Att1 ignore nulls)over(partition by t.SubID order by t.ValidFrom rows between unbounded preceding and current row) as Att1
,last_value(t.Att2 ignore nulls)over(partition by t.SubID order by t.ValidFrom rows between unbounded preceding and current row) as Att2
,t.ValidFrom
from MYTABLE t;
     SUBID A A VALIDFROM
---------- - - -------------------
         1 J   2012-12-06 15:45:33
         1 J L 2012-12-06 15:45:41
         1 B L 2012-12-06 15:45:49
         1 B H 2012-12-06 15:45:58
         1 A H 2012-12-06 15:46:06
         2 j   2012-12-06 15:45:38
         2 j l 2012-12-06 15:45:44
         2 b l 2012-12-06 15:45:53
         2 b h 2012-12-06 15:46:02
         2 a h 2012-12-06 15:46:09
with Tricky1 as (
Select SubID, Att1, Att2, row_number() over(order by ValidFrom) As rownum
From Tricky
)
select T1.SubID, T1.Att1, T2.Att2
from Tricky1 T1
cross join Tricky1 T2
where (ABS(T1.rownum-T2.rownum) = 1 or (T1.rownum = 1 and T2.rownum = 1))
and T1.Att1 is not null
;
Also, have a look at accessing previous value, when SQL has no notion of previous value, here.
I was at it for quite a while. I found a rather simple way of doing it. It is not the best solution, as I know there must be another way, but here it goes.
I had to consolidate duplicates too, and in 2008 R2.
If you can, try to create a table which contains one set of the duplicate records.
Following your example, create one table where 'ATT1' is blank. Then use UPDATE queries with an inner join on 'SubId' to populate the data that you need.