How to "Roll-Up" data across multiple columns and rows - sql

I have an Audit table where we record changes to fields in our database. I have a query that gets a subset of the data from the Audit table: a few columns, their recorded changes, and when each change happened, associated with the applicable IDs. Here is a sample of what the output looks like:
ID  | ada       | IsHD | HDF  | DTStamp
----+-----------+------+------+------------------------
68  | NULL      | 0    | 0    | 2020-04-28 21:12:21.287
68  | NULL      | NULL | NULL | 2020-04-17 14:59:49.700
68  | No/Unsure | NULL | NULL | 2020-04-17 14:03:46.160
68  | NULL      | 0    | 0    | 2020-04-17 13:49:49.720
102 | NULL      | NULL | NULL | 2020-04-30 13:11:15.273
102 | No/Unsure | NULL | NULL | 2020-04-20 16:00:35.410
102 | NULL      | 1    | 1    | 2020-04-20 15:59:55.750
105 | No/Unsure | 1    | 1    | 2020-04-17 12:06:10.833
105 | NULL      | NULL | NULL | 2020-04-13 07:51:30.180
126 | NULL      | NULL | NULL | 2020-05-01 17:59:24.460
126 | NULL      | 0    | 0    | 2020-04-28 21:12:21.287
What I am trying to figure out is the most efficient way to "roll up" the multiple rows for a given ID so that, for each column, the newest non-NULL value is kept, leaving only a single row for that ID.
That is, turn this:
68  | NULL      | 0    | 0    | 2020-04-28 21:12:21.287
68  | NULL      | NULL | NULL | 2020-04-17 14:59:49.700
68  | No/Unsure | NULL | NULL | 2020-04-17 14:03:46.160
68  | NULL      | 0    | 0    | 2020-04-17 13:49:49.720
102 | NULL      | NULL | NULL | 2020-04-30 13:11:15.273
102 | No/Unsure | NULL | NULL | 2020-04-20 16:00:35.410
102 | NULL      | 1    | 1    | 2020-04-20 15:59:55.750
Into this:
68  | No/Unsure | 0    | 0    | 2020-04-28 21:12:21.287
102 | No/Unsure | 1    | 1    | 2020-04-30 13:11:15.273
...and so on down the list. It's as if you pushed down on the top of the results and squeezed out all the NULLs.
After dumping the above results into a temp table #audit, I run the following query:
SELECT DISTINCT a.[ID]
, (SELECT TOP 1 [ADA]
FROM #audit
WHERE [ID] = a.[ID]
AND [ADA] IS NOT NULL
ORDER BY [DTStamp] DESC) AS 'ADA'
, (SELECT TOP 1 [IsHD]
FROM #audit
WHERE [ID] = a.[ID]
AND [IsHD] IS NOT NULL
ORDER BY [DTStamp] DESC) AS 'IsHD'
, (SELECT TOP 1 [HDF]
FROM #audit
WHERE [ID] = a.[ID]
AND [HDF] IS NOT NULL
ORDER BY [DTStamp] DESC) AS 'HDF'
, (SELECT Max([DTStamp])
FROM #audit
WHERE [ID] = a.[ID]) AS 'DTStamp'
FROM #audit a
ORDER BY [ID]
This is what I've come up with, and it does work, but it feels very clunky and inefficient. Is there a better way to accomplish the end goal?

If you want one row per id, then use aggregation:
select id, max(ada), max(IsHD), max(HDF), max(DTStamp)
from #audit a
group by id;
This works for the data you have provided and seems to fit the rule that you want.
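One caveat is worth checking. MAX() returns the largest non-NULL value in each column, not the most recent one. It matches the rule here only because, in the sample, each ID has at most one distinct non-NULL value per column.

The sketch below runs the same GROUP BY query in SQLite through Python's sqlite3 module. SQLite is used only as a self-contained stand-in for SQL Server. The sketch also adds a hypothetical id 999 whose IsHD and HDF values change from 1 to 0:

```python
import sqlite3

# SQLite stands in for SQL Server; the GROUP BY query itself is unchanged.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE audit (id INTEGER, ada TEXT, IsHD INTEGER, HDF INTEGER, DTStamp TEXT)"
)
con.executemany(
    "INSERT INTO audit VALUES (?, ?, ?, ?, ?)",
    [
        (68, None, 0, 0, "2020-04-28 21:12:21.287"),
        (68, None, None, None, "2020-04-17 14:59:49.700"),
        (68, "No/Unsure", None, None, "2020-04-17 14:03:46.160"),
        (68, None, 0, 0, "2020-04-17 13:49:49.720"),
        # Hypothetical id 999: IsHD/HDF change from 1 (older) to 0 (newer).
        (999, None, 1, 1, "2020-04-01 08:00:00.000"),
        (999, None, 0, 0, "2020-04-02 08:00:00.000"),
    ],
)

rows = con.execute(
    "SELECT id, MAX(ada), MAX(IsHD), MAX(HDF), MAX(DTStamp) "
    "FROM audit GROUP BY id ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

For id 68 the result is the desired row. For id 999, MAX(IsHD) is 1 even though the latest non-NULL value is 0. That case needs an approach that orders by DTStamp.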

I understand that you want the "latest" non-null value per id for each column, using column DTStamp for ordering.
Your approach using multiple subqueries does what you want. An alternative would be to use multiple row_number()s and conditional aggregation. This might actually be more efficient, since it avoids multiple scans of the table.
select
id,
max(case when rn_ada = 1 then ada end) ada,
max(case when rn_isHd = 1 then isHd end) isHd,
max(case when rn_hdf = 1 then hdf end) hdf,
max(DTStamp) DTStamp
from (
select
a.*,
row_number() over(
partition by id
order by case when ada is not null then DTStamp end desc
) rn_ada,
row_number() over(
partition by id
order by case when isHd is not null then DTStamp end desc
) rn_isHd,
row_number() over(
partition by id
order by case when hdf is not null then DTStamp end desc
) rn_hdf
from #audit a
) t
group by id
order by id
Demo on DB Fiddle:
id | ada | isHd | hdf | DTStamp
--: | :-------- | ---: | --: | :----------------------
68 | No/Unsure | 0 | 0 | 2020-04-28 21:12:21.287
102 | No/Unsure | 1 | 1 | 2020-04-30 13:11:15.273
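Here is a minimal sketch of the ROW_NUMBER() answer above, run against the full sample from the question. It uses SQLite through Python's sqlite3 rather than SQL Server. This is an assumption made only so the example runs on its own, and SQLite 3.25 or later is needed for window functions.

Both engines sort NULLs last under DESC, so rn = 1 lands on the newest non-NULL row for each column:

```python
import sqlite3

# SQLite stands in for SQL Server; ISO-8601 text timestamps sort chronologically.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE audit (id INTEGER, ada TEXT, isHd INTEGER, hdf INTEGER, DTStamp TEXT)"
)
con.executemany(
    "INSERT INTO audit VALUES (?, ?, ?, ?, ?)",
    [
        (68, None, 0, 0, "2020-04-28 21:12:21.287"),
        (68, None, None, None, "2020-04-17 14:59:49.700"),
        (68, "No/Unsure", None, None, "2020-04-17 14:03:46.160"),
        (68, None, 0, 0, "2020-04-17 13:49:49.720"),
        (102, None, None, None, "2020-04-30 13:11:15.273"),
        (102, "No/Unsure", None, None, "2020-04-20 16:00:35.410"),
        (102, None, 1, 1, "2020-04-20 15:59:55.750"),
        (105, "No/Unsure", 1, 1, "2020-04-17 12:06:10.833"),
        (105, None, None, None, "2020-04-13 07:51:30.180"),
        (126, None, None, None, "2020-05-01 17:59:24.460"),
        (126, None, 0, 0, "2020-04-28 21:12:21.287"),
    ],
)

query = """
SELECT id,
       MAX(CASE WHEN rn_ada  = 1 THEN ada  END) AS ada,
       MAX(CASE WHEN rn_isHd = 1 THEN isHd END) AS isHd,
       MAX(CASE WHEN rn_hdf  = 1 THEN hdf  END) AS hdf,
       MAX(DTStamp) AS DTStamp
FROM (
    SELECT a.*,
           -- rows where the column is NULL get a NULL sort key, which DESC puts last
           ROW_NUMBER() OVER (PARTITION BY id
               ORDER BY CASE WHEN ada IS NOT NULL THEN DTStamp END DESC) AS rn_ada,
           ROW_NUMBER() OVER (PARTITION BY id
               ORDER BY CASE WHEN isHd IS NOT NULL THEN DTStamp END DESC) AS rn_isHd,
           ROW_NUMBER() OVER (PARTITION BY id
               ORDER BY CASE WHEN hdf IS NOT NULL THEN DTStamp END DESC) AS rn_hdf
    FROM audit a
) t
GROUP BY id
ORDER BY id
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

Id 126 has no non-NULL ada value, so its ada stays NULL in the rolled-up row.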

Related

Update next column in table if previous column value is not null

When a person receives a score, an entry is added to the table #uniqueScores:
Pid | Date | Score
I have a stored procedure that returns a table #people, whose score columns hold the data from #uniqueScores that falls within the past 3 months:
Pid | S1 | S2 | S3 | S4 | S5
I have a small test dataset; however, I'm having trouble getting any score after the first one registered for a user to appear in Score2 or beyond.
Here is my test dataset:
Pid | Date | Score
#1 | 2020/07/01 | 8
#1 | 2020/09/15 | 8
#2 | 2020/09/21 | 3
#3 | 2020/10/01 | 5
#4 | 2020/10/18 | 6
#4 | 2020/10/31 | 2
My UPDATE statement, which fills the score columns in #people from #uniqueScores:
BEGIN
UPDATE #people
SET [Score5] = (CASE WHEN ( [p].[Score4] is not null and [p].[Score5] is null ) THEN [us].[Score] ELSE NULL END)
,[Score4] = (CASE WHEN ( [p].[Score3] is not null and [p].[Score4] is null ) THEN [us].[Score] ELSE NULL END)
,[Score3] = (CASE WHEN ( [p].[Score2] is not null and [p].[Score3] is null ) THEN [us].[Score] ELSE NULL END)
,[Score2] = (CASE WHEN ( [p].[Score1] is not null and [p].[Score2] is null ) THEN [us].[Score] ELSE NULL END)
,[Score1] = (CASE WHEN ( [p].[Score1] is null ) THEN [us].[Score] ELSE NULL END)
FROM #people [p] inner join #uniqueScores [us]
on [p].[PersonID] = [us].[PersonID]
WHERE [Date] >= @DateLimit -- within the previous 3 months
END
However, the query only fills in the first eligible score for each person. The resulting table looks like this:
Pid | S1 | S2 | S3 | S4 | S5
#1 | 8 | null | null | null | null
#2 | 3 | null | null | null | null
#3 | 5 | null | null | null | null
#4 | 6 | null | null | null | null
The first entry, which falls outside the 3-month window, is correctly excluded, which is great; however, person #4's second score is missing.
I've looked at PIVOT, WHILE loops and cursors, but I'm no closer to making this work. I'm sure I've missed something simple, but I just can't see it.
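An UPDATE changes each target row only once, even when the join matches several source rows, so person #4's second score is never applied. Pre-aggregate the scores into one row per person and update from that: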
UPDATE p
SET Score1 = s.score_1,
Score2 = s.score_2,
Score3 = s.score_3,
Score4 = s.score_4,
Score5 = s.score_5
FROM #people [p] inner join
(SELECT us.PersonID,
MAX(CASE WHEN seqnum = 1 THEN Score END) as score_1,
MAX(CASE WHEN seqnum = 2 THEN Score END) as score_2,
MAX(CASE WHEN seqnum = 3 THEN Score END) as score_3,
MAX(CASE WHEN seqnum = 4 THEN Score END) as score_4,
MAX(CASE WHEN seqnum = 5 THEN Score END) as score_5
FROM (SELECT us.*,
ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY Date) as seqnum
FROM #uniqueScores us
WHERE [Date] >= @DateLimit -- within the previous 3 months
) us
GROUP BY us.PersonID
) s
ON s.PersonID = p.PersonId;
Note: You don't specify what order you want the scores in. This puts the oldest ones first. Use ORDER BY Date DESC in the ROW_NUMBER() if you want the newer ones first.
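The pre-aggregation step is where each person's scores are spread across score_1 to score_5, so that is the part worth checking. The sketch below runs only that SELECT, in SQLite through Python's sqlite3. SQLite is assumed only so the sketch is self-contained; the UPDATE ... FROM join itself is SQL Server syntax and is not exercised here.

Some details are also assumed:
- Person IDs are plain integers.
- The cut-off date 2020-08-01 is hypothetical. It was picked so that person 1's 2020-07-01 score falls outside the window, as in the question's output.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE uniqueScores (PersonID INTEGER, Date TEXT, Score INTEGER)")
con.executemany(
    "INSERT INTO uniqueScores VALUES (?, ?, ?)",
    [
        (1, "2020-07-01", 8),
        (1, "2020-09-15", 8),
        (2, "2020-09-21", 3),
        (3, "2020-10-01", 5),
        (4, "2020-10-18", 6),
        (4, "2020-10-31", 2),
    ],
)

# Hypothetical value standing in for @DateLimit.
date_limit = "2020-08-01"

pivot = """
SELECT PersonID,
       MAX(CASE WHEN seqnum = 1 THEN Score END) AS score_1,
       MAX(CASE WHEN seqnum = 2 THEN Score END) AS score_2,
       MAX(CASE WHEN seqnum = 3 THEN Score END) AS score_3,
       MAX(CASE WHEN seqnum = 4 THEN Score END) AS score_4,
       MAX(CASE WHEN seqnum = 5 THEN Score END) AS score_5
FROM (
    -- number each person's in-window scores, oldest first
    SELECT us.*,
           ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY Date) AS seqnum
    FROM uniqueScores us
    WHERE Date >= ?
) us
GROUP BY PersonID
ORDER BY PersonID
"""
rows = con.execute(pivot, (date_limit,)).fetchall()
for row in rows:
    print(row)
```

Person 4 now gets both scores, 6 and then 2, because each score has its own seqnum before the single UPDATE runs.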

Selecting all rows conditionally between 2 arbitrary column values in SQL Server

I've joined a number of tables to get to this table. From this table I need to select all of the b_id values that fall between the non-null start and end values. There could be multiple start and end values in the table. How can I write a SQL Server query to select all of the b_ids between those rows, but not including them? For the example table below, I would need the b_ids 99396 and 71828.
I tried to find a similar question and found something like this, but I don't believe I'm using the correct values where they need to be. Is there another way to do it? I have a solution using a cursor, but I'm trying to find a non-cursor solution. My friend told me the responses on here can be brutal if you don't word the question a certain way. Please be easy on me lol.
a_id | b_id | sequence | start | end |
---------+-------+----------+-------+-------+
3675151 | 68882 | 1 | null | null |
3675151 | 79480 | 2 | 79480 | null |
3675151 | 99396 | 3 | null | null |
3675151 | 71828 | 4 | null | null |
3675151 | 28911 | 5 | null | 28911 |
3675151 | 27960 | 6 | null | null |
3675183 | 11223 | 1 | null | null |
3675183 | 77810 | 2 | null | null |
3675183 | 11134 | 3 | null | null |
3675183 | 90909 | 4 | null | null |
Is this what you are looking for?
-- assumes at most one start row and one end row per a_id
select a_id, b_id, sequence
from yourtable t1
where
sequence >
(select sequence from yourtable t2 where t1.a_id = t2.a_id and start is not null)
and
sequence <
(select sequence from yourtable t3 where t1.a_id = t3.a_id and [end] is not null);
Would it be this?
If not, please give an example of the output you expect.
create table #table (
a_id int
,b_id int
,c_sequence int
,c_start int
,c_end int
)
insert into #table
values
(3675151 ,68882 , 1 , null , null )
,(3675151 ,79480 , 2 , 79480 , null )
,(3675151 ,99396 , 3 , null , null )
,(3675151 ,71828 , 4 , null , null )
,(3675151 ,28911 , 5 , null , 28911)
,(3675151 ,27960 , 6 , null , null )
,(3675183 ,11223 , 1 , null , null )
,(3675183 ,77810 , 2 , 4343 , null )
,(3675183 ,11134 , 3 , null , null )
,(3675183 ,90939 , 4 , null , 1231 )
select
t.*
from #table t
where
exists (select t1.b_id, t1.c_sequence
from #table t1
where t1.c_start is not null
and t.a_id = t1.a_id and t.c_sequence > t1.c_sequence)
and exists (select t1.b_id, t1.c_sequence
from #table t1
where t1.c_end is not null
and t.a_id = t1.a_id
and t.c_sequence < t1.c_sequence);
You can use window functions for this:
select t.*
from (select t.*,
max(case when c_start is not null then c_sequence end) over (partition by a_id order by c_sequence) as last_c_start,
max(case when c_end is not null then c_sequence end) over (partition by a_id order by c_sequence) as last_c_end,
min(case when c_end is not null then c_sequence end) over (partition by a_id order by c_sequence desc) as next_c_end
from t
) t
where c_sequence > last_c_start and
c_sequence < next_c_end and
(last_c_start > last_c_end or last_c_end is null);
The subquery returns the previous start and the next end for each row. That is pretty simple. The WHERE clause uses this information; the last condition just checks that the most recent "start" is the one that should be considered.
Note: This does not handle more complicated scenarios like start-->start-->end-->end. If that is a possibility, you should ask another question.
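To check the previous-start/next-end logic, here is the window-function query run in SQLite through Python's sqlite3. SQLite is a stand-in for SQL Server, chosen so the sketch runs on its own, and SQLite 3.25 or later is assumed for window functions. The sketch uses the sample rows from the #table answer above, with t as the table name as in this answer:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE t (a_id INTEGER, b_id INTEGER, c_sequence INTEGER,"
    " c_start INTEGER, c_end INTEGER)"
)
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (3675151, 68882, 1, None, None),
        (3675151, 79480, 2, 79480, None),
        (3675151, 99396, 3, None, None),
        (3675151, 71828, 4, None, None),
        (3675151, 28911, 5, None, 28911),
        (3675151, 27960, 6, None, None),
        (3675183, 11223, 1, None, None),
        (3675183, 77810, 2, 4343, None),
        (3675183, 11134, 3, None, None),
        (3675183, 90939, 4, None, 1231),
    ],
)

query = """
SELECT b_id
FROM (
    SELECT t.*,
           -- sequence of the most recent start at or before this row
           MAX(CASE WHEN c_start IS NOT NULL THEN c_sequence END)
               OVER (PARTITION BY a_id ORDER BY c_sequence) AS last_c_start,
           -- sequence of the most recent end at or before this row
           MAX(CASE WHEN c_end IS NOT NULL THEN c_sequence END)
               OVER (PARTITION BY a_id ORDER BY c_sequence) AS last_c_end,
           -- sequence of the nearest end at or after this row
           MIN(CASE WHEN c_end IS NOT NULL THEN c_sequence END)
               OVER (PARTITION BY a_id ORDER BY c_sequence DESC) AS next_c_end
    FROM t
) t
WHERE c_sequence > last_c_start
  AND c_sequence < next_c_end
  AND (last_c_start > last_c_end OR last_c_end IS NULL)
ORDER BY a_id, c_sequence
"""
between = [b_id for (b_id,) in con.execute(query)]
print(between)
```

Rows after the last end (such as 27960) have a NULL next_c_end, so the comparison is not true and they drop out.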
EDIT:
Actually, there is an even easier way:
select t.*
from (select t.*,
count(coalesce(c_start, c_end)) over (partition by a_id order by c_sequence) as counter
from t
) t
where c_start is null and c_end is null and
counter % 2 = 1;
This returns rows where both values are NULL (to avoid the endpoints) and where there is an odd number of non-NULL c_start/c_end values up to and including that row.
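The counting trick can be checked the same way. This sketch again uses SQLite through Python's sqlite3 as an assumed, self-contained stand-in for SQL Server.

Here, counter is the running number of start/end markers seen so far within each a_id, because COUNT() skips the NULLs that COALESCE returns for unmarked rows. A row strictly inside a start..end pair has both markers NULL and an odd counter:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE t (a_id INTEGER, b_id INTEGER, c_sequence INTEGER,"
    " c_start INTEGER, c_end INTEGER)"
)
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (3675151, 68882, 1, None, None),
        (3675151, 79480, 2, 79480, None),
        (3675151, 99396, 3, None, None),
        (3675151, 71828, 4, None, None),
        (3675151, 28911, 5, None, 28911),
        (3675151, 27960, 6, None, None),
        (3675183, 11223, 1, None, None),
        (3675183, 77810, 2, 4343, None),
        (3675183, 11134, 3, None, None),
        (3675183, 90939, 4, None, 1231),
    ],
)

query = """
SELECT b_id
FROM (
    SELECT t.*,
           -- running count of start/end markers up to this row
           COUNT(COALESCE(c_start, c_end))
               OVER (PARTITION BY a_id ORDER BY c_sequence) AS counter
    FROM t
) t
WHERE c_start IS NULL AND c_end IS NULL
  AND counter % 2 = 1
ORDER BY a_id, c_sequence
"""
between = [b_id for (b_id,) in con.execute(query)]
print(between)
```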

How to get the first non-NULL value in SQL?

I have three columns of data: an ID, a date, and a BMI value. I want to create a fourth column that holds, for each ID, the first non-NULL BMI value when the rows are ordered by date ascending.
So far, I have tried FIRST_VALUE in its plain form, which didn't work. I also tried wrapping FIRST_VALUE inside a CASE expression:
CASE
WHEN BMI IS NOT NULL THEN (FIRST_VALUE(BMI) OVER (PARTITION BY PID ORDER BY DATE))
ELSE 0
END AS FIRSTNOTNULLVALUE_BMI
but that gave me 0s.
id date BMI
1 2000-01-01 NULL
1 2003-05-01 18.1
1 2002-07-15 25.8
2 2009-09-25 NULL
2 2015-04-18 23.5
Any suggestions?
You can put that CASE in the ORDER BY of the FIRST_VALUE instead.
Then the NULLs will be sorted last for that function, so the first row it sees is the earliest non-NULL BMI.
create table test
(
pid int,
pdate date,
BMI decimal(4,1)
);
insert into test (pid, pdate, BMI) values
(1, '2000-01-01', NULL)
, (1, '2003-05-01', 18.5)
, (1, '2002-07-15', 24.9)
, (2, '2009-09-25', NULL)
, (2, '2015-04-18', 21.7)
;
select *
, first_value(BMI) over (partition by pid order by case when BMI is not null then 1 else 2 end, pdate) as firstBMI
from test
order by pid, pdate
pid | pdate | BMI | firstBMI
:-- | :--------- | :--- | :-------
1 | 2000-01-01 | null | 24.9
1 | 2002-07-15 | 24.9 | 24.9
1 | 2003-05-01 | 18.5 | 24.9
2 | 2009-09-25 | null | 21.7
2 | 2015-04-18 | 21.7 | 21.7
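Here is the same FIRST_VALUE idea run on the question's own sample values (18.1, 25.8, 23.5), in SQLite through Python's sqlite3. The choice of SQLite is an assumption for the sake of a runnable sketch. Because the CASE puts non-NULL rows first in the window's ORDER BY, the first row of each partition is the earliest-dated non-NULL BMI:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (pid INTEGER, pdate TEXT, BMI REAL)")
con.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        (1, "2000-01-01", None),
        (1, "2003-05-01", 18.1),
        (1, "2002-07-15", 25.8),
        (2, "2009-09-25", None),
        (2, "2015-04-18", 23.5),
    ],
)

rows = con.execute(
    """
    SELECT pid, pdate, BMI,
           FIRST_VALUE(BMI) OVER (
               PARTITION BY pid
               -- non-NULL BMIs sort first (key 1), then by date ascending
               ORDER BY CASE WHEN BMI IS NOT NULL THEN 1 ELSE 2 END, pdate
           ) AS firstBMI
    FROM test
    ORDER BY pid, pdate
    """
).fetchall()
for row in rows:
    print(row)
```

Pid 1's first non-NULL BMI by date is 25.8 (2002-07-15), not 18.1, even though 18.1 appears first in the question's listing.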
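You could join your table with a subquery that recovers the first non-null BMI, date-wise. Note that, as written, the subquery is not correlated with t, so it returns the earliest non-null BMI across the whole table rather than per id: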
select
t.*,
x.bmi first_non_null_bmi
from mytable t
cross join (select bmi from mytable where bmi is not null order by date limit 1) x
I think you can do something like this; maybe it will work, or at least give you an idea:
THEN (select BMI = row_number() over (partition by BMI order by DATE desc/asc)
from Products)
ELSE..
You could just use a subquery:
select bmi.*,
(select bmi2.bmi
from bmi bmi2
where bmi2.id = bmi.id and bmi2.bmi is not null
order by bmi2.date
limit 1
) as first_bmi
from bmi;

SQL Ranking by blocks

I'm sure the answer to this is going to end up being really obvious, but I just can't get this bit of SQL to work.
I have a table that has 3 columns:
User | Date | AchievedTarget
----------------------------------------
1 | 2018-01-01 | 1
1 | 2018-02-01 | 0
1 | 2018-03-01 | 1
1 | 2018-04-01 | 1
1 | 2018-05-01 | 0
I want to add a ranking based on the AchievedTarget column. Is it possible, with the data in the table above, to create the ranking shown in the table below?
User | Date | AchievedTarget | Rank
----------------------------------------
1 | 2018-01-01 | 1 | 1
1 | 2018-02-01 | 0 | 1
1 | 2018-03-01 | 1 | 1
1 | 2018-04-01 | 1 | 2
1 | 2018-05-01 | 0 | 1
This is a guess, based on the assumption that this is actually a gaps-and-islands question. If so, this does produce the second dataset the OP provided:
CREATE TABLE dbo.TestTable ([User] tinyint, --Avoid using keywords for column names
[date] date, --Avoid using datatypes for column names
AchievedTarget bit);
GO
INSERT INTO dbo.TestTable ([User],[date],AchievedTarget)
VALUES (1,'20180101',1),
(1,'20180201',0),
(1,'20180301',1),
(1,'20180401',1),
(1,'20180501',0);
GO
WITH Grps AS(
SELECT [User],[date],AchievedTarget,
ROW_NUMBER() OVER (ORDER BY [date]) -
ROW_NUMBER() OVER (PARTITION BY AchievedTarget ORDER BY [date]) AS Grp
FROM dbo.TestTable)
SELECT [User],[date],AchievedTarget,
ROW_NUMBER() OVER (PARTITION BY AchievedTarget, Grp ORDER BY [date]) AS [Rank] --Avoid using keywords for column names
FROM Grps
ORDER BY [date]
GO
DROP TABLE dbo.TestTable;
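The same gaps-and-islands query can be run in SQLite through Python's sqlite3 to confirm the expected ranks. SQLite is assumed here only as a self-contained stand-in, so identifiers are double-quoted instead of bracketed.

Subtracting the per-value ROW_NUMBER() from the overall ROW_NUMBER() gives a constant Grp within each run of equal AchievedTarget values. As a result, numbering within (AchievedTarget, Grp) restarts at every change:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE TestTable ("User" INTEGER, "date" TEXT, AchievedTarget INTEGER)')
con.executemany(
    "INSERT INTO TestTable VALUES (?, ?, ?)",
    [
        (1, "2018-01-01", 1),
        (1, "2018-02-01", 0),
        (1, "2018-03-01", 1),
        (1, "2018-04-01", 1),
        (1, "2018-05-01", 0),
    ],
)

query = """
WITH Grps AS (
    SELECT "User", "date", AchievedTarget,
           -- stays constant within each run of equal AchievedTarget values
           ROW_NUMBER() OVER (ORDER BY "date")
         - ROW_NUMBER() OVER (PARTITION BY AchievedTarget ORDER BY "date") AS Grp
    FROM TestTable
)
SELECT "date", AchievedTarget,
       ROW_NUMBER() OVER (PARTITION BY AchievedTarget, Grp ORDER BY "date") AS "Rank"
FROM Grps
ORDER BY "date"
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```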
Other method:
with tmp as (
select row_number() over(order by date) ID, *
from dbo.TestTable
)
select f1.*, NbBefore + 1
from tmp f1
outer apply
(
select top 1 f2.ID IDLimit from tmp f2 where f2.ID<f1.ID and f2.AchievedTarget<>f1.AchievedTarget
order by f2.ID desc
) f3
outer apply
(
select count(*) NbBefore from tmp f4 where f4.ID<f1.ID and f4.ID> isnull(f3.IDLimit, 0)
) f5

SQL: Count() based on column value

I have a table as follows:
CallID | CompanyID | OutcomeID
----------------------------------
1234 | 3344 | 36
1235 | 3344 | 36
1236 | 3344 | 36
1237 | 3344 | 37
1238 | 3344 | 39
1239 | 6677 | 37
1240 | 6677 | 37
I would like to create a SQL script that counts the number of Sales outcomes and the number of all the other attempts (anything <> 36), something like:
CompanyID | SalesCount | NonSalesCount
------------------------------------------
3344 | 3 | 2
6677 | 0 | 2
Is there a way to do a COUNT() that contains a condition like COUNT(CallID WHERE OutcomeID = 36)?
You can use a CASE expression with your aggregate to get a total based on the outcomeId value:
select companyId,
sum(case when outcomeid = 36 then 1 else 0 end) SalesCount,
sum(case when outcomeid <> 36 then 1 else 0 end) NonSalesCount
from yourtable
group by companyId;
Something like this:
SELECT companyId,
COUNT(CASE WHEN outcomeid = 36 THEN 1 END) SalesCount,
COUNT(CASE WHEN outcomeid <> 36 THEN 1 END) NonSalesCount
FROM
yourtable
GROUP BY
companyId
should work, because COUNT() counts only non-NULL values.
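The SUM(CASE ...) and COUNT(CASE ...) forms can be compared side by side. This sketch uses SQLite through Python's sqlite3 as a self-contained stand-in (an assumption; the queries themselves are plain SQL) and the sample rows from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourtable (CallID INTEGER, CompanyID INTEGER, OutcomeID INTEGER)")
con.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (1234, 3344, 36), (1235, 3344, 36), (1236, 3344, 36),
        (1237, 3344, 37), (1238, 3344, 39),
        (1239, 6677, 37), (1240, 6677, 37),
    ],
)

# SUM adds 1 for matching rows and 0 for the rest.
sum_rows = con.execute(
    """
    SELECT companyId,
           SUM(CASE WHEN outcomeid = 36 THEN 1 ELSE 0 END) AS SalesCount,
           SUM(CASE WHEN outcomeid <> 36 THEN 1 ELSE 0 END) AS NonSalesCount
    FROM yourtable
    GROUP BY companyId
    ORDER BY companyId
    """
).fetchall()

# COUNT skips the NULL that a CASE without ELSE yields for non-matching rows.
count_rows = con.execute(
    """
    SELECT companyId,
           COUNT(CASE WHEN outcomeid = 36 THEN 1 END) AS SalesCount,
           COUNT(CASE WHEN outcomeid <> 36 THEN 1 END) AS NonSalesCount
    FROM yourtable
    GROUP BY companyId
    ORDER BY companyId
    """
).fetchall()

print(sum_rows)
print(count_rows)
```

Both queries report two non-sales calls (outcomes 37 and 39) for company 3344.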
Yes. Count doesn't count NULL values, so you can do this:
select
COUNT('x') as Everything,
COUNT(case when OutcomeID = 36 then 'x' else NULL end) as Sales,
COUNT(case when OutcomeID <> 36 then 'x' else NULL end) as Other
from
YourTable
Alternatively, you can use SUM, like bluefeet demonstrated.
SELECT
companyId, SalesCount, TotalCount-SalesCount AS NonSalesCount
FROM
(
select
companyId,
COUNT(case when outcomeid = 36 then 1 else NULL end) SalesCount,
COUNT(*) AS TotalCount
from yourtable
group by companyId
) X;
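Using this mutually exclusive pattern with COUNT(*) avoids the (very small) overhead of evaluating a second conditional COUNT, and gives correct values if outcomeid can be NULL: rows with a NULL outcomeid are counted as non-sales, whereas a condition like outcomeid <> 36 would skip them.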
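To see the NULL behaviour, this sketch adds a hypothetical call 1241 for company 6677 with a NULL OutcomeID. It runs in SQLite through Python's sqlite3, which is assumed as a stand-in for the original database.

The COUNT(*) subtraction counts that call as a non-sale. The outcomeid <> 36 condition skips it, because NULL <> 36 is not true:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourtable (CallID INTEGER, CompanyID INTEGER, OutcomeID INTEGER)")
con.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (1234, 3344, 36), (1235, 3344, 36), (1236, 3344, 36),
        (1237, 3344, 37), (1238, 3344, 39),
        (1239, 6677, 37), (1240, 6677, 37),
        (1241, 6677, None),  # hypothetical call with an unknown outcome
    ],
)

subtract_rows = con.execute(
    """
    SELECT companyId, SalesCount, TotalCount - SalesCount AS NonSalesCount
    FROM (
        SELECT companyId,
               COUNT(CASE WHEN outcomeid = 36 THEN 1 ELSE NULL END) AS SalesCount,
               COUNT(*) AS TotalCount
        FROM yourtable
        GROUP BY companyId
    ) X
    ORDER BY companyId
    """
).fetchall()

conditional_rows = con.execute(
    """
    SELECT companyId,
           COUNT(CASE WHEN outcomeid = 36 THEN 1 END) AS SalesCount,
           COUNT(CASE WHEN outcomeid <> 36 THEN 1 END) AS NonSalesCount
    FROM yourtable
    GROUP BY companyId
    ORDER BY companyId
    """
).fetchall()

print(subtract_rows)
print(conditional_rows)
```

Which behaviour is "correct" depends on whether an unknown outcome should count as a non-sale.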
Knowing that COUNT() and SUM() ignore NULL values, and the following rules:
true or null = true
false or null = null
For fiddling around, you can take Taryn's answer and circumvent CASE altogether in a super-dirty and error-prone way!
select companyId,
sum(outcomeid = 36 or null) SalesCount,
sum(outcomeid <> 36 or null) NonSalesCount
from yourtable
group by companyId;
Forget to add an or null and you'll be counting everything!
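The trick relies on engines where a comparison evaluates to a number (1 or 0) that SUM() can add. That is the case in SQLite, used here through Python's sqlite3 as an assumed stand-in.

The sketch also shows two pitfalls:
- SUM() over only NULLs returns NULL rather than 0.
- COUNT() without the or null counts every row, because 0 is not NULL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourtable (CallID INTEGER, CompanyID INTEGER, OutcomeID INTEGER)")
con.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (1234, 3344, 36), (1235, 3344, 36), (1236, 3344, 36),
        (1237, 3344, 37), (1238, 3344, 39),
        (1239, 6677, 37), (1240, 6677, 37),
    ],
)

def run(select_list):
    """Run one aggregate select list per company, ordered by company."""
    sql = f"SELECT companyId, {select_list} FROM yourtable GROUP BY companyId ORDER BY companyId"
    return con.execute(sql).fetchall()

# 1 OR NULL is 1; 0 OR NULL is NULL, which SUM ignores.
sum_or_null = run("SUM(outcomeid = 36 OR NULL), SUM(outcomeid <> 36 OR NULL)")
# COUNT ignores the NULLs, so it yields 0 instead of NULL for no matches.
count_or_null = run("COUNT(outcomeid = 36 OR NULL), COUNT(outcomeid <> 36 OR NULL)")
# Without OR NULL, every comparison is 0 or 1 (never NULL), so COUNT counts all rows.
count_plain = run("COUNT(outcomeid = 36), COUNT(outcomeid <> 36)")

print(sum_or_null)
print(count_or_null)
print(count_plain)
```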