Combine subqueries without views - sql

I work with languages where I can assign intermediate outputs to a variable and then work with the variables to create a final output. I know SQL doesn't work this way as much. Currently I have queries that require me to make subsets of tables, and then I want to join those subsets together. I can mimic the variable assignment of my native languages using a VIEW, but I want to know how to do this in a single query (otherwise the database will quickly get messy with views).
Below is an MWE that creates the two initial tables, DeleteMe1 and DeleteMe2 (at the end). I'd then use two views to get current snapshots of each table. Last, I'd use a LEFT JOIN on the views to merge the two data sets.
Is there a way to see the code SQL uses for the "Join snapshotted views" query I supply below?
How could I eliminate the intermediate views and combine everything into a single SQL query?
Create views for current snapshot:
CREATE VIEW [dbo].[CurrentSnapshotDeleteMe1]
AS
SELECT DISTINCT *
FROM
(SELECT
t.[Id]
,t.[OppId]
,t.[LastModifiedDate]
,t.[Stage]
FROM [dbo].DeleteMe1 AS t
INNER JOIN (
SELECT [OppId], MAX([LastModifiedDate]) AS MaxLastModifiedDate
FROM [dbo].DeleteMe1
WHERE LastModifiedDate <= GETDATE()
GROUP BY [OppId]
) AS referenceGroup
ON t.[OppId] = referenceGroup.[OppId] AND t.[LastModifiedDate] = referenceGroup.[MaxLastModifiedDate]
) AS BigGroup
GO
CREATE VIEW [dbo].[CurrentSnapshotDeleteMe2]
AS
SELECT DISTINCT *
FROM
(SELECT
t.[Id]
,t.[OppId]
,t.[LastModifiedDate]
,t.[State]
FROM
[dbo].DeleteMe2 AS t
INNER JOIN (
SELECT [OppId], MAX([LastModifiedDate]) AS MaxLastModifiedDate
FROM [dbo].DeleteMe2
WHERE LastModifiedDate <= GETDATE()
GROUP BY [OppId]
) as referenceGroup
ON t.[OppId] = referenceGroup.[OppId] AND t.[LastModifiedDate] = referenceGroup.[MaxLastModifiedDate]
) as BigGroup
GO
Join snapshotted views:
SELECT
dm1.[Id] as IdDM1
,dm1.[OppId]
,dm1.[LastModifiedDate] as LastModifiedDateDM1
,dm1.[Stage]
,dm2.[Id] as IdDM2
,dm2.[LastModifiedDate] as LastModifiedDateDM2
,dm2.[State]
FROM [dbo].[CurrentSnapshotDeleteMe1] as dm1
LEFT JOIN [dbo].[CurrentSnapshotDeleteMe2] as dm2 ON dm1.OppId = dm2.OppId
Create original tables:
CREATE TABLE DeleteMe1
(
[Id] INT,
[OppId] INT,
[LastModifiedDate] DATE,
[Stage] VARCHAR(250)
)
INSERT INTO DeleteMe1
VALUES (1, 1, '2019-04-01', 'A'),
(2, 1, '2019-05-01', 'E'),
(3, 1, '2019-06-01', 'B'),
(4, 2, '2019-07-01', 'A'),
(5, 2, '2019-08-01', 'B'),
(6, 3, '2019-09-01', 'C'),
(7, 4, '2019-10-01', 'B'),
(8, 4, '2019-11-01', 'C')
CREATE TABLE DeleteMe2
(
[Id] INT,
[OppId] INT,
[LastModifiedDate] DATE,
[State] VARCHAR(250)
)
INSERT INTO DeleteMe2
VALUES (1, 1, '2018-07-01', 'California'),
(2, 1, '2017-11-01', 'Delaware'),
(3, 4, '2017-12-01', 'California'),
(4, 2, '2018-01-01', 'Alaska'),
(5, 4, '2018-02-01', 'Delaware'),
(6, 2, '2018-09-01', 'Delaware'),
(7, 3, '2018-04-01', 'Alaska'),
(8, 1, '2018-05-01', 'Hawaii'),
(9, 4, '2018-06-01', 'California'),
(10, 1, '2018-07-01', 'Connecticut'),
(11, 2, '2018-08-01', 'Delaware'),
(12, 2, '2018-09-01', 'California')

I work with languages where I can assign intermediate outputs to a variable and then work with the variables to create a final output. I know SQL doesn't work this way as much.
Well, that's not quite true: SQL does work this way, or at least SQL Server does. You have temp tables and table variables.
Although you named your tables DeleteMe, from your statements it seems it's the views you wish to treat as variables, so I'll focus on that.
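As an aside, if part of what you're asking is how to see the T-SQL behind an existing view, SQL Server stores the definition and can show it back to you. Both of these are standard metadata calls, shown here against one of your view names:
-- Print the stored definition of a view
EXEC sp_helptext 'dbo.CurrentSnapshotDeleteMe1';
-- or return it as a value
SELECT OBJECT_DEFINITION(OBJECT_ID('dbo.CurrentSnapshotDeleteMe1'));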
Here's how to do it for your first view. It puts the results into a temporary table called #snapshot1:
-- Optional: in case you re-run before you close your connection
if object_id('tempdb..#snapshot1') is not null
    drop table #snapshot1;
select distinct
    t.Id, t.OppId, t.LastModifiedDate, t.Stage
into #snapshot1
from dbo.DeleteMe1 as t
inner join (
    select OppId, max(LastModifiedDate) as MaxLastModifiedDate
    from dbo.DeleteMe1
    where LastModifiedDate <= getdate()
    group by OppId
) referenceGroup
    on t.OppId = referenceGroup.OppId
    and t.LastModifiedDate = referenceGroup.MaxLastModifiedDate;
The hash sign tells SQL Server that the table is to be stored temporarily. #snapshot1 will not survive after your connection closes.
Alternatively, you can create a table variable.
declare @snapshot1 table (
    id int,
    oppId int,
    lastModifiedDate date,
    stage varchar(250)
);
insert @snapshot1 (id, oppId, lastModifiedDate, stage)
select distinct ... -- the same SELECT as for the temp table above
A table variable is discarded at the end of the batch (or stored procedure) in which it is declared.
From there, you can join on your temp tables:
SELECT dm1.[Id] as IdDM1, dm1.[OppId],
dm1.[LastModifiedDate] as LastModifiedDateDM1, dm1.[Stage],
dm2.[Id] as IdDM2, dm2.[LastModifiedDate] as LastModifiedDateDM2,
dm2.[State]
FROM #snapshot1 dm1
LEFT JOIN #snapshot2 dm2 ON dm1.OppId = dm2.OppId
Or your table variables:
SELECT dm1.[Id] as IdDM1, dm1.[OppId],
    dm1.[LastModifiedDate] as LastModifiedDateDM1, dm1.[Stage],
    dm2.[Id] as IdDM2, dm2.[LastModifiedDate] as LastModifiedDateDM2,
    dm2.[State]
FROM @snapshot1 dm1
LEFT JOIN @snapshot2 dm2 ON dm1.OppId = dm2.OppId
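And if you want the whole thing as one statement, with no views, temp tables, or table variables left behind, common table expressions give you the same "named intermediate result" feel. A sketch against the DeleteMe1/DeleteMe2 tables from the question:
;WITH snapshot1 AS (
    -- current snapshot of DeleteMe1, same logic as the view
    SELECT DISTINCT t.Id, t.OppId, t.LastModifiedDate, t.Stage
    FROM dbo.DeleteMe1 AS t
    INNER JOIN (
        SELECT OppId, MAX(LastModifiedDate) AS MaxLastModifiedDate
        FROM dbo.DeleteMe1
        WHERE LastModifiedDate <= GETDATE()
        GROUP BY OppId
    ) AS rg ON t.OppId = rg.OppId
           AND t.LastModifiedDate = rg.MaxLastModifiedDate
),
snapshot2 AS (
    -- current snapshot of DeleteMe2
    SELECT DISTINCT t.Id, t.OppId, t.LastModifiedDate, t.State
    FROM dbo.DeleteMe2 AS t
    INNER JOIN (
        SELECT OppId, MAX(LastModifiedDate) AS MaxLastModifiedDate
        FROM dbo.DeleteMe2
        WHERE LastModifiedDate <= GETDATE()
        GROUP BY OppId
    ) AS rg ON t.OppId = rg.OppId
           AND t.LastModifiedDate = rg.MaxLastModifiedDate
)
SELECT dm1.Id AS IdDM1, dm1.OppId,
       dm1.LastModifiedDate AS LastModifiedDateDM1, dm1.Stage,
       dm2.Id AS IdDM2, dm2.LastModifiedDate AS LastModifiedDateDM2,
       dm2.State
FROM snapshot1 AS dm1
LEFT JOIN snapshot2 AS dm2 ON dm1.OppId = dm2.OppId;
The CTE names exist only for the duration of this one statement, so nothing accumulates in the database.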

Related

Most efficient way to update table column based on sum

I am looking for the most efficient / minimal-code way to update a table column based on the sum of another value in the same table. A method that works, along with the temp table, is shown below.
if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (id nvarchar(max), astate varchar(16), code varchar(16), price decimal(16,2), total_id_price_bystate decimal(16,2), total_id_price decimal(16,2))
INSERT into #t1 VALUES
(100, 'CA', '0123', 123.01, null, null),
(100, 'CA', '0124', 0.00, null, null),
(100, 'PA', '0256', 12.10, null, null),
(200, 'MA', '0452', 145.00, null, null),
(300, 'MA', '0578', 134.23, null, null),
(400, 'CA', '1111', 94.12, null, null),
(600, 'CA', '0000', 86.34, null, null),
(500, 'CO', '1111', 0.00, null, null);
update t1
set total_id_price_bystate = sum_price_bystate
from #t1 t1
inner join (
select t2_in.Id,
t2_in.astate,
sum(t2_in.price) as sum_price_bystate
from #t1 t2_in
group by t2_in.id, t2_in.astate
) t2
on t1.id = t2.id
and t1.astate = t2.astate
update t1
set total_id_price = sum_price
from #t1 t1
inner join (
select t3_in.Id,
sum(t3_in.price) as sum_price
from #t1 t3_in
group by t3_in.id
) t3
on t1.id = t3.id
select * from #t1
The main thing I don't like about my method is that it requires an inner join with a subquery against the same table. So I am looking for a way to avoid this, although I don't think my method is overly complicated; maybe there isn't anything much more efficient.
In addition, I am wondering about the best way to combine the two updates, since they are very similar and differ only in the GROUP BY clause.
As pointed out in the comments, this is not a good way to store data, as it violates the basic principles of normalisation:
- you are storing data that you can compute
- you are storing the same data multiple times, i.e., duplicates
- you need to re-calculate the totals whenever any individual value changes
- it's possible to update a single row and create a data contradiction
Pre-calculating aggregations is not inherently bad, especially in a data warehouse scenario, but even then you would only store each value once per unique key. Normalisation prevents these issues.
That said, you can utilise analytic window functions to compute your values in a single pass over the table:
select *,
Sum(price) over(partition by id, astate) total_id_price_bystate,
Sum(price) over(partition by id) total_id_price
from #t1;
If you really want the data in this format you could create a view and query it:
create view Totals as
select id, astate, code, price, total_id_price_bystate, total_id_price,
Sum(price) over(partition by id, astate) total_bystate,
Sum(price) over(partition by id) total
from t1; -- assumes the data is in a permanent table t1: a view cannot reference a temp table like #t1
select *
from Totals where id = 100;
And to answer your specific question: a view (or a CTE) that touches a single base table can be updated, so you can accomplish what you are doing like so:
drop view Totals;
create view Totals as
select id, astate, code, price, total_id_price_bystate, total_id_price,
Sum(price) over(partition by id, astate) total_bystate,
Sum(price) over(partition by id) total
from t1;
update totals set
total_id_price_bystate = total_bystate,
total_id_price = total;
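The CTE flavour of the same trick, sketched against the #t1 temp table from the question (the columns being updated must come from a single base table for the update to be allowed):
;with Totals as (
    -- window columns computed alongside the stored columns we want to overwrite
    select total_id_price_bystate, total_id_price,
           Sum(price) over(partition by id, astate) as total_bystate,
           Sum(price) over(partition by id) as total
    from #t1
)
update Totals set
    total_id_price_bystate = total_bystate,
    total_id_price = total;
This answers the "combine the two updates" part of the question in one statement and one pass, with no self-join.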
You can use PARTITION BY to get the two different aggregated values:
if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (id nvarchar(max), astate varchar(16), code varchar(16), price decimal(16,2), total_id_price_bystate decimal(16,2), total_id_price decimal(16,2))
INSERT into #t1 VALUES
(100, 'CA', '0123', 123.01, null, null),
(100, 'CA', '0124', 0.00, null, null),
(100, 'PA', '0256', 12.10, null, null),
(200, 'MA', '0452', 145.00, null, null),
(300, 'MA', '0578', 134.23, null, null),
(400, 'CA', '1111', 94.12, null, null),
(600, 'CA', '0000', 86.34, null, null),
(500, 'CO', '1111', 0.00, null, null);
update t1
set total_id_price_bystate = sum_price_bystate,total_id_price=sum_price
from #t1 t1
inner join (
select t2_in.Id,
t2_in.astate,
sum(t2_in.price) over(partition by t2_in.id, t2_in.astate) as sum_price_bystate,
sum(t2_in.price) over(partition by t2_in.id) as sum_price
from #t1 t2_in
) t2
on t1.id = t2.id
and t1.astate = t2.astate
select * from #t1

Select duplicate persons with duplicate memberships

SQL Fiddle with schema and my initial attempt.
CREATE TABLE person
([firstname] varchar(10), [surname] varchar(10), [dob] date, [personid] int);
INSERT INTO person
([firstname], [surname], [dob] ,[personid])
VALUES
('Alice', 'AA', '1/1/1990', 1),
('Alice', 'AA', '1/1/1990', 2),
('Bob' , 'BB', '1/1/1990', 3),
('Carol', 'CC', '1/1/1990', 4),
('Alice', 'AA', '1/1/1990', 5),
('Kate' , 'KK', '1/1/1990', 6),
('Kate' , 'KK', '1/1/1990', 7)
;
CREATE TABLE person_membership
([personid] int, [personstatus] varchar(1), [memberid] int);
INSERT INTO person_membership
([personid], [personstatus], [memberid])
VALUES
(1, 'A', 10),
(2, 'A', 20),
(3, 'A', 30),
(3, 'A', 40),
(4, 'A', 50),
(4, 'A', 60),
(5, 'T', 70),
(6, 'A', 80),
(7, 'A', 90);
CREATE TABLE membership
([membershipid] int, [memstatus] varchar(1));
INSERT INTO membership
([membershipid], [memstatus])
VALUES
(10, 'A'),
(20, 'A'),
(30, 'A'),
(40, 'A'),
(50, 'T'),
(60, 'A'),
(70, 'A'),
(80, 'A'),
(90, 'T');
There are three tables (as per the fiddle above). The person table contains duplicates: the same people entered more than once. For the purpose of this exercise, we assume that a combination of first name, surname and DoB is enough to uniquely identify a person.
I am trying to build a query that will show duplicates of people (first name + surname + DoB) with two or more active entries in the person table (person_membership.personstatus = 'A') AND two or more active memberships (membership.memstatus = 'A').
Using the example from the SQL Fiddle, the result of the query should be just Alice (two active person IDs, two active membership IDs).
I think I'm making progress with the following effort, but it looks rather cumbersome, and I need to remove Kate from the final result - she doesn't have a duplicate membership.
SELECT q.firstname, q.surname, q.dob, p1.personid, m.membershipid
FROM
(SELECT
p.firstname,p.surname,p.dob, count(*) as cnt
FROM
person p
GROUP BY
p.firstname,p.surname,p.dob
HAVING COUNT(1) > 1) as q
INNER JOIN person p1 ON q.firstname=p1.firstname AND q.surname=p1.surname AND q.dob=p1.dob
INNER JOIN person_membership pm ON p1.personid=pm.personid
INNER JOIN membership m ON pm.memberid = m.membershipid
WHERE pm.personstatus = 'A' AND m.memstatus = 'A'
Since you are using SQL Server, window functions will be handy for this scenario. The following will give you the expected output.
SELECT firstname,surname,dob,personid,memberid
from(
    SELECT firstname,surname,dob,p.personid,memberid
        ,Rank() over(partition by p.firstname,p.surname,p.dob order by p.personid) rnasc
        ,Rank() over(partition by p.firstname,p.surname,p.dob order by p.personid desc) rndesc
    FROM [dbo].[person] p
    INNER JOIN person_membership pm ON p.personid=pm.personid
    INNER JOIN membership m ON pm.memberid = m.membershipid
    where personstatus='A' and memstatus='A') a
where a.rnasc+rndesc>2
The two ranks sum to more than 2 on a row only when its name group spans more than one distinct personid, which is exactly the duplicate-person condition; a single person with several memberships ranks 1 in both directions and is filtered out.
You have to add GROUP BY and HAVING clauses to return duplicate items only. Counting distinct person ids (rather than joined rows) keeps out a single person who merely has two memberships:
SELECT
person.firstname,person.surname,person.dob
FROM
person, person_membership, membership
WHERE
person.personid=person_membership.personid AND person_membership.memberid = membership.membershipid
AND
person_membership.personstatus = 'A' AND membership.memstatus = 'A'
GROUP BY
person.firstname,person.surname,person.dob
HAVING COUNT(DISTINCT person.personid) > 1

Postgresql: An alternative to subqueries to make the query more efficient?

So I have the following table with the schema:
CREATE TABLE stages (
id serial PRIMARY KEY,
cid VARCHAR(6) NOT NULL,
stage varchar(30) NOT NULL,
status varchar(30) NOT NULL
);
with the following test data:
INSERT INTO stages (id, cid, stage, status) VALUES
(1, '1', 'first stage', 'accepted'),
(2, '1', 'second stage', 'current'),
(3, '2', 'first stage', 'accepted'),
(4, '3', 'first stage', 'accepted'),
(5, '3', 'second stage', 'accepted'),
(6, '3', 'third stage', 'current')
;
Now the use case: we want to query this table for each stage. For example, we will query it for the 'first stage' and try to fetch all those cids which do not exist in the subsequent stage (the 'second stage'):
Result Set:
cid | status
2 | 'accepted'
While running the query for the 'second stage', we will try to fetch all those cids that do not exist in the 'third stage' and so on.
Result Set:
cid | status
1 | 'current'
Currently, we do this with an EXISTS subquery in the WHERE clause, which is not very performant.
Is there a better alternative to the approach we're currently using, or should we focus on optimizing the current one? Also, what further optimizations can we make so the EXISTS subquery performs better?
Thanks!
You can use lead():
select s.*
from (select s.*,
lead(stage) over (partition by cid order by id) as next_stage
from stages s
) s
where stage = 'first stage' and next_stage is null;
CREATE TABLE stages (
id serial PRIMARY KEY
, cid VARCHAR(6) NOT NULL
, stage varchar(30) NOT null
, status varchar(30) not null
, UNIQUE ( cid, stage)
);
INSERT INTO stages (id, cid, stage, status) VALUES
(1, '1', 'first stage', 'accepted'),
(2, '1', 'second stage', 'current'),
(3, '2', 'first stage', 'accepted'),
(4, '3', 'first stage', 'accepted'),
(5, '3', 'second stage', 'accepted'),
(6, '3', 'third stage', 'current')
;
ANALYZE stages;
-- You can fetch all (three) stages with one query
-- Luckily, {'first', 'second', 'third'} are ordered alphabetically ;-)
-- --------------------------------------------------------------
-- EXPLAIN ANALYZE
SELECT * FROM stages q
WHERE NOT EXISTS (
SELECT * FROM stages x
WHERE x.cid = q.cid AND x.stage > q.stage
);
-- Some people don't like EXISTS, or think that it is slow.
-- --------------------------------------------------------------
-- EXPLAIN ANALYZE
SELECT q.*
FROM stages q
JOIN (
SELECT id
, row_number() OVER (PARTITION BY cid ORDER BY stage DESC) AS rn
FROM stages x
)x ON x.id = q.id AND x.rn = 1;
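As for making the EXISTS subquery itself faster: the probe looks rows up by cid with a range condition on stage, so a composite index on that pair is the usual first step. A sketch; note the UNIQUE (cid, stage) constraint in the schema above already creates an equivalent index, so this only applies to the original schema without it:
-- Composite index to support: WHERE x.cid = q.cid AND x.stage > q.stage
CREATE INDEX stages_cid_stage_idx ON stages (cid, stage);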

Remove duplicates by multiple column criteria

I have following table
CREATE TABLE Test (
ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
FIRST VARCHAR(10) NOT NULL,
SECOND VARCHAR(10) NOT NULL
)
The table is filled with some duplicate data. The TestTarget table has the same structure, and it is filled using the following procedural algorithm:
DECLARE @first varchar(10), @second varchar(10)
DECLARE c CURSOR FAST_FORWARD
FOR
SELECT first, second FROM Test ORDER BY id
OPEN c
FETCH NEXT FROM c INTO @first, @second
WHILE @@fetch_status = 0
BEGIN
    IF NOT EXISTS(SELECT 1 FROM TestTarget WHERE first=@first OR second=@second)
        INSERT INTO TestTarget (first, second) VALUES(@first, @second)
    FETCH NEXT FROM c INTO @first, @second
END
CLOSE c
DEALLOCATE c
Briefly: before each insert we check whether the target table already contains the row's 'first' OR 'second' value.
Example:
Source table
ID FIRST SECOND
1 A 2
2 A 1
3 A 3
4 B 2
5 B 1
6 B 3
7 B 2
8 B 4
9 C 2
10 C 3
INSERT INTO Test (first, second)
VALUES ('A', '2'),
('A', '1'),
('A', '3'),
('B', '2'),
('B', '1'),
('B', '3'),
('B', '2'),
('B', '4'),
('C', '2'),
('C', '3')
Target table
ID FIRST SECOND
1 A 2
5 B 1
10 C 3
The real source table has several hundred thousand rows, with at least 2 rows sharing the same 'first' or 'second' value.
I'm looking for a set-based solution if that is at all possible, or at least something faster than this loop, because it takes hours in my real case.
NOTE: Classic duplicate removal via partition/join/etc. does not apply here, because it would produce different results, even a different final number of rows.
INSERT INTO TestTarget (first, second)
SELECT first,second
FROM Test t
WHERE NOT EXISTS
(
SELECT 1
FROM Test t2
WHERE t2.id>t.id and (t2.first=t.first or t2.second=t.second)
)
I cannot think of any simple set-based solution to your problem, I am afraid, but I would hope that something along the following lines would be much faster than your existing cursor:
declare @test table
    (id int,
    first varchar(1),
    second varchar(1))
declare @target table
    (id int,
    first varchar(1),
    second varchar(1))
declare @temp table
    (id int,
    first varchar(1),
    second varchar(1))
INSERT INTO @test (id, first, second)
VALUES (1, 'A', '2'),
(2, 'A', '1'),
(3, 'A', '3'),
(4, 'B', '2'),
(5, 'B', '1'),
(6, 'B', '3'),
(7, 'B', '2'),
(8, 'B', '4'),
(9, 'C', '2'),
(10, 'C', '3')
declare @firsts table
    (first varchar(1))
declare @seconds table
    (second varchar(1))
INSERT INTO @firsts
SELECT DISTINCT first FROM @test
INSERT INTO @seconds
SELECT DISTINCT second FROM @test
declare @firstcnt int = (SELECT count(*) FROM @firsts)
declare @secondcnt int = (SELECT count(*) FROM @seconds)
WHILE (@firstcnt > 0 AND @secondcnt > 0)
BEGIN
    DELETE FROM @temp
    INSERT INTO @temp
    SELECT TOP 1 t.id, t.first, t.second FROM @test t
    INNER JOIN @firsts f ON t.first = f.first
    INNER JOIN @seconds s ON t.second = s.second
    ORDER BY id
    INSERT INTO @target
    SELECT * FROM @temp
    DELETE FROM @firsts WHERE first = (SELECT first FROM @temp)
    SET @firstcnt = @firstcnt - 1
    DELETE FROM @seconds WHERE second = (SELECT second FROM @temp)
    SET @secondcnt = @secondcnt - 1
END
SELECT * FROM @target
This does produce the desired values and I would expect it to be faster because the while loop only needs to run for the total number of unique value pairs, rather than having to step through the entire table.
It also gives 10 C 3 as the last row, which I take to be correct, despite @Gordon's comment. If I understand the question correctly, the ID order takes precedence: that is to say, although 'A' and 'B' have entries with '3' as the second value, those entries have a greater id than another row that can legitimately be inserted first.
HTH
Using a recursive CTE:
declare @Target table(col1 varchar(20), col2 int)
declare @Test table(col1 varchar(20), col2 int)
INSERT INTO @Test (col1, col2)
VALUES ('A', '2'),
('A', '1'),
('A', '3'),
('B', '1'),
('B', '2'),
('B', '3'),
('B', '2'),
('B', '4'),
('C', '2'),
('C', '3')
;With CTE as
(
    select col1, col2
        ,DENSE_RANK() over(ORDER by col1) rn1
    from @Test
)
,cte1 AS(
    select top 1 c.col1, c.col2, rn1 from cte c where rn1=1
    union ALL
    select c.col1, c.col2, c.rn1 from cte c
    inner join cte1 c1
        on c.rn1 > c1.rn1
    where c.col2 != c1.col2
)
insert into @Target
select col1, col2 FROM(
    select *, ROW_NUMBER() over(partition by col1 order by (select null)) rn2 from cte1
)t4
where rn2 = 1
select * from @Target

SQL select items between LAG and LEAD using as range

Is it possible to select and sum items from a table, using LAG and LEAD over another table as the range, as below?
SELECT @Last = MAX(ID) from [dbo].[#Temp]
select opl.Name as [Age Categories] ,
    ( SELECT count([dbo].udfCalculateAge([BirthDate],GETDATE()))
      FROM [dbo].[tblEmployeeDetail] ed
      inner join [dbo].[tblEmployee] e
          on ed.EmployeeID = e.ID
      where convert(int,[dbo].udfCalculateAge(e.[BirthDate],GETDATE()))
          between LAG(opl.Name) OVER (ORDER BY opl.id)
          and (CASE opl.ID WHEN @Last THEN '100' ELSE opl.Name End )
    ) as Total
FROM [dbo].[#Temp] opl
tblEmployee contains the employees and their dates of birth
INSERT INTO #tblEmployees VALUES
(1, 'A', 'A1', 'A', '1983/01/02'),
(2, 'B', 'B1', 'BC', '1982/01/02'),
(3, 'C', 'C1', 'JR2', '1982/10/11'),
(4, 'V', 'V1', 'G', '1990/07/12'),
(5, 'VV', 'VV1', 'J', '1992/06/02'),
(6, 'R', 'A', 'D', '1982/05/15'),
(7, 'C', 'Ma', 'C', '1984/09/29')
The next table is a temp table created from the ages entered by the user; e.g. "20;30;50;60" generates the temp table below, using a Split function:
select * FROM [dbo].[Split](';','20;30;50;60')
Temp Table
pn s
1 20
2 30
3 50
4 60
Desired output is as below, though the Age Categories column can be renamed in a DataTable in C#. I need the Total column to be accurate for the ranges.
Age Categories Total
up to 20 0
21 - 30 2
31 - 50 5
51 - 60 0
Something along these lines should work for you:
declare @tblEmployees table(
    ID int,
    FirstNames varchar(20),
    Surname varchar(20),
    Initial varchar(3),
    BirthDate date)
INSERT INTO @tblEmployees VALUES
(1, 'A', 'A1', 'A', '1983/01/02'),
(2, 'B', 'B1', 'BC', '1982/01/02'),
(3, 'C', 'C1', 'JR2', '1982/10/11'),
(4, 'V', 'V1', 'G', '1990/07/12'),
(5, 'VV', 'VV1', 'J', '1992/06/02'),
(6, 'R', 'A', 'D', '1982/05/15'),
(7, 'C', 'Ma', 'C', '1984/09/29')
declare @temp table
    (id int identity,
    age int)
INSERT INTO @temp
SELECT cast(item as int) FROM dbo.fnSplit(';','20;30;50;60')
declare @today date = GetDate()
declare @minBirthCutOff date = (SELECT DATEADD(yy, -MAX(age), @today) FROM @temp)
declare @minBirth date = (SELECT Min(birthdate) from @tblEmployees)
IF @minBirth < @minBirthCutOff
BEGIN
    INSERT INTO @temp VALUES (100)
END
SELECT COALESCE(CAST((LAG(t.age) OVER(ORDER BY t.age) + 1) as varchar(3))
           + ' - ','Up to ')
       + CAST(t.age AS varchar(3)) AS [Age Categories],
       COUNT(e.id) AS [Total]
FROM @temp t
LEFT JOIN
    (SELECT te.id,
            te.age,
            (SELECT MIN(age) FROM @temp t WHERE t.age > te.age) AS agebucket
     FROM (select id,
                  dbo.udfCalculateAge(birthdate,@today) age from @tblEmployees) te) e
    ON e.agebucket = t.age
GROUP BY t.age ORDER BY t.age
Result set looks like this:
Age Categories Total
Up to 20 0
21 - 30 2
31 - 50 5
51 - 60 0
For future reference, particularly when asking SQL questions, you will get a far faster and better response if you provide much of the work that I have done here, i.e. CREATE statements for the tables concerned and INSERT statements to supply the sample data. It is much easier for you to do this than for us (we have to copy and paste and then re-format etc.), whereas you should be able to generate the same via a few choice SELECT statements!
Note also that I handled the case where a birthdate falls outside the given range rather differently. It is a bit more efficient to do a single check once via MAX than to complicate your SELECT statement. It also makes it much more readable.
Thanks to HABO for the suggestion on GetDate().