Trying to simplify a SQL query without UNION

I'm very bad at explaining, so let me try to lay out my issue. I have a table that resembles the following:
Source  Value  User
======  =====  ====
old1    1      Phil
new     2      Phil
old2    3      Phil
new     4      Phil
old1    1      Mike
old2    2      Mike
new     1      Jeff
new     2      Jeff
What I need to do is create a query that gets values for users based on the source and the value. It should follow this rule:
For every user, get the highest value. However, disregard the 'new' source if either 'old1' or 'old2' exists for that user.
So based on those rules, my query should return the following from this table:
Value  User
=====  ====
3      Phil
2      Mike
2      Jeff
I've come up with a query that comes close to what is asked:
SELECT MAX([Value]), [User]
FROM
(
SELECT CASE [Source]
WHEN 'old1' THEN 1
WHEN 'old2' THEN 1
WHEN 'new' THEN 2
END AS [SourcePriority],
[Value],
[User]
FROM #UserValues
) MainPriority
WHERE [SourcePriority] = 1
GROUP BY [User]
UNION
SELECT MAX([Value]), [User]
FROM
(
SELECT CASE [Source]
WHEN 'old1' THEN 1
WHEN 'old2' THEN 1
WHEN 'new' THEN 2
END AS [SourcePriority],
[Value],
[User]
FROM #UserValues
) SecondaryPriority
WHERE [SourcePriority] = 2
GROUP BY [User]
However this returns the following results:
Value  User
=====  ====
3      Phil
4      Phil
2      Mike
2      Jeff
Obviously the extra Phil = 4 row is not desired. How should I attempt to fix this query? I also understand that this is a pretty convoluted solution that can probably be solved more easily with proper use of aggregates; however, I'm not too familiar with aggregates yet, which is why I resorted to a UNION. Essentially, I'm looking for help creating the cleanest-looking solution possible.
Here is the SQL code if anyone wants to populate the table themselves and give it a try:
CREATE TABLE #UserValues
(
[Source] VARCHAR(10),
[Value] INT,
[User] VARCHAR(10)
)
INSERT INTO #UserValues VALUES
('old1', 1, 'Phil'),
('new', 2, 'Phil'),
('old2', 3, 'Phil'),
('new', 4, 'Phil'),
('old1', 1, 'Mike'),
('old2', 2, 'Mike'),
('new', 1, 'Jeff'),
('new', 2, 'Jeff')

You can solve it fairly easily without resorting to window functions. In this case, you need the maximum value where ((not new) OR (there isn't an old1 or old2 entry)).
Here's a query that works correctly with your sample data:
SELECT
MAX(U1.[Value]) as 'Value'
,U1.[User]
FROM
#UserValues U1
WHERE
U1.[Source] <> 'new'
OR NOT EXISTS (SELECT * FROM #UserValues U2 WHERE U2.[User] = U1.[User] AND U2.[Source] IN ('old1','old2'))
GROUP BY U1.[User]
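A minimal sketch for checking this NOT EXISTS query against the question's sample rows. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, so the temp table #UserValues becomes a plain table named UserValues and the bracketed names become double-quoted identifiers. The ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sample rows from the question's INSERT statement: (Source, Value, User).
ROWS = [
    ("old1", 1, "Phil"), ("new", 2, "Phil"), ("old2", 3, "Phil"), ("new", 4, "Phil"),
    ("old1", 1, "Mike"), ("old2", 2, "Mike"),
    ("new", 1, "Jeff"), ("new", 2, "Jeff"),
]

# Keep a row if it is not 'new', or if its user has no old1/old2 row at all,
# then take the highest remaining value per user.
QUERY = """
SELECT MAX(u1."Value") AS "Value", u1."User"
FROM UserValues AS u1
WHERE u1."Source" <> 'new'
   OR NOT EXISTS (SELECT 1
                  FROM UserValues AS u2
                  WHERE u2."User" = u1."User"
                    AND u2."Source" IN ('old1', 'old2'))
GROUP BY u1."User"
ORDER BY u1."User"
"""


def highest_values(rows):
    """Load rows into an in-memory SQLite table and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE UserValues ("Source" TEXT, "Value" INTEGER, "User" TEXT)')
        con.executemany("INSERT INTO UserValues VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(highest_values(ROWS))
```

With the sample data this returns one row each for Jeff (2), Mike (2) and Phil (3), matching the expected result in the question.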

You can order by priority with row_number():
select top (1) with ties uv.*
from #UserValues uv
order by row_number() over (partition by [user]
order by (case when source = 'old2' then 1 when source = 'old1' then 2 else 3 end), value desc
);
However, if the source is limited to these 3 values, you can also do:
. . .
order by row_number() over (partition by [user]
order by (case when source = 'new' then 2 else 1 end), value desc
)
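The second variant can be tried outside SQL Server with a small sketch. It assumes SQLite through Python's sqlite3 module (SQLite 3.25 or later, for ROW_NUMBER). SQLite has no TOP (1) WITH TIES, so the rn = 1 filter moves into an outer query, which still keeps exactly one row per user. One design point worth noting: the first variant ranks old2 ahead of old1, so a user whose old1 value is larger than their old2 value would get the old2 row, whereas the question's rule treats old1 and old2 alike. The second CASE does that, and the extra hypothetical Mike row in the usage check below exercises it.

```python
import sqlite3

# Sample rows from the question: (Source, Value, User).
ROWS = [
    ("old1", 1, "Phil"), ("new", 2, "Phil"), ("old2", 3, "Phil"), ("new", 4, "Phil"),
    ("old1", 1, "Mike"), ("old2", 2, "Mike"),
    ("new", 1, "Jeff"), ("new", 2, "Jeff"),
]

# Rank each user's rows: any old source first ('new' last), then by Value
# descending, and keep the first-ranked row per user.
QUERY = """
SELECT "User", "Source", "Value"
FROM (
    SELECT uv."User", uv."Source", uv."Value",
           ROW_NUMBER() OVER (
               PARTITION BY uv."User"
               ORDER BY CASE WHEN uv."Source" = 'new' THEN 2 ELSE 1 END,
                        uv."Value" DESC
           ) AS rn
    FROM UserValues AS uv
) AS ranked
WHERE rn = 1
ORDER BY "User"
"""


def best_rows(rows):
    """Return (User, Source, Value) for the winning row of each user."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE UserValues ("Source" TEXT, "Value" INTEGER, "User" TEXT)')
        con.executemany("INSERT INTO UserValues VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(best_rows(ROWS))
# Hypothetical extra row: an old1 value larger than Mike's old2 value.
print(best_rows(ROWS + [("old1", 5, "Mike")]))
```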

with raw_data
as (
select row_number() over(partition by a.[user] order by a.value desc) as rnk
,count(case when a.source in('old1','old2') then 1 end) over(partition by a.[user]) as cnt_old
,a.*
from #UserValues a
)
,curated_data
as(select *
,row_number() over(partition by rd.[user] order by rd.value desc) as rnk2
from raw_data rd
where 0 = case when rnk=1 and source='new' and cnt_old>0 then 1 else 0 end
)
select *
from curated_data
where rnk2=1
I am doing the following:
raw_data -> First I rank each user's values from highest to lowest (rnk). I also count whether the user has any records with old1 or old2 in the source column (cnt_old).
curated_data -> I eliminate the highest-valued record (rnk=1) if its source is new and cnt_old > 0. Then I rank (rnk2) the remaining records by value.
Finally, I select the highest available value from curated_data (i.e. rnk2=1).

I think you should consider setting up an XREF table that defines each source's priority, in case the prioritisation becomes more complicated in the future. I do it with a temp table:
CREATE TABLE #SourcePriority
(
[Source] VARCHAR(10),
[SourcePriority] INT
)
INSERT INTO #SourcePriority VALUES
('old1', 1),
('old2', 1),
('new', 2)
You might also create a view that adds the SourcePriority to the original table. I do it with a CTE, plus one possible implementation that looks up the highest value within each user's top priority:
;WITH CTE as (
SELECT s.[SourcePriority], u.[Value], u.[User]
FROM #UserValues as u
INNER JOIN #SourcePriority as s on u.[Source] = s.[Source]
)
SELECT MAX (v.[Value]) as [Value], v.[User]
FROM (
SELECT MIN ([SourcePriority]) as [TopPriority], [User]
FROM cte
GROUP BY [User]
) as s
INNER JOIN cte as v
ON s.[User] = v.[User] and s.[TopPriority] = v.[SourcePriority]
GROUP BY v.[User]
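A sketch of the lookup-table approach, assuming SQLite through Python's sqlite3 module in place of SQL Server (temp tables become plain tables named UserValues and SourcePriority). It shows the benefit the answer describes: changing a priority only touches the lookup rows, not the query. The second call uses a hypothetical priority table that makes 'new' equal to the old sources, so Phil's 4 wins.

```python
import sqlite3

# Sample rows from the question: (Source, Value, User).
ROWS = [
    ("old1", 1, "Phil"), ("new", 2, "Phil"), ("old2", 3, "Phil"), ("new", 4, "Phil"),
    ("old1", 1, "Mike"), ("old2", 2, "Mike"),
    ("new", 1, "Jeff"), ("new", 2, "Jeff"),
]
# Priorities from the answer's #SourcePriority table (lower = preferred).
PRIORITIES = [("old1", 1), ("old2", 1), ("new", 2)]

# Find each user's best (lowest) priority, then the highest value at that priority.
QUERY = """
WITH cte AS (
    SELECT s.SourcePriority, u."Value", u."User"
    FROM UserValues AS u
    INNER JOIN SourcePriority AS s ON u."Source" = s."Source"
)
SELECT MAX(v."Value") AS "Value", v."User"
FROM (
    SELECT MIN(SourcePriority) AS TopPriority, "User"
    FROM cte
    GROUP BY "User"
) AS s
INNER JOIN cte AS v
    ON s."User" = v."User" AND s.TopPriority = v.SourcePriority
GROUP BY v."User"
ORDER BY v."User"
"""


def top_priority_values(rows, priorities):
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE UserValues ("Source" TEXT, "Value" INTEGER, "User" TEXT)')
        con.execute('CREATE TABLE SourcePriority ("Source" TEXT, SourcePriority INTEGER)')
        con.executemany("INSERT INTO UserValues VALUES (?, ?, ?)", rows)
        con.executemany("INSERT INTO SourcePriority VALUES (?, ?)", priorities)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(top_priority_values(ROWS, PRIORITIES))
print(top_priority_values(ROWS, [("old1", 1), ("old2", 1), ("new", 1)]))
```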

I think you want:
select top (1) with ties uv.*
from (select uv.*,
sum(case when source in ('old1', 'old2') then 1 else 0 end) over (partition by [user]) as cnt_old
from #UserValues uv
) uv
where cnt_old = 0 or source <> 'new'
order by row_number() over (partition by [user] order by value desc);
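A sketch of the windowed-count filter, assuming SQLite via Python's sqlite3 module (3.25+ for window functions) instead of SQL Server. Because SQLite has no TOP (1) WITH TIES, the row-number filter sits in an outer query. The User column is quoted throughout; in T-SQL a bare `user` refers to the built-in USER function rather than the column, so it needs brackets there. Every 'new' row is dropped for users with old rows, not just the top-ranked one, so the hypothetical extra Mike row in the usage check does not change Mike's result.

```python
import sqlite3

# Sample rows from the question: (Source, Value, User).
ROWS = [
    ("old1", 1, "Phil"), ("new", 2, "Phil"), ("old2", 3, "Phil"), ("new", 4, "Phil"),
    ("old1", 1, "Mike"), ("old2", 2, "Mike"),
    ("new", 1, "Jeff"), ("new", 2, "Jeff"),
]

# 1. cnt_old counts the user's old1/old2 rows without collapsing the rows.
# 2. Drop 'new' rows for users that have any old row.
# 3. Keep the highest remaining value per user.
QUERY = """
SELECT "Source", "Value", "User"
FROM (
    SELECT filtered."Source", filtered."Value", filtered."User",
           ROW_NUMBER() OVER (PARTITION BY filtered."User"
                              ORDER BY filtered."Value" DESC) AS rn
    FROM (
        SELECT uv."Source", uv."Value", uv."User",
               SUM(CASE WHEN uv."Source" IN ('old1', 'old2') THEN 1 ELSE 0 END)
                   OVER (PARTITION BY uv."User") AS cnt_old
        FROM UserValues AS uv
    ) AS filtered
    WHERE filtered.cnt_old = 0 OR filtered."Source" <> 'new'
) AS ranked
WHERE rn = 1
ORDER BY "User"
"""


def best_rows(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE UserValues ("Source" TEXT, "Value" INTEGER, "User" TEXT)')
        con.executemany("INSERT INTO UserValues VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(best_rows(ROWS))
print(best_rows(ROWS + [("new", 9, "Mike")]))  # hypothetical extra 'new' row
```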

Related

Count rows in SQL Server table using GROUP BY GroupValue and filter by condition

I think my question is similar to How to use last_value with group by with count in SQL Server?, however, I can't seem to adapt that answer to my small change.
I have a table of colleague contracts:
ContractId int PK
ColleagueId int [not null]
ContractStart datetime2 [not null]
ContractEnd datetime2 [null]
BranchId int [not null]
SaturdayOnly bit
isActive bit
What I need to do is get a count, per BranchId, of the number of active contracts that are SaturdayOnly (i.e. bit 1) and the number of active contracts that are not SaturdayOnly (i.e. bit 0). A colleague can have multiple contracts in the same branch, but only one will be active. The final condition is that, for a contract to be considered, it must start before 2022-12-01 and, if there is an end date, it must be after 2022-12-01.
I attempted it with the following, but the 2 CTE counts give the same result and the count is incorrect for the branch anyway.
WITH cte AS
(
SELECT
co.BranchId, co.ContractId, co.ColleagueId,
ROW_NUMBER() OVER (PARTITION BY co.ColleagueId ORDER BY co.ContractStart DESC) AS row_number
FROM
hr.Contract co
WHERE
co.SaturdayOnly = 0
AND (co.ContractEnd IS NULL OR co.ContractEnd > '2022-12-01')
),
cte_sat AS
(
SELECT
co.BranchId, co.ContractId, co.ColleagueId,
ROW_NUMBER() OVER (PARTITION BY co.ColleagueId ORDER BY co.ContractStart DESC) AS row_number
FROM
hr.Contract co
WHERE
co.SaturdayOnly = 1
AND (co.ContractEnd IS NULL OR co.ContractEnd > '2022-12-01')
)
SELECT
b.BranchName,
COUNT(cte.ContractId), COUNT(cte_sat.ContractId)
FROM
hr.Branch b
JOIN
cte ON b.ContractorCode = cte.BranchId
JOIN
cte_sat ON b.ContractorCode = cte_sat.BranchId
WHERE
cte.row_number = 1
GROUP BY
b.BranchNumber, b.BranchName
ORDER BY
b.BranchNumber
No need for CTE - try
SELECT BranchId, SaturdayOnly, COUNT(*) FROM hr.Contract
WHERE IsActive = 1 AND ContractStart > ... AND ContractEnd < ...
GROUP BY BranchId, SaturdayOnly
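If both counts are wanted on one row per branch, as in the question's two-CTE attempt, conditional aggregation is one way to do it. This sketch assumes SQLite via Python's sqlite3 module in place of SQL Server, drops the hr schema and the join to hr.Branch for BranchName, stores dates as ISO text, and applies the filters stated in the question (active, started before 2022-12-01, no end date or ending after it). The contract rows are hypothetical, invented only to exercise each condition.

```python
import sqlite3

# Hypothetical contracts, for illustration only:
# (ContractId, ColleagueId, ContractStart, ContractEnd, BranchId, SaturdayOnly, isActive)
CONTRACTS = [
    (1, 10, "2022-01-01", None, 1, 0, 1),          # counted: not Saturday-only, branch 1
    (2, 11, "2022-06-01", "2023-01-31", 1, 1, 1),  # counted: Saturday-only, branch 1
    (3, 12, "2022-03-01", "2022-11-30", 1, 0, 1),  # excluded: ended before the cutoff
    (4, 13, "2023-01-01", None, 2, 0, 1),          # excluded: starts after the cutoff
    (5, 14, "2021-05-01", None, 2, 1, 1),          # counted: Saturday-only, branch 2
    (6, 14, "2020-01-01", "2021-04-30", 2, 0, 0),  # excluded: inactive
]

# One row per branch; each SUM(CASE ...) counts only one kind of contract.
QUERY = """
SELECT BranchId,
       SUM(CASE WHEN SaturdayOnly = 1 THEN 1 ELSE 0 END) AS SaturdayOnlyCount,
       SUM(CASE WHEN SaturdayOnly = 0 THEN 1 ELSE 0 END) AS OtherCount
FROM Contract
WHERE isActive = 1
  AND ContractStart < :cutoff
  AND (ContractEnd IS NULL OR ContractEnd > :cutoff)
GROUP BY BranchId
ORDER BY BranchId
"""


def branch_counts(contracts, cutoff="2022-12-01"):
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE Contract (ContractId INTEGER PRIMARY KEY, ColleagueId INTEGER,"
            " ContractStart TEXT, ContractEnd TEXT, BranchId INTEGER,"
            " SaturdayOnly INTEGER, isActive INTEGER)"
        )
        con.executemany("INSERT INTO Contract VALUES (?, ?, ?, ?, ?, ?, ?)", contracts)
        return con.execute(QUERY, {"cutoff": cutoff}).fetchall()
    finally:
        con.close()


print(branch_counts(CONTRACTS))
```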

SQL find next unique date and subaccount per account

This query has a few requirements. The basic idea is that, for each account, it should pull the next admit_date and corresponding discharge_date after the subaccount of interest. If there is no next unique admit_date, it should indicate "No Readmit."
I realize pictures are not encouraged on Stack Overflow, but I feel a visual aid is helpful. The accounts of interest are AAA, BBB, CCC and DDD and the subaccounts of interest are 121, 214, 315, 414 and 416. Note that CCC has no next unique admit_date (it would be "No Readmit"), DDD has two subaccounts of interest, each with a next unique admit_date, and the subaccounts are not necessarily in numerical order (i.e. BBB begins at 221 and ends at 216). So transforming this:
To this:
Here is the setup code:
CREATE TABLE random_table
(
account VarChar(50),
subaccount VarChar(50),
admit_date DATETIME,
discharge_date DATETIME
);
INSERT INTO random_table
VALUES
('AAA',111,'6/20/2021','6/25/2021'),
('AAA',121,'6/20/2021','6/25/2021'),
('AAA',131,'7/1/2021','7/3/2021'),
('AAA',141,'8/2/2021','8/5/2021'),
('BBB',216,'4/1/2021','4/3/2021'),
('BBB',213,'4/1/2021','4/3/2021'),
('BBB',221,'4/1/2021','4/3/2021'),
('BBB',215,'4/1/2021','4/3/2021'),
('BBB',216,'4/5/2021','4/10/2021'),
('CCC',313,'11/1/2020','11/5/2020'),
('CCC',314,'11/15/2020','11/17/2020'),
('CCC',315,'12/23/2020','12/24/2020'),
('CCC',316,'12/23/2020','12/24/2020'),
('DDD',414,'7/1/2021','7/3/2021'),
('DDD',412,'7/6/2021','7/7/2021'),
('DDD',416,'8/1/2021','8/5/2021'),
('DDD',417,'8/10/2021','8/15/2021');
To solve this, I've been trying to use a combination of row_number() calls to mark the first new instance of each admit_date (partitioned by account), as well as CTEs to select those relevant rows, but I'm obviously not there yet. Any suggestions? Here's what I have:
select
cte2.*
,case when cte2.subaccount in (111,121,131,141,216,213,221,215,216,313,314,315,316,414,412,416,417
) then lead(cte2.admit_date) over (order by cte2.account, cte2.row_nums)
else null
end second_admit
from (
select
cte.*
,row_number() over (partition by cte.account order by cte.row_num) row_nums
from (
select distinct
hsp.subaccount
,row_number() over (partition by pat.account, hsp.admit_date order by pat.account) row_num
,case when row_number() over (partition by pat.account,hsp.admit_date order by pat.account) =1 then 'New Admit' else null end new_admit
,convert(varchar,hsp.admit_date,101) adm_date
,convert(varchar,hsp.discharge_date,101) disch_date
,pat.account
from hsp_account hsp
left join patient pat on hsp.pat_id=pat.pat_id
where pat.account in ('AAA','BBB','CCC','DDD')
) cte
where cte.new_admit = 'New Admit'
) cte2
Hope this is what you're looking for:
with AI as
(
select * from
(values ('AAA'), ('BBB'), ('CCC'), ('DDD')) A(account)
),
SI as
(
select * from
(values (121), (214), (221), (315), (414), (416)) A(subaccount)
),
T as
(
select * from random_table
where account in (select * from AI)
),
N as
(
select
T1.account,
T1.subaccount,
T1.admit_date,
T1.discharge_date,
T2.subaccount next_subaccount,
T2.admit_date next_admit_date,
T2.discharge_date next_discharge_date,
row_number()
over(
partition by T1.account, T1.subaccount
order by T2.admit_date) group_id
from
T T1
left join
T T2
on
T1.account = T2.account and
T1.admit_date < T2.admit_date
where
T1.subaccount in (select * from SI)
)
select
account, subaccount, admit_date,
next_subaccount, next_admit_date, next_discharge_date
from N
where
N.group_id = 1
Please note that I print NULL instead of 'No Readmit'.
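A sketch of the same self-join in SQLite via Python's sqlite3 module, assuming a subset of the question's rows (accounts AAA, CCC and DDD, with dates rewritten as ISO text so they compare correctly as strings) and the subaccounts of interest 121, 315, 414 and 416 from the question. It adds COALESCE to print 'No Readmit' in place of NULL. In SQL Server the datetime would first need converting to a string (for example with CONVERT(varchar, ..., 101)), otherwise COALESCE tries to convert 'No Readmit' to a datetime and fails.

```python
import sqlite3

# Subset of the question's rows: (account, subaccount, admit_date, discharge_date).
ROWS = [
    ("AAA", "111", "2021-06-20", "2021-06-25"),
    ("AAA", "121", "2021-06-20", "2021-06-25"),
    ("AAA", "131", "2021-07-01", "2021-07-03"),
    ("AAA", "141", "2021-08-02", "2021-08-05"),
    ("CCC", "313", "2020-11-01", "2020-11-05"),
    ("CCC", "314", "2020-11-15", "2020-11-17"),
    ("CCC", "315", "2020-12-23", "2020-12-24"),
    ("CCC", "316", "2020-12-23", "2020-12-24"),
    ("DDD", "414", "2021-07-01", "2021-07-03"),
    ("DDD", "412", "2021-07-06", "2021-07-07"),
    ("DDD", "416", "2021-08-01", "2021-08-05"),
    ("DDD", "417", "2021-08-10", "2021-08-15"),
]

# Pair each subaccount of interest with every strictly later admission on the
# same account, number the pairs by admit date, and keep the earliest one.
# The LEFT JOIN keeps subaccounts with no later admission (t2 columns NULL).
QUERY = """
WITH n AS (
    SELECT t1.account, t1.subaccount,
           t2.admit_date     AS next_admit_date,
           t2.discharge_date AS next_discharge_date,
           ROW_NUMBER() OVER (PARTITION BY t1.account, t1.subaccount
                              ORDER BY t2.admit_date) AS group_id
    FROM random_table AS t1
    LEFT JOIN random_table AS t2
        ON t1.account = t2.account AND t1.admit_date < t2.admit_date
    WHERE t1.subaccount IN ('121', '315', '414', '416')
)
SELECT account, subaccount,
       COALESCE(next_admit_date, 'No Readmit') AS next_admit,
       next_discharge_date
FROM n
WHERE group_id = 1
ORDER BY account, subaccount
"""


def next_admits(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE random_table (account TEXT, subaccount TEXT,"
                    " admit_date TEXT, discharge_date TEXT)")
        con.executemany("INSERT INTO random_table VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in next_admits(ROWS):
    print(row)
```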

SQL - select last and previous different to last

The problem: a simplified membership table containing membership id, starting date for each membership and membership level description:
CREATE TABLE cover
(
[membership_id] int,
[cover_from_date] date,
[description] varchar(57)
);
INSERT INTO cover ([membership_id], [cover_from_date], [description])
VALUES (1, '1/1/2011', 'AA'),
(1, '1/2/2011', 'BB'),
(1, '1/3/2011', 'CC'),
(1, '1/4/2011', 'CC');
The task: to list the current membership and the immediate previous membership different to the current one. So from the above table I would like to see something like:
1, 1/4/2011, CC, 1/2/2011, BB
The attempted solution: I have managed to come up with a solution, but it takes an enormous amount of time to run on a large database and I'm sure there are better ways of solving this problem. My no-doubt overcomplicated query is as follows:
with cte as
(
select
cover.membership_id, cover.cover_from_date,
cover.description,
row_number() over (partition by cover.membership_id order by cover.cover_from_date desc) AS version_no
from
cover
)
select
cte.membership_id,
cover_now.cover_from_date, cover_now.description,
cover_prev.cover_from_date, cover_prev.description
from
cte
left outer join
cte cover_now on cte.membership_id = cover_now.membership_id
and cover_now.version_no = 1
left outer join
cte cover_prev on cte.membership_id = cover_prev.membership_id
and cover_prev.version_no = (select min(x.version_no)
from cte x
where x.version_no >= 2
and x.membership_id = cover_now.membership_id
and x.description <> cover_now.description)
group by
cte.membership_id, cover_now.cover_from_date, cover_now.description,
cover_prev.cover_from_date, cover_prev.description
The entire fiddle is located here. Any tips on how to optimise the query would be appreciated.
First create an index on membership_id and cover_from_date in descending order. It will be heavily used by this query.
create index cover_by_date on cover (membership_id asc, cover_from_date desc)
Then:
select
membership.membership_id,
membership.cover_from_date,
membership.description,
previous_membership.cover_from_date,
previous_membership.description
from
(
select membership_id, description, cover_from_date, row_number() over (partition by membership_id order by cover_from_date desc) as rank
from cover
) as membership
left join (
select previous.membership_id, previous.description, previous.cover_from_date, row_number() over (partition by previous.membership_id order by previous.cover_from_date desc) as rank
from cover
join cover as previous on
cover.membership_id = previous.membership_id and
cover.description <> previous.description and
cover.cover_from_date > previous.cover_from_date
) as previous_membership on
previous_membership.membership_id = membership.membership_id and
previous_membership.rank = 1
where
membership.rank = 1
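A sketch that runs this query in SQLite via Python's sqlite3 module, assuming ISO text dates (the question's '1/2/2011'-style values sort the same way whether read as month/day or day/month) and renaming the rank alias to rnk. Memberships 2 and 3 are hypothetical additions: one returns to an earlier description, the other has no earlier row and so gets NULLs from the LEFT JOIN.

```python
import sqlite3

# (membership_id, cover_from_date, description)
ROWS = [
    (1, "2011-01-01", "AA"), (1, "2011-01-02", "BB"),
    (1, "2011-01-03", "CC"), (1, "2011-01-04", "CC"),   # the question's rows
    (2, "2011-01-01", "AA"), (2, "2011-01-05", "BB"),
    (2, "2011-01-09", "AA"),                            # hypothetical
    (3, "2011-02-01", "AA"),                            # hypothetical, single row
]

QUERY = """
SELECT m.membership_id, m.cover_from_date, m.description,
       p.cover_from_date, p.description
FROM (
    SELECT membership_id, description, cover_from_date,
           ROW_NUMBER() OVER (PARTITION BY membership_id
                              ORDER BY cover_from_date DESC) AS rnk
    FROM cover
) AS m
LEFT JOIN (
    -- Earlier rows whose description differs from some later row,
    -- numbered newest first within each membership.
    SELECT prev.membership_id, prev.description, prev.cover_from_date,
           ROW_NUMBER() OVER (PARTITION BY prev.membership_id
                              ORDER BY prev.cover_from_date DESC) AS rnk
    FROM cover
    JOIN cover AS prev
        ON cover.membership_id = prev.membership_id
       AND cover.description <> prev.description
       AND cover.cover_from_date > prev.cover_from_date
) AS p
    ON p.membership_id = m.membership_id AND p.rnk = 1
WHERE m.rnk = 1
ORDER BY m.membership_id
"""


def current_and_previous(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE cover (membership_id INTEGER,"
                    " cover_from_date TEXT, description TEXT)")
        # The index suggested in the answer.
        con.execute("CREATE INDEX cover_by_date ON cover"
                    " (membership_id ASC, cover_from_date DESC)")
        con.executemany("INSERT INTO cover VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in current_and_previous(ROWS):
    print(row)
```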

SQL Query Help - Negative reporting

Perhaps somebody can help with ideas or a solution. A user asked me for a negative report. We have a table of tickets, and each ticket has a ticket number, which would be easy to select, but the user wants a list of the missing tickets between the first and last ticket in the system.
E.g. Select TicketNr from Ticket order by TicketNr
Result:
1, 2, 4, 7, 11
But we actually want the result 3, 5, 6, 8, 9, 10
CREATE TABLE [dbo].[Ticket](
[pknTicketId] [int] IDENTITY(1,1) NOT NULL,
[TicketNr] [int] NULL
) ON [PRIMARY]
GO
SQL Server 2016 - TSQL
Any ideas ?
Some more information: all the solutions so far work on a small table, but our production database has over 4 million tickets, and that is where we need to find the missing ones.
First get the minimum and maximum, then generate all possible ticket numbers, and finally select the ones that are missing.
;WITH FirstAndLast AS
(
SELECT
MinTicketNr = MIN(T.TicketNr),
MaxTicketNr = MAX(T.TicketNr)
FROM
Ticket AS T
),
AllTickets AS
(
SELECT
TicketNr = MinTicketNr,
MaxTicketNr = T.MaxTicketNr
FROM
FirstAndLast AS T
UNION ALL
SELECT
TicketNr = A.TicketNr + 1,
MaxTicketNr = A.MaxTicketNr
FROM
AllTickets AS A
WHERE
A.TicketNr + 1 <= A.MaxTicketNr
)
SELECT
A.TicketNr
FROM
AllTickets AS A
WHERE
NOT EXISTS (
SELECT
'missing ticket'
FROM
Ticket AS T
WHERE
A.TicketNr = T.TicketNr)
ORDER BY
A.TicketNr
OPTION
(MAXRECURSION 32000)
If you can accept the results in a different format, the following will do what you want:
select TicketNr + 1 as first_missing,
next_TicketNr - 1 as last_missing,
(next_TicketNr - TicketNr - 1) as num_missing
from (select t.*, lead(TicketNr) over (order by TicketNr) as next_TicketNr
from Ticket t
) t
where next_TicketNr <> TicketNr + 1;
This shows each sequence of missing ticket numbers on a single row, rather than a separate row for each of them.
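A sketch of the LEAD approach on the question's sample numbers, assuming SQLite via Python's sqlite3 module (3.25+ for LEAD) instead of SQL Server. The last ticket has no successor, so next_TicketNr is NULL there and the <> comparison filters it out. If one row per number is needed, the ranges can be expanded on the client; the list comprehension at the end is one way to do that.

```python
import sqlite3

TICKETS = [1, 2, 4, 7, 11]  # sample ticket numbers from the question

# Compare each ticket with the next one; any jump larger than 1 is a gap.
QUERY = """
SELECT TicketNr + 1                 AS first_missing,
       next_TicketNr - 1            AS last_missing,
       next_TicketNr - TicketNr - 1 AS num_missing
FROM (
    SELECT TicketNr,
           LEAD(TicketNr) OVER (ORDER BY TicketNr) AS next_TicketNr
    FROM Ticket
) AS t
WHERE next_TicketNr <> TicketNr + 1
ORDER BY first_missing
"""


def missing_ranges(tickets):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE Ticket (pknTicketId INTEGER PRIMARY KEY, TicketNr INTEGER)")
        con.executemany("INSERT INTO Ticket (TicketNr) VALUES (?)", [(n,) for n in tickets])
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


ranges = missing_ranges(TICKETS)
# Expand each (first, last, count) range into individual ticket numbers.
missing = [n for first, last, _ in ranges for n in range(first, last + 1)]
print(ranges)
print(missing)
```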
If you do use a recursive CTE, I would recommend doing it only for the missing tickets:
with cte as (
select (TicketNr + 1) as missing_TicketNr
from (select t.*, lead(TicketNr) over (order by TicketNr) as next_ticketNr
from Ticket t
) t
where next_TicketNr <> TicketNr + 1
union all
select missing_TicketNr + 1
from cte
where not exists (select 1 from Ticket t2 where t2.TicketNr = cte.missing_TicketNr + 1)
)
select *
from cte
option (maxrecursion 0);
This version starts with the first missing number of each gap. It then keeps adding the next number for as long as that number is not found in the table.
One method is to use a recursive CTE to find the missing ticket numbers:
with missing as (
select min(TicketNr) as mnt, max(TicketNr) as mxt
from ticket t
union all
select mnt+1, mxt
from missing m
where mnt < mxt
)
select m.mnt as missing_TicketNr
from missing m
where not exists (select 1 from Ticket t where t.TicketNr = m.mnt)
option (maxrecursion 0);
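A sketch of the min-to-max recursive CTE, assuming SQLite via Python's sqlite3 module, where the recursive form is spelled WITH RECURSIVE. In SQL Server the same CTE stops at the default recursion limit of 100 unless the statement ends with OPTION (MAXRECURSION 0), which matters for a range of 4 million numbers. The index on TicketNr is an added assumption to keep each NOT EXISTS lookup cheap.

```python
import sqlite3

# Generate every number from MIN to MAX, then keep the ones not in Ticket.
QUERY = """
WITH RECURSIVE missing(mnt, mxt) AS (
    SELECT MIN(TicketNr), MAX(TicketNr) FROM Ticket
    UNION ALL
    SELECT mnt + 1, mxt FROM missing WHERE mnt < mxt
)
SELECT m.mnt
FROM missing AS m
WHERE NOT EXISTS (SELECT 1 FROM Ticket AS t WHERE t.TicketNr = m.mnt)
ORDER BY m.mnt
"""


def missing_numbers(tickets):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE Ticket (pknTicketId INTEGER PRIMARY KEY, TicketNr INTEGER)")
        con.execute("CREATE INDEX ix_ticket_nr ON Ticket (TicketNr)")
        con.executemany("INSERT INTO Ticket (TicketNr) VALUES (?)", [(n,) for n in tickets])
        return [row[0] for row in con.execute(QUERY)]
    finally:
        con.close()


print(missing_numbers([1, 2, 4, 7, 11]))
```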
This should do the trick: SQL Fiddle
declare @ticketsTable table (ticketNo int not null)
insert @ticketsTable (ticketNo) values (1),(2),(4),(7),(11)
;with cte1(ticketNo, isMissing, sequenceNo) AS
(
select ticketNo
, 0
, row_number() over (order by ticketNo)
from @ticketsTable
)
, cte2(ticketNo, isMissing, sequenceNo) AS
(
select ticketNo, isMissing, sequenceNo
from cte1
union all
select a.ticketNo + 1
, 1
, a.sequenceNo
from cte2 a
inner join cte1 b
on b.sequenceNo = a.sequenceNo + 1
and b.ticketNo != a.ticketNo + 1
)
select *
from cte2
where isMissing = 1
order by ticketNo
It works by collecting all of the existing tickets, marking them as existing, and assigning each a consecutive number giving their order in the original list.
We can then see the gaps in the list by finding any spots where the consecutive order number shows the next record, but the ticket numbers are not consecutive.
Finally, we recursively fill in the gaps, working from the start of each gap and adding new records until the ticket number reaches the next existing ticket.
I think this one gives you the easiest solution:
with cte as(
select max(TicketNr) maxnum,min(TicketNr) minnum from Ticket )
select a.number FROM master..spt_values a,cte
WHERE Type = 'P' and number < cte.maxnum and number > cte.minnum
except
select TicketNr FROM Ticket
So, after looking at all the solutions, I went with creating a temp table with the full range of numbers from the starting to the ending ticket, and then selecting from the temp table where the ticket number is not in the ticket table.
The reason is that I kept running into MAXRECURSION problems.

How do I select the most recent entity of its type in a related entity list in T-SQL or Linq to Sql?

I have a table full of Actions. Each Action is done by a certain User at a certain DateTime. So it has 4 fields: Id, UserId, ActionId, and ActionDate.
At first, I was just reporting the top 10 most recent Actions like this:
(from a in db.Action
orderby a.ActionDate descending
select a).Take(10);
That is simple and it works. But the report is less useful than I thought. This is because some user might take 10 actions in a row and hog the whole top 10 list. So I would like to report the single most recent action taken for each of the top 10 most recently active users.
From another question on SO, I have gotten myself most of the way there. It looks like I need the "group" feature. If I do this:
from a in db.Action
orderby a.ActionDate descending
group a by a.UserId into g
select g;
And run it in LINQPad, I get an IOrderedQueryable<IGrouping<Int32,Action>> result set with one group for each user. However, it shows ALL the actions taken by each user, and the result set is hierarchical when I would like it to be flat.
So if my Action table looks like this
Id UserId ActionId ActionDate
1 1 1 2010/01/09
2 1 63 2010/01/10
3 2 1 2010/01/03
4 2 7 2010/01/06
5 3 11 2010/01/07
I want the query to return records 2, 5, and 4, in that order. This shows me, for each user, the most recent action taken by that user, and all reported actions are in order, with the most recent at the top. So I would like to see:
Id UserId ActionId ActionDate
2 1 63 2010/01/10
5 3 11 2010/01/07
4 2 7 2010/01/06
EDIT:
I am having a hard time expressing this in T-SQL, as well. This query gets me the users and their last action date:
select
a.UserId,
max(a.ActionDate) as LastAction
from
Action as a
group by
a.UserId
order by
LastAction desc
But how do I access the other information that is attached to the record where the max ActionDate was found?
EDIT2: I have been refactoring and Action is now called Read, but everything else is the same. I have adopted Frank's solution and it is as follows:
(from u in db.User
join r in db.Read on u.Id equals r.UserId into allRead
where allRead.Count() > 0
let lastRead = allRead.OrderByDescending(r => r.ReadDate).First()
orderby lastRead.ReadDate descending
select new ReadSummary
{
Id = u.Id,
UserId = u.Id,
UserNameFirstLast = u.NameFirstLast,
ProductId = lastRead.ProductId,
ProductName = lastRead.Product.Name,
SegmentCode = lastRead.SegmentCode,
SectionCode = lastRead.SectionCode,
ReadDate = lastRead.ReadDate
}).Take(10);
This turns into the following:
exec sp_executesql N'SELECT TOP (10) [t12].[Id], [t12].[ExternalId], [t12].[FirstName], [t12].[LastName], [t12].[Email], [t12].[DateCreated], [t12].[DateLastModified], [t12].[DateLastLogin], [t12].[value] AS [ProductId], [t12].[value2] AS [ProductName], [t12].[value3] AS [SegmentCode], [t12].[value4] AS [SectionCode], [t12].[value5] AS [ReadDate2]
FROM (
SELECT [t0].[Id], [t0].[ExternalId], [t0].[FirstName], [t0].[LastName], [t0].[Email], [t0].[DateCreated], [t0].[DateLastModified], [t0].[DateLastLogin], (
SELECT [t2].[ProductId]
FROM (
SELECT TOP (1) [t1].[ProductId]
FROM [dbo].[Read] AS [t1]
WHERE [t0].[Id] = [t1].[UserId]
ORDER BY [t1].[ReadDate] DESC
) AS [t2]
) AS [value], (
SELECT [t5].[Name]
FROM (
SELECT TOP (1) [t3].[ProductId]
FROM [dbo].[Read] AS [t3]
WHERE [t0].[Id] = [t3].[UserId]
ORDER BY [t3].[ReadDate] DESC
) AS [t4]
INNER JOIN [dbo].[Product] AS [t5] ON [t5].[Id] = [t4].[ProductId]
) AS [value2], (
SELECT [t7].[SegmentCode]
FROM (
SELECT TOP (1) [t6].[SegmentCode]
FROM [dbo].[Read] AS [t6]
WHERE [t0].[Id] = [t6].[UserId]
ORDER BY [t6].[ReadDate] DESC
) AS [t7]
) AS [value3], (
SELECT [t9].[SectionCode]
FROM (
SELECT TOP (1) [t8].[SectionCode]
FROM [dbo].[Read] AS [t8]
WHERE [t0].[Id] = [t8].[UserId]
ORDER BY [t8].[ReadDate] DESC
) AS [t9]
) AS [value4], (
SELECT [t11].[ReadDate]
FROM (
SELECT TOP (1) [t10].[ReadDate]
FROM [dbo].[Read] AS [t10]
WHERE [t0].[Id] = [t10].[UserId]
ORDER BY [t10].[ReadDate] DESC
) AS [t11]
) AS [value5]
FROM [dbo].[User] AS [t0]
) AS [t12]
WHERE ((
SELECT COUNT(*)
FROM [dbo].[Read] AS [t13]
WHERE [t12].[Id] = [t13].[UserId]
)) > @p0
ORDER BY (
SELECT [t15].[ReadDate]
FROM (
SELECT TOP (1) [t14].[ReadDate]
FROM [dbo].[Read] AS [t14]
WHERE [t12].[Id] = [t14].[UserId]
ORDER BY [t14].[ReadDate] DESC
) AS [t15]
) DESC',N'@p0 int',@p0=0
If anyone knows something simpler (for the sport of it) I would like to know, but I think this is probably good enough.
Probably have some errors in this, but I think you want to do a join into a collection, then use 'let' to choose a member of that collection:
(
from u in db.Users
join a in db.Actions on u.UserID equals a.UserID into allActions
where allActions.Count() > 0
let firstAction = allActions.OrderByDescending(a => a.ActionDate).First()
orderby firstAction.ActionDate descending
select new { u, firstAction }
).Take(10)
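For the T-SQL side of the question (getting the rest of the row where the max ActionDate was found), one common pattern is ROW_NUMBER partitioned by user. This sketch assumes SQLite via Python's sqlite3 module (3.25+) as a stand-in for SQL Server, with LIMIT 10 playing the role of TOP (10) and the question's 2010/01/09-style dates rewritten as ISO text. On the question's sample rows it returns Ids 2, 5 and 4 in that order.

```python
import sqlite3

# Sample rows from the question: (Id, UserId, ActionId, ActionDate).
ROWS = [
    (1, 1, 1, "2010-01-09"),
    (2, 1, 63, "2010-01-10"),
    (3, 2, 1, "2010-01-03"),
    (4, 2, 7, "2010-01-06"),
    (5, 3, 11, "2010-01-07"),
]

# Number each user's actions newest first, keep number 1, then list those
# rows newest first.
QUERY = """
SELECT Id, UserId, ActionId, ActionDate
FROM (
    SELECT a.Id, a.UserId, a.ActionId, a.ActionDate,
           ROW_NUMBER() OVER (PARTITION BY a.UserId
                              ORDER BY a.ActionDate DESC) AS rn
    FROM "Action" AS a
) AS latest
WHERE rn = 1
ORDER BY ActionDate DESC
LIMIT 10
"""


def latest_actions(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE "Action" (Id INTEGER PRIMARY KEY, UserId INTEGER,'
                    ' ActionId INTEGER, ActionDate TEXT)')
        con.executemany('INSERT INTO "Action" VALUES (?, ?, ?, ?)', rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in latest_actions(ROWS):
    print(row)
```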