TSQL - Delete All Rows Except 1 Per Group - sql

Let's say I have 5 workcenters (Workcenter 1, Workcenter 2, Workcenter 3, Workcenter 4, Workcenter 5)
Each workcenter has several rows of notes that are ordered by the date the data was entered. I would like to delete all rows per workcenter except the row of data that was entered last.
If my columns are: ID | Workcenter | Note | Log_Date
How would I go about doing this?
My code is only giving me the most current note entry for the entire table, but I want one per workcenter.
This is what I have right now:
DELETE FROM #Table
WHERE ID NOT IN (SELECT TOP 1 ID FROM #Table
GROUP BY Workcenter, ID
ORDER BY Log_Date DESC)

try this:
delete t1 from table t1
where not exists
(select 1 from
(select workcenter,max(log_date)as log_date from table group by workcenter) t2
where t1.workcenter = t2.workcenter and t1.log_date = t2.log_date
)
use exists subquery to get the max log_date for each workcenter and then connect them to the table.

using CTE we can achieve this:
;WITH cte AS
(SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY createdate DESC ) AS rowno, * FROM workgroups)
DELETE FROM cte WHERE rowno !=1;
CREATE TABLE workgroups(id INT IDENTITY(1,1),name VARCHAR(50), createdate DATETIME DEFAULT GETDATE())
INSERT [dbo].[workgroups] ([id], [name], [createdate]) VALUES (1, N'workgroup1', CAST(0x0000A60F011F7840 AS DateTime))
INSERT [dbo].[workgroups] ([id], [name], [createdate]) VALUES (2, N'workgroup1', CAST(0x0000A60F011F7F8E AS DateTime))
INSERT [dbo].[workgroups] ([id], [name], [createdate]) VALUES (3, N'workgroup1', CAST(0x0000A60F011F8728 AS DateTime))
INSERT [dbo].[workgroups] ([id], [name], [createdate]) VALUES (4, N'workgroup2', CAST(0x0000A60F011F92B9 AS DateTime))
INSERT [dbo].[workgroups] ([id], [name], [createdate]) VALUES (5, N'workgroup2', CAST(0x0000A60F011F97C0 AS DateTime))
INSERT [dbo].[workgroups] ([id], [name], [createdate]) VALUES (6, N'workgroup3', CAST(0x0000A60F011FA443 AS DateTime))
INSERT [dbo].[workgroups] ([id], [name], [createdate]) VALUES (7, N'workgroup3', CAST(0x0000A60F011FA73B AS DateTime))
INSERT [dbo].[workgroups] ([id], [name], [createdate]) VALUES (8, N'workgroup3', CAST(0x0000A60F011FA9FB AS DateTime))
SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY createdate DESC ) AS rowno, * FROM workgroups
;WITH cte AS
(SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY createdate DESC ) AS rowno, * FROM workgroups)
DELETE FROM cte WHERE rowno !=1;

Related

Query to summarize rows from table with version history

I'm looking for help with a SQL query. Below are the details.
Database: Microsoft SQL Server 2016
Data Table:
It's a "version history" table with 3 columns: version number, effective date, and end date.
The version number with an end_dt of 12/31/9999 is considered the "active" version number.
Users can "restore" prior versions and make them active again.
version_number
eff_dt
end_dt
0
2021-04-13 18:03:26.483
2021-04-16 18:35:06.367
1
2021-04-16 18:35:06.370
2021-04-19 20:45:38.993
1
2021-04-19 20:45:38.997
2021-05-06 16:00:59.990
2
2021-05-06 16:00:59.990
2021-05-06 16:13:03.997
3
2021-05-06 16:13:04.000
2021-05-06 16:17:23.127
4
2021-05-06 16:17:23.130
2021-05-06 16:52:45.250
4
2021-05-06 16:52:45.253
2021-05-11 15:36:25.283
4
2021-05-11 15:36:25.283
2021-05-14 15:52:50.843
5
2021-05-14 15:52:50.847
2021-05-20 17:14:55.860
4
2021-05-20 17:14:55.863
2021-05-20 17:14:55.867
1
2021-05-20 17:14:55.870
9999-12-31 00:00:00.000
Desired Output:
A query to display a consolidated version history where consecutive entries in the version history table are displayed as a single row encompassing the entire date range the version was active.
version_number
eff_dt
end_dt
0
2021-04-13 18:03:26.483
2021-04-16 18:35:06.367
1
2021-04-16 18:35:06.370
2021-05-06 16:00:59.990
2
2021-05-06 16:00:59.990
2021-05-06 16:13:03.997
3
2021-05-06 16:13:04.000
2021-05-06 16:17:23.127
4
2021-05-06 16:17:23.130
2021-05-14 15:52:50.843
5
2021-05-14 15:52:50.847
2021-05-20 17:14:55.860
4
2021-05-20 17:14:55.863
2021-05-20 17:14:55.867
1
2021-05-20 17:14:55.870
9999-12-31 00:00:00.000
Question:
How would one write a SQL statement to generate the Desired Output based on the Data Table?
SQL Script to create sample data:
CREATE TABLE #t1(
[version_number] [int] NULL,
[eff_dt] [datetime] NOT NULL,
[end_dt] [datetime] NOT NULL
)
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (1, CAST(N'2021-05-20T17:14:55.870' AS DateTime), CAST(N'9999-12-31T00:00:00.000' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (5, CAST(N'2021-05-14T15:52:50.847' AS DateTime), CAST(N'2021-05-20T17:14:55.860' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (4, CAST(N'2021-05-20T17:14:55.863' AS DateTime), CAST(N'2021-05-20T17:14:55.867' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (4, CAST(N'2021-05-11T15:36:25.283' AS DateTime), CAST(N'2021-05-14T15:52:50.843' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (4, CAST(N'2021-05-06T16:52:45.253' AS DateTime), CAST(N'2021-05-11T15:36:25.283' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (4, CAST(N'2021-05-06T16:17:23.130' AS DateTime), CAST(N'2021-05-06T16:52:45.250' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (3, CAST(N'2021-05-06T16:13:04.000' AS DateTime), CAST(N'2021-05-06T16:17:23.127' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (2, CAST(N'2021-05-06T16:00:59.990' AS DateTime), CAST(N'2021-05-06T16:13:03.997' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (1, CAST(N'2021-04-19T20:45:38.997' AS DateTime), CAST(N'2021-05-06T16:00:59.990' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (1, CAST(N'2021-04-16T18:35:06.370' AS DateTime), CAST(N'2021-04-19T20:45:38.993' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (0, CAST(N'2021-04-13T18:03:26.483' AS DateTime), CAST(N'2021-04-16T18:35:06.367' AS DateTime))
GO
You can achieve this in two steps, using successive common table expressions (cte). Firstly, you need a consecutive ranking number within your data. On the basis of this you can then do a recursive cte, looking at the version_number of consecutive rows (necessarily one apart). This allows us to create a "batch" number: if the version_number is the same, then we take the previous batch number, if it is different, we increment the previous batch number by one. Finally we need a simple min and max on the dates grouping by the batch number. The result looks like this:
declare #t1 TABLE (
[version_number] [int] NULL,
[eff_dt] [datetime] NOT NULL,
[end_dt] [datetime] NOT NULL
);
INSERT #t1 ([version_number], [eff_dt], [end_dt])
VALUES
(1, CAST(N'2021-05-20T17:14:55.870' AS DateTime), CAST(N'9999-12-31T00:00:00.000' AS DateTime)),
(5, CAST(N'2021-05-14T15:52:50.847' AS DateTime), CAST(N'2021-05-20T17:14:55.860' AS DateTime)),
(4, CAST(N'2021-05-20T17:14:55.863' AS DateTime), CAST(N'2021-05-20T17:14:55.867' AS DateTime)),
(4, CAST(N'2021-05-11T15:36:25.283' AS DateTime), CAST(N'2021-05-14T15:52:50.843' AS DateTime)),
(4, CAST(N'2021-05-06T16:52:45.253' AS DateTime), CAST(N'2021-05-11T15:36:25.283' AS DateTime)),
(4, CAST(N'2021-05-06T16:17:23.130' AS DateTime), CAST(N'2021-05-06T16:52:45.250' AS DateTime)),
(3, CAST(N'2021-05-06T16:13:04.000' AS DateTime), CAST(N'2021-05-06T16:17:23.127' AS DateTime)),
(2, CAST(N'2021-05-06T16:00:59.990' AS DateTime), CAST(N'2021-05-06T16:13:03.997' AS DateTime)),
(1, CAST(N'2021-04-19T20:45:38.997' AS DateTime), CAST(N'2021-05-06T16:00:59.990' AS DateTime)),
(1, CAST(N'2021-04-16T18:35:06.370' AS DateTime), CAST(N'2021-04-19T20:45:38.993' AS DateTime)),
(0, CAST(N'2021-04-13T18:03:26.483' AS DateTime), CAST(N'2021-04-16T18:35:06.367' AS DateTime));
with rowdata as
(
SELECT version_number, eff_dt, end_dt,
ROW_NUMBER() OVER(ORDER BY eff_dt) rn
FROM #t1
),
cte_recursive as
(
SELECT 1 as batchno, rn, version_number, eff_dt, end_dt
FROM rowdata
WHERE version_number = 0
UNION ALL
SELECT CASE WHEN rec.version_number = rd.version_number
THEN rec.batchno
ELSE rec.batchno + 1
END,
rd.rn, rd.version_number, rd.eff_dt, rd.end_dt
FROM cte_recursive rec
INNER JOIN rowdata rd on rec.rn = rd.rn - 1
)
SELECT
version_number, min(eff_dt) as eff_dt, max(end_dt) as end_dt
FROM cte_recursive
GROUP BY version_number, batchno
A couple of points to note. I prefer to use table variables to temporary tables (has a slight advantage that they don't need to be deleted!). Secondly you can insert multiple values separated by commas, as I have shown (no need for multiple inserts).
To help you understand how the recursive element works, we begin by a simple select which is the base case, in this case selecting where version_number is 0. We then build up from that by joining to the recursive part where rn (the value returned by ROW_NUMBER()) is one greater than the value we already have. We simply need to check for a difference in the version_number between our old value and the new row, to decide if the batch number needs incrementing or not.
You may find it helpful to run these queries one at a time, to help you understand what is happening (for example just run the sub-select that includes the row_number()).
BTW it was good of you to add the create statements.
This is easily accomplished using window functions. It's a variation on the gaps and islands problem.
The premise is to identify the islands of consecutive values of the version_number. The first CTE uses lag to compare the current row value to the previous row value and marks the start of the Next Group when the values are different. The second CTE uses sum as a window function to produce a running total of the groups. This provides each group of like version_numbers with its own sequential value.
The final select is then able to group by the version_number and its sequential group number, using the min and max dates for each.
Note also that using windows function and hitting the source table just once will also be significantly more efficient than a recursive solution.
with ng as (
select *, case when Lag(version_number) over(order by end_dt) = version_number then 0 else 1 end as ng
from #t1
), grp as (
select *, Sum(ng) over(order by end_dt) as grp
from ng
)
select version_number, Min(eff_dt) eff_dt, Max(end_dt) end_dt
from grp
group by version_number, grp
order by eff_dt
you can use this Query :
DROP TABLE #t1
CREATE TABLE #t1(
[version_number] [int] NULL,
[eff_dt] [datetime] NOT NULL,
[end_dt] [datetime] NOT NULL
)
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (1, CAST(N'2021-05-20T17:14:55.870' AS DateTime), CAST(N'9999-12-31T00:00:00.000' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (5, CAST(N'2021-05-14T15:52:50.847' AS DateTime), CAST(N'2021-05-20T17:14:55.860' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (4, CAST(N'2021-05-20T17:14:55.863' AS DateTime), CAST(N'2021-05-20T17:14:55.867' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (4, CAST(N'2021-05-11T15:36:25.283' AS DateTime), CAST(N'2021-05-14T15:52:50.843' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (4, CAST(N'2021-05-06T16:52:45.253' AS DateTime), CAST(N'2021-05-11T15:36:25.283' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (4, CAST(N'2021-05-06T16:17:23.130' AS DateTime), CAST(N'2021-05-06T16:52:45.250' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (3, CAST(N'2021-05-06T16:13:04.000' AS DateTime), CAST(N'2021-05-06T16:17:23.127' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (2, CAST(N'2021-05-06T16:00:59.990' AS DateTime), CAST(N'2021-05-06T16:13:03.997' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (1, CAST(N'2021-04-19T20:45:38.997' AS DateTime), CAST(N'2021-05-06T16:00:59.990' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (1, CAST(N'2021-04-16T18:35:06.370' AS DateTime), CAST(N'2021-04-19T20:45:38.993' AS DateTime))
GO
INSERT #t1 ([version_number], [eff_dt], [end_dt]) VALUES (0, CAST(N'2021-04-13T18:03:26.483' AS DateTime), CAST(N'2021-04-16T18:35:06.367' AS DateTime))
GO
SELECT DISTINCT t.[version_number] , eff_Table.eff_dt , end_Table.end_dt FROM #t1 t INNER JOIN
(SELECT t.version_number, t.eff_dt
,ROW_NUMBER() OVER (PARTITION BY t.version_number ORDER BY t.eff_dt) AS FirstID
FROM #t1 t ) AS eff_Table ON t.version_number = eff_Table.version_number
INNER JOIN (
SELECT t.version_number,
t.end_dt
,ROW_NUMBER() OVER (PARTITION BY t.version_number ORDER BY t.end_dt desc) AS SecondID
FROM #t1 t ) AS end_Table ON end_Table.version_number = t.version_number
WHERE eff_Table.FirstID = 1 AND end_Table.SecondID = 1

How Result using cross apply, can achieve using any other joins like inner,left,right join in sql server

This is my SQL script with sample data
CREATE TABLE [dbo].[Employee]
(
[ID] [INT] IDENTITY(1,1) NOT NULL,
[Name] [VARCHAR](100) NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[LoginEntry]
(
[ID] [INT] IDENTITY(1,1) NOT NULL,
[LoginTime] [DATETIME] NULL,
[EmpID] [INT] NULL,
[GateNumber] [VARCHAR](50) NULL
) ON [PRIMARY]
GO
ALTER TABLE Employee
ADD CONSTRAINT Pk_Employee PRIMARY KEY (Id)
GO
ALTER TABLE LoginEntry
ADD CONSTRAINT Fk_LoginEntry_Employee
FOREIGN KEY (EmpId) REFERENCES Employee(Id)
GO
SET IDENTITY_INSERT [dbo].[Employee] ON
GO
INSERT [dbo].[Employee] ([ID], [Name])
VALUES (1, N'Employee 1'), (2, N'Employee 2'), (3, N'Employee 3'),
(4, N'Employee 4'), (5, N'Employee 5'), (6, N'Employee 6')
GO
SET IDENTITY_INSERT [dbo].[Employee] OFF
GO
SET IDENTITY_INSERT [dbo].[LoginEntry] ON
GO
INSERT [dbo].[LoginEntry] ([ID], [LoginTime], [EmpID], [GateNumber])
VALUES (1, CAST(N'2014-10-24 08:00:00.000' AS DateTime), 1, N'Gate 1'),
(2, CAST(N'2014-10-24 09:00:00.000' AS DateTime), 1, N'Gate 1'),
(3, CAST(N'2014-10-24 10:00:00.000' AS DateTime), 1, N'Gate 2'),
(4, CAST(N'2014-10-24 08:00:00.000' AS DateTime), 2, N'Gate 1'),
(5, CAST(N'2014-10-24 09:00:00.000' AS DateTime), 2, N'Gate 1'),
(6, CAST(N'2014-10-24 10:00:00.000' AS DateTime), 2, N'Gate 2'),
(7, CAST(N'2014-10-24 08:00:00.000' AS DateTime), 3, N'Gate 1'),
(8, CAST(N'2014-10-24 09:00:00.000' AS DateTime), 3, N'Gate 1'),
(9, CAST(N'2014-10-24 10:00:00.000' AS DateTime), 3, N'Gate 2'),
(10, CAST(N'2014-10-24 08:00:00.000' AS DateTime), 4, N'Gate 1'),
(11, CAST(N'2014-10-24 09:00:00.000' AS DateTime), 4, N'Gate 1'),
(19, CAST(N'2014-10-24 08:00:00.000' AS DateTime), 5, N'Gate 1'),
(20, CAST(N'2014-10-24 09:00:00.000' AS DateTime), 5, N'Gate 1'),
(21, CAST(N'2014-10-24 10:00:00.000' AS DateTime), 5, N'Gate 2'),
(22, CAST(N'2014-10-24 08:00:00.000' AS DateTime), 6, N'Gate 1'),
(23, CAST(N'2014-10-24 09:00:00.000' AS DateTime), 6, N'Gate 1'),
(24, CAST(N'2014-10-24 10:00:00.000' AS DateTime), 6, N'Gate 2')
SET IDENTITY_INSERT [dbo].[LoginEntry] OFF
GO
SELECT
e.ID, dt.EmpId, Name, LoginTime
FROM
Employee e
CROSS APPLY
(SELECT TOP 1
l.ID, l.LoginTime, l.EmpId
FROM
LoginEntry l
WHERE
l.EmpId = e.id) dt
GO
The result I get:
ID EmpId Name LoginTime
-----------------------------------------------
1 1 Employee 1 2014-10-24 08:00:00.000
2 2 Employee 2 2014-10-24 08:00:00.000
3 3 Employee 3 2014-10-24 08:00:00.000
4 4 Employee 4 2014-10-24 08:00:00.000
5 5 Employee 5 2014-10-24 08:00:00.000
6 6 Employee 6 2014-10-24 08:00:00.000
I am Expecting the same result using Joins(inner,right,left,full) in sql server i tried my luck but couldn't, pls can any one help me out thanks in advance
First, your query is incomplete. When you use TOP 1 without an ORDER BY you never have a guarantee which one it will pick. New data, concurrent processes, re-indexing, software patches, the time of day, they all can cause the result to change.
So, it should be something like...
SELECT
e.ID,dt.EmpId,Name,LoginTime
FROM
Employee e
CROSS APPLY
(
SELECT TOP 1
l.ID
,l.LoginTime
,l.EmpId
FROM
LoginEntry l
WHERE
l.EmpId=e.id
ORDER BY
l.LoginTime DESC -- Will cause TOP 1 to pick the most recent value (per employee)
)
dt
As for doing it with joins, doing the TOP 1 (or greatest-n-per-group, for which your n is 1), is longer, messier, and slower. So I won't go in to that.
But you can use ROW_NUMBER() to do the TOP 1 part, and then use a JOIN to associate that result with your main table...
WITH
ordered_logins AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY EmpID ORDER BY LoginTime DESC, ID DESC) AS row_ordinal
FROM
LoginEntry
)
SELECT
e.ID, l.EmpId, e.Name, l.LoginTime
FROM
Employee e
LEFT JOIN
ordered_logins l
ON l.EmpID = e.ID
AND l.row_oridnal = 1
The ROW_NUMBER() assigns every row a value from 1 upwards (per EmpID - the partition clause). It's order by loginTime descending, so the latest logins are first, and just in case two logins have the exact same time, it's secondarily ordered by ID desc.
Then the LEFT JOIN only picks the rows numbered 1 (the latest logins) and if there are no logins gives NULLs instead (so that the employee record is Not discarded due to a lack of join).
Note: the LEFT JOIN equivalent for APPLY is to use OUTER APPLY instead of CROSS APPLY.
Just for the sake of showing one more way to solve the problem, the "longer, messier, and slower" JOIN method that (I thought) MatBailie didn't show would look like this:
SELECT TOP (1) WITH TIES
e.ID
,l.EmpID
,e.Name
,l.LoginTime
FROM
dbo.Employee AS e
JOIN
dbo.LoginEntry AS l
ON
l.EmpID = e.ID
ORDER BY
ROW_NUMBER() OVER (PARTITION BY e.ID ORDER BY l.LoginTime DESC, ID DESC)
The ROW_NUMBER in the ORDER BY clause takes all of the logins for each Employee ID and numbers them by LoginTime with the most recent first (and using the LoginEntry ID as tie breaker, and thanks for that touch, Mat).
Then, the SELECT TOP (1) WITH TIES does its thing. The WITH TIES bit means that it selects the number one result from each PARTITION BY group in the ORDER BY clause.
you can use window function row_number()
with cte as(
SELECT e.ID,l.EmpId,Name,l.LoginTime, row_number() over(partition by e.ID order by l.LoginTime) as rn
FROM Employee e join LoginEntry l on l.EmpId=e.id
) select ID,EmpId,Name,LoginTime from cte where rn=1
demo link

Grouping with partition and over in TSql

I have a simple table
CREATE TABLE [dbo].[Tooling](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[Status] [int] NOT NULL,
[DateFinished] [datetime] NULL,
[Tooling] [nvarchar](50) NULL,
[Updated] [datetime] NULL,
) ON [PRIMARY]
with following values
SET IDENTITY_INSERT [dbo].[Tooling] ON
GO
INSERT [dbo].[Tooling] ([Id], [Name], [Status], [DateFinished], [Tooling], [Updated]) VALUES (1, N'Large', 0, NULL, NULL, CAST(N'2015-05-05 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Tooling] ([Id], [Name], [Status], [DateFinished], [Tooling], [Updated]) VALUES (2, N'Large', 1, NULL, N'1', CAST(N'2015-05-10 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Tooling] ([Id], [Name], [Status], [DateFinished], [Tooling], [Updated]) VALUES (3, N'Small', 0, NULL, N'2', CAST(N'2015-05-11 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Tooling] ([Id], [Name], [Status], [DateFinished], [Tooling], [Updated]) VALUES (4, N'Large', 2, NULL, N'1', CAST(N'2015-05-12 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Tooling] ([Id], [Name], [Status], [DateFinished], [Tooling], [Updated]) VALUES (5, N'Large', 2, NULL, N'2', CAST(N'2015-05-12 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Tooling] ([Id], [Name], [Status], [DateFinished], [Tooling], [Updated]) VALUES (6, N'Large', 1, NULL, N'1', CAST(N'2015-05-13 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Tooling] ([Id], [Name], [Status], [DateFinished], [Tooling], [Updated]) VALUES (7, N'Large', 1, NULL, N'2', CAST(N'2015-05-14 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[Tooling] ([Id], [Name], [Status], [DateFinished], [Tooling], [Updated]) VALUES (8, N'Small', 1, CAST(N'2015-05-15 00:00:00.000' AS DateTime), N'2', CAST(N'2015-05-15 00:00:00.000' AS DateTime))
GO
SET IDENTITY_INSERT [dbo].[Tooling] OFF
I want to create a view of the table that looks like its an entirely own table.
SELECT Id, Name, Status, DateFinished
FROM Tooling t order by t.id
If i run that I would like that the records with id 5 and 7 should be excluded since they dont change in the selected set from the previous row.
I had an idea to solve this by using, ROW_NUMBER() over partition
And by using group by but thats seems to incorrect(Could not to get it to work at all)
How do I group it when the should be grouped on that the value change and not the value it contains?
Any suggestions to solve this?
My end goal is also try to convert this to linq to entities...
OK, supposing it's ordered by id
select *
from (
select *, rng = row_number() over (partition by grp order by id)
from (
select *, grp = row_number() over (order by id) - row_number() over (partition by Name, Status, DateFinished order by id)
from tooling ) g
) gn
where rng = 1
order by id

T-SQL: Paging WITH TIES

I am trying to implement a paging routine that's a little different.
For the sake of a simple example, let's assume that I have a table defined and populated as follows:
DECLARE #Temp TABLE
(
ParentId INT,
[TimeStamp] DATETIME,
Value INT
);
INSERT INTO #Temp VALUES (1, '1/1/2013 00:00', 6);
INSERT INTO #Temp VALUES (1, '1/1/2013 01:00', 7);
INSERT INTO #Temp VALUES (1, '1/1/2013 02:00', 8);
INSERT INTO #Temp VALUES (2, '1/1/2013 00:00', 6);
INSERT INTO #Temp VALUES (2, '1/1/2013 01:00', 7);
INSERT INTO #Temp VALUES (2, '1/1/2013 02:00', 8);
INSERT INTO #Temp VALUES (3, '1/1/2013 00:00', 6);
INSERT INTO #Temp VALUES (3, '1/1/2013 01:00', 7);
INSERT INTO #Temp VALUES (3, '1/1/2013 02:00', 8);
TimeStamp will always be the same interval, e.g. daily data, 1 hour data, 1 minute data, etc. It will not be mixed.
For reporting and presentation purposes, I want to implement paging that:
Orders by TimeStamp
Starts out using a suggested pageSize (say 4), but will automatically adjust to include additional records matching on TimeStamp. In other words, if 1/1/2013 01:00 is included for one ParentId, the suggested pageSize will be overridden and all records for hour 01:00 will be included for all ParentId's. It's almost like the TOP WITH TIES option.
So running this query with pageSize of 4 would return 6 records. There are 3 hour 00:00 and 1 hour 01:00 by default, but because there are more hour 01:00's, the pageSize would be overridden to return all hour 00:00 and 01:00.
Here's what I have so far, and I think I'm close as it works for the first iteration, but sequent queries for the next pageSize+ rows doesn't work.
WITH CTE AS
(
SELECT ParentId, [TimeStamp], Value,
RANK() OVER(ORDER BY [TimeStamp]) AS rnk,
ROW_NUMBER() OVER(ORDER BY [TimeStamp]) AS rownum
FROM #Temp
)
SELECT *
FROM CTE
WHERE (rownum BETWEEN 1 AND 4) OR (rnk BETWEEN 1 AND 4)
ORDER BY TimeStamp, ParentId
The ROW_NUMBER ensures the minimum pageSize is met, but the RANK will include additional ties.
declare #Temp as Table ( ParentId Int, [TimeStamp] DateTime, [Value] Int );
insert into #Temp ( ParentId, [TimeStamp], [Value] ) values
(1, '1/1/2013 00:00', 6),
(1, '1/1/2013 01:00', 7),
(1, '1/1/2013 02:00', 8),
(2, '1/1/2013 00:00', 6),
(2, '1/1/2013 01:00', 7),
(2, '1/1/2013 02:00', 8),
(3, '1/1/2013 00:00', 6),
(3, '1/1/2013 01:00', 7),
(3, '1/1/2013 02:00', 8);
declare #PageSize as Int = 4;
declare #Page as Int = 1;
with Alpha as (
select ParentId, [TimeStamp], Value,
Rank() over ( order by [TimeStamp] ) as Rnk,
Row_Number() over ( order by [TimeStamp] ) as RowNum
from #Temp ),
Beta as (
select Min( Rnk ) as MinRnk, Max( Rnk ) as MaxRnk
from Alpha
where ( #Page - 1 ) * #PageSize < RowNum and RowNum <= #Page * #PageSize )
select A.*
from Alpha as A inner join
Beta as B on B.MinRnk <= A.Rnk and A.Rnk <= B.MaxRnk
order by [TimeStamp], ParentId;
EDIT:
An alternative query that assigns page numbers as it goes, so that next/previous page can be implemented without overlapping rows:
with Alpha as (
select ParentId, [TimeStamp], Value,
Rank() over ( order by [TimeStamp] ) as Rnk,
Row_Number() over ( order by [TimeStamp] ) as RowNum
from #Temp ),
Beta as (
select ParentId, [TimeStamp], Value, Rnk, RowNum, 1 as Page, 1 as PageRow
from Alpha
where RowNum = 1
union all
select A.ParentId, A.[TimeStamp], A.Value, A.Rnk, A.RowNum,
case when B.PageRow >= #PageSize and A.TimeStamp <> B.TimeStamp then B.Page + 1 else B.Page end,
case when B.PageRow >= #PageSize and A.TimeStamp <> B.TimeStamp then 1 else B.PageRow + 1 end
from Alpha as A inner join
Beta as B on B.RowNum + 1 = A.RowNum
)
select * from Beta
option ( MaxRecursion 0 )
Note that recursive CTEs often scale poorly.
I think your strategy of using row_number() and rank() is overcomplicating things.
Just pick the top 4 timestamps from the data. Then choose any timestamps that match those:
select *
from #temp
where [timestamp] in (select top 4 [timestamp] from #temp order by [TimeStamp])

Add not relevant column to group by script

This is my sample table and values:
CREATE TABLE [dbo].[Test]
(
[Id] BIGINT NOT NULL DEFAULT(0),
[VId] BIGINT NOT NULL DEFAULT(0),
[Level] INT NOT NULL DEFAULT(0)
);
INSERT INTO [dbo].[Test] ([Id], [VId], [Level]) VALUES (100, 1, 1);
INSERT INTO [dbo].[Test] ([Id], [VId], [Level]) VALUES (101, 1, 2);
INSERT INTO [dbo].[Test] ([Id], [VId], [Level]) VALUES (102, 1, 3);
INSERT INTO [dbo].[Test] ([Id], [VId], [Level]) VALUES (103, 2, 1);
INSERT INTO [dbo].[Test] ([Id], [VId], [Level]) VALUES (104, 3, 1);
INSERT INTO [dbo].[Test] ([Id], [VId], [Level]) VALUES (105, 3, 2);
INSERT INTO [dbo].[Test] ([Id], [VId], [Level]) VALUES (106, 4, 1);
INSERT INTO [dbo].[Test] ([Id], [VId], [Level]) VALUES (107, 4, 2);
INSERT INTO [dbo].[Test] ([Id], [VId], [Level]) VALUES (108, 4, 3);
INSERT INTO [dbo].[Test] ([Id], [VId], [Level]) VALUES (109, 4, 4);
So at now I use this script:
SELECT
[T].[VId], MAX ([T].[Level]) AS [MaxLevel]
FROM
[dbo].[Test] AS [T]
GROUP BY
[T].[VId];
And it returns:
VId MaxLevel
1 3
2 1
3 2
4 4
But I need Id column also and I can't add it to Group by script, I need the following values:
VId MaxLevel Id
1 3 102
2 1 103
3 2 105
4 4 109
What is your suggestion?
Also The following values is enough The Id's With Max(Level) in any VId :
Id
102
103
105
109
A 2008 take on the question, since that's what you're working with:
declare #Test table
(
[Id] BIGINT NOT NULL DEFAULT(0),
[VId] BIGINT NOT NULL DEFAULT(0),
[Level] INT NOT NULL DEFAULT(0)
);
INSERT INTO #Test ([Id], [VId], [Level])
VALUES (100, 1, 1),(101, 1, 2),(102, 1, 3),(103, 2, 1),(104, 3, 1),
(105, 3, 2),(106, 4, 1),(107, 4, 2),(108, 4, 3),(109, 4, 4);
;With Numbered as (
select *,
RANK() OVER (PARTITION BY VId ORDER BY [Level] desc) as rn
from #Test)
select VId,Level,Id from Numbered where rn=1
Note that (as with the other solutions) this will output multiple rows per VId if there are two rows with the same maximum level. If you don't want that, switch RANK() to ROW_NUMBER() and an arbitrary one will win - or if you want a specific winner in the case of a tie, add that condition into the ORDER BY of the window function.
Use joining with the same table by VId column
something like this:
SELECT [T].[VId], [T].[MaxLevel], [T1].[Id]
FROM [dbo].[Test] AS [T1] JOIN
(SELECT [T].[VId], MAX ([T].[Level]) AS [MaxLevel]
FROM [dbo].[Test] AS [T]
GROUP BY [T].[VId]) AS [T]
ON [T1].[VId] = [T].[VId]
AND [T1].[Level] = [T].[MaxLevel]
ORDER BY [T].[VId];
the result will be:
VId MaxLevel Id
1 3 102
2 1 103
3 2 105
4 4 109
Use this:
WITH LastLevels AS
(
SELECT
[T].[VId] AS [VID],
MAX ([T].[Level]) AS [MaxLevel]
FROM [dbo].[Test] AS [T]
GROUP BY [T].[VId]
)
SELECT [LastLevels].[VID],[LastLevels].[MaxLevel], [Te].[Id]
FROM [dbo].[Test] AS [Te]
INNER JOIN [LastLevels]
ON [LastLevels].[VID]=[Te].[VId]
AND [LastLevels].[MaxLevel]=[Te].[Level]
ORDER BY [LastLevels].[VID];
SELECT ID, [VId], [MaxLevel] From
(
SELECT
[T].[VId], MAX ([T].[Level]) AS [MaxLevel]
FROM
[dbo].[Test] AS [T]
GROUP BY
[T].[VId]
)K
INNER JOIN [Test] T on T.[VId] = K.[VId] and T.[Level] = K.MaxLevel