tsql - Setting sequential values without looping/cursoring - sql

I need to set a non-unique identifier in a data table. It should be sequential within a group, i.e., for each group the ID should start at 1 and rise in increments of 1 until the last row of that group.
This is illustrated by the table below. "New ID" is the column I need to populate.
Unique ID  Group ID  New ID
---------  --------  ------
1          1123      1
2          1123      2
3          1124      1
4          1125      1
5          1125      2
6          1125      3
7          1125      4
Is there any way of doing this without looping/cursoring? If looping/cursoring is the only way, what would the most efficient code be?
Thanks

One method is to use ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ...) in an UPDATE...FROM statement with a subquery in the FROM clause.
update A set NewID = B.NewID
from MyTable as A
inner join (select UniqueID,
                   ROW_NUMBER() over (partition by GroupID order by UniqueID) as NewID
            from MyTable) as B
    on B.UniqueID = A.UniqueID
MSDN has a good sample to get you started. A few notes:
You need a subquery (or a CTE) to feed the UPDATE, because a window function such as ROW_NUMBER() can only appear in a SELECT list or ORDER BY clause, not directly in SET.
PARTITION BY tells the server when to reset the row numbers.
ORDER BY tells the server how to order each group's rows when assigning NewID.
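To see the PARTITION BY / ORDER BY behaviour end to end, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here and is assumed to be version 3.25 or newer (for ROW_NUMBER()); because older SQLite releases have no UPDATE ... FROM, the sketch looks up each row's number with a correlated subquery instead. The function and table names are hypothetical.

```python
import sqlite3


def number_within_groups(rows):
    """Fill new_id with 1, 2, 3, ... inside each group_id, ordered by unique_id.

    rows: list of (unique_id, group_id) tuples.
    Returns (unique_id, group_id, new_id) tuples ordered by unique_id.
    """
    con = sqlite3.connect(":memory:")  # hypothetical stand-in for MyTable on SQL Server
    con.execute(
        "CREATE TABLE my_table (unique_id INTEGER PRIMARY KEY, group_id INTEGER, new_id INTEGER)"
    )
    con.executemany("INSERT INTO my_table (unique_id, group_id) VALUES (?, ?)", rows)
    # The window function lives in a derived table (b); each row then picks up
    # the number computed for its own unique_id.
    con.execute(
        """
        UPDATE my_table
        SET new_id = (
            SELECT b.rn
            FROM (SELECT unique_id,
                         ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY unique_id) AS rn
                  FROM my_table) AS b
            WHERE b.unique_id = my_table.unique_id
        )
        """
    )
    result = con.execute(
        "SELECT unique_id, group_id, new_id FROM my_table ORDER BY unique_id"
    ).fetchall()
    con.close()
    return result


sample = [(1, 1123), (2, 1123), (3, 1124), (4, 1125), (5, 1125), (6, 1125), (7, 1125)]
print(number_within_groups(sample))
```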

I agree with Damien's point in the comments, but you don't need a JOIN; you can update the CTE directly:
;WITH cte AS
(
SELECT [New ID],
ROW_NUMBER() OVER (PARTITION BY [Group ID] ORDER BY [Unique ID]) AS _NewID
FROM #T
)
UPDATE cte
SET [New ID] = _NewID

Alternative to ROW_NUMBER() if you're on SQL Server 2000:
SELECT UniqueID,
GroupID,
(SELECT COUNT(T2.GroupID)
FROM myTable T2
WHERE T2.GroupID = T1.GroupID
AND T2.UniqueID <= T1.UniqueID) AS NewID
FROM myTable T1
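The correlated count can be checked the same way. This sketch again uses Python's sqlite3 module as a stand-in for SQL Server (no window functions are needed, so any SQLite version should do); the names are hypothetical. Each row counts the rows of its own group whose unique_id is not larger than its own, which yields 1, 2, 3, ... per group, at the cost of rescanning the group for every row.

```python
import sqlite3


def correlated_count_numbers(rows):
    """Number rows inside each group without window functions.

    rows: list of (unique_id, group_id) tuples.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (unique_id INTEGER, group_id INTEGER)")
    con.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    query = """
        SELECT t1.unique_id,
               t1.group_id,
               (SELECT COUNT(*)
                FROM my_table AS t2
                WHERE t2.group_id = t1.group_id        -- only rows from the same group
                  AND t2.unique_id <= t1.unique_id)    -- up to and including this row
               AS new_id
        FROM my_table AS t1
        ORDER BY t1.unique_id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```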

This solution also works if you are running an older version of SQL Server:
--Test table:
DECLARE @t TABLE (Unique_ID int, Group_ID int, New_ID int)
--Test data:
INSERT @t (unique_id, group_id)
SELECT 1, 1123 UNION ALL
SELECT 2, 1123 UNION ALL
SELECT 3, 1124 UNION ALL
SELECT 4, 1125 UNION ALL
SELECT 5, 1125 UNION ALL
SELECT 6, 1125 UNION ALL
SELECT 7, 1125
--Syntax:
UPDATE t
SET new_id =
    (SELECT count(*)
     FROM @t
     WHERE t.unique_id >= unique_id and t.group_id = group_id
     GROUP BY group_id)
FROM @t t
--Result:
SELECT * FROM @t
Unique_ID Group_ID New_ID
----------- ----------- -----------
1 1123 1
2 1123 2
3 1124 1
4 1125 1
5 1125 2
6 1125 3
7 1125 4

SELECT
UniqueId,
GroupID,
ROW_NUMBER() OVER (PARTITION BY GroupId ORDER BY UniqueId) AS NewIdx
FROM
....

Related

randomly select a fixed number of rows in each group in SQL server table

I need to do a SQL query to find some entries from a large table.
table:
id value1 value2
ny 35732 8023
ny 732 23
ny 292 109
nj 8232 813
nj 241 720
nj 590 287
I need to randomly select 2 entries from each distinct id group, for example:
id value1 value2
ny 35732 8023
ny 292 109
nj 8232 813
nj 590 287
My SQL code:
select top 2 * from my_table group by id value1 value2
But, it is not what I want.
I also need to insert the result into a table.
Any help would be appreciated.
You can use ROW_NUMBER and use NEWID() to generate a random ORDER:
EDIT: I replaced CHECKSUM(NEWID()) with NEWID(), since I cannot prove which is faster and NEWID() is, I think, the more common choice.
WITH CTE AS(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY id ORDER BY NEWID())
FROM tbl
)
SELECT
id, value1, value2
FROM Cte
WHERE RN <= 2
Because NEWID() generates a new random value for every row on every run, different runs should return different rows.
If you're inserting the result into another table, use this subquery version:
INSERT INTO yourNewTable(id, value1, value2)
SELECT
id, value1, value2
FROM (
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY id ORDER BY NEWID())
FROM tbl
)t
WHERE RN <= 2
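A minimal sketch of the insert version, run through Python's sqlite3 module as a stand-in for SQL Server (assumes SQLite 3.25+ for ROW_NUMBER()). SQLite has no NEWID(), so random() is swapped in as the per-row random sort key; the table and function names are hypothetical.

```python
import sqlite3

ROWS = [("ny", 35732, 8023), ("ny", 732, 23), ("ny", 292, 109),
        ("nj", 8232, 813), ("nj", 241, 720), ("nj", 590, 287)]


def random_two_per_id(rows):
    """Copy two randomly chosen rows per id into new_table and return them."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id TEXT, value1 INTEGER, value2 INTEGER)")
    con.execute("CREATE TABLE new_table (id TEXT, value1 INTEGER, value2 INTEGER)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    # random() plays the role of NEWID(): a fresh random sort key per row, per run.
    con.execute(
        """
        INSERT INTO new_table (id, value1, value2)
        SELECT id, value1, value2
        FROM (SELECT *,
                     ROW_NUMBER() OVER (PARTITION BY id ORDER BY random()) AS rn
              FROM tbl) AS t
        WHERE rn <= 2
        """
    )
    result = con.execute("SELECT id, value1, value2 FROM new_table").fetchall()
    con.close()
    return result


print(random_two_per_id(ROWS))
```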
DECLARE @Table1 TABLE
(id varchar(2), value1 int, value2 int)
;
INSERT INTO @Table1
(id, value1, value2)
VALUES
('ny', 35732, 8023),
('ny', 732, 23),
('ny', 292, 109),
('nj', 8232, 813),
('nj', 241, 720),
('nj', 590, 287)
;
SELECT *
FROM @Table1 T
WHERE
(
SELECT COUNT(*)
FROM @Table1 TT
WHERE T.id = TT.id AND
T.value1 >= TT.value1
) <= 2
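The correlated COUNT(*) answer is deterministic rather than random: a row survives when at most two rows of its id have a value1 less than or equal to its own, so it keeps the two smallest value1 values per id (ties on value1 can change how many rows come back). A sketch of that behaviour using Python's sqlite3 as a stand-in for SQL Server (no window functions needed; names are hypothetical):

```python
import sqlite3

ROWS = [("ny", 35732, 8023), ("ny", 732, 23), ("ny", 292, 109),
        ("nj", 8232, 813), ("nj", 241, 720), ("nj", 590, 287)]


def two_smallest_value1_per_id(rows):
    """Keep rows with at most two rows of the same id at or below their value1."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (id TEXT, value1 INTEGER, value2 INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    query = """
        SELECT * FROM table1 AS t
        WHERE (SELECT COUNT(*)
               FROM table1 AS tt
               WHERE t.id = tt.id
                 AND t.value1 >= tt.value1) <= 2
        ORDER BY t.id, t.value1
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(two_smallest_value1_per_id(ROWS))
```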
Microsoft documents random sampling with NEWID() (note that this samples the whole table, not each id group):
SELECT TOP 10 PERCENT *
FROM Table1
ORDER BY NEWID()
See https://msdn.microsoft.com/en-us/library/Cc441928.aspx for more examples.

adding a value to a column from data in next row sql

Base Table
id line_number
1 1232
2 1456
3 1832
4 2002
I want to add values to a new table so that each row gets the next row's line_number in a new column, and the last row keeps its own value.
The final output I need to produce is:
id line_number end_line_number
1 1232 1456
2 1456 1832
3 1832 2002
4 2002 2002
The database is sql server.
Any help is sincerely appreciated.
Thanks
From SQL Server 2012 onward, you can use LEAD like this:
;WITH BaseTable as
(
SELECT 1 id, 1232 line_number
UNION ALL SELECT 2 , 1456
UNION ALL SELECT 3, 1832
UNION ALL SELECT 4 , 2002
)
SELECT id,line_number,LEAD(line_number,1,line_number) OVER(ORDER BY id ASC) AS end_line_number
FROM BaseTable
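A minimal sketch of the LEAD version, run through Python's sqlite3 module as a stand-in for SQL Server (assumes SQLite 3.25+ for window functions; table and function names are hypothetical). To keep to the plainest form, it wraps two-argument LEAD in COALESCE instead of passing the fallback as LEAD's third argument; the two are equivalent here because line_number is never NULL.

```python
import sqlite3


def next_line_numbers(rows):
    """rows: list of (id, line_number). Adds end_line_number = next row's line_number."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE base_table (id INTEGER, line_number INTEGER)")
    con.executemany("INSERT INTO base_table VALUES (?, ?)", rows)
    query = """
        SELECT id,
               line_number,
               -- next row by id; the last row falls back to its own line_number
               COALESCE(LEAD(line_number, 1) OVER (ORDER BY id), line_number) AS end_line_number
        FROM base_table
        ORDER BY id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(next_line_numbers([(1, 1232), (2, 1456), (3, 1832), (4, 2002)]))
```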
For previous versions, try this
;WITH BaseTable as
(
SELECT 1 id, 1232 line_number
UNION ALL SELECT 2 , 1456
UNION ALL SELECT 3, 1832
UNION ALL SELECT 4 , 2002
), OrderedBaseTable as
(
SELECT id,line_number,ROW_NUMBER() OVER(ORDER BY id asc) rw
FROM BaseTable
)
SELECT t1.id,t1.line_number,ISNULL(t2.line_number,t1.line_number) next_line_number
FROM OrderedBaseTable t1
LEFT JOIN OrderedBaseTable t2
ON t1.rw = t2.rw - 1
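Joining on the generated row numbers (rw) rather than on id means gaps in id do not matter. A sketch with Python's sqlite3 as a stand-in for SQL Server (SQLite 3.25+ assumed; COALESCE replaces ISNULL, which SQLite lacks); the gapped ids in the usage line are hypothetical.

```python
import sqlite3


def next_line_numbers_by_row(rows):
    """Self-join each row to the row numbered one higher; last row keeps its own value."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE base_table (id INTEGER, line_number INTEGER)")
    con.executemany("INSERT INTO base_table VALUES (?, ?)", rows)
    query = """
        WITH ordered AS (
            SELECT id, line_number, ROW_NUMBER() OVER (ORDER BY id) AS rw
            FROM base_table
        )
        SELECT t1.id,
               t1.line_number,
               COALESCE(t2.line_number, t1.line_number) AS end_line_number
        FROM ordered AS t1
        LEFT JOIN ordered AS t2 ON t2.rw = t1.rw + 1   -- the row that follows t1
        ORDER BY t1.id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Hypothetical ids with gaps: 10, 20, 35, 40.
print(next_line_numbers_by_row([(10, 1232), (20, 1456), (35, 1832), (40, 2002)]))
```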
Try this
With T as (
Select id, line_number, Row_Number() OVER(Order By id) As rn From TableName)
Select T1.id, T1.line_number, ISNULL(T2.line_number,T1.line_number) As end_line_number From T T1
Left Join T T2 on T2.rn = T1.rn + 1

SQL - Identify Distinct Values Including Count For ALL Columns In A Table

I want to be able to identify the distinct values including a count of the value for each column in a table.
I reviewed - Get distinct records with counts
And it shows me how to do this for an individual column and works great. However, I have a table with over 600 columns, and coding each column would be incredibly time consuming.
Is there a way to code my sql where I could get these same results for all columns in a table, without having to individually input each column?
So to use the example from the link:
personid, msg
-------------
1, 'msg1'
2, 'msg2'
2, 'msg3'
3, 'msg4'
1, 'msg2'
My results would be:
personid, count | msg, count
-----------------------------
1, 2 | msg1, 1
2, 2 | msg2, 2
3, 1 | msg3, 1
_, _ | msg4, 1
Is this possible? I've tried getting at it using distincts and wildcards (*) but no luck.
Apologies if this isn't detailed enough; this is my first post, I'm no SQL expert, and Googling hasn't turned up an answer. Thanks.
I'm not sure it's convenient, but you can do it like this:
CREATE TABLE #temp (
personid int,
message nvarchar(max)
);
GO
INSERT INTO #temp
SELECT 1, 'msg1' UNION ALL
SELECT 2, 'msg2' UNION ALL
SELECT 2, 'msg3' UNION ALL
SELECT 3, 'msg4' UNION ALL
SELECT 1, 'msg2';
GO
SELECT
isnull(t1.rn, t2.rn) as rn,
t1.personid as personid, t1.cnt as personid_cnt,
t2.message as message, t2.cnt as message_cnt
FROM
(SELECT personid, count(*) as cnt,
ROW_NUMBER() over (order by personid) as rn
FROM #temp GROUP BY personid) t1
FULL JOIN
(SELECT message, count(*) as cnt,
ROW_NUMBER() over (order by message) as rn
FROM #temp GROUP BY message) t2
ON t1.rn = t2.rn
ORDER BY rn
DROP table #temp;
result:
rn personid personid_cnt message message_cnt
1 1 2 msg1 1
2 2 2 msg2 2
3 3 1 msg3 1
4 NULL NULL msg4 1
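The query above still names each column by hand. With 600 columns, one option is to generate a per-column GROUP BY from the table's column list; in SQL Server that list is available from INFORMATION_SCHEMA.COLUMNS or sys.columns, and the generated text can be run with sp_executesql. The sketch below only illustrates the idea, using Python's sqlite3 and SQLite's PRAGMA table_info as stand-ins; it returns one list of (value, count) pairs per column rather than the side-by-side layout, and the helper names are hypothetical.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQLite identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def distinct_counts_per_column(con, table):
    """Return {column_name: [(value, count), ...]} for every column of `table`."""
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    columns = [row[1] for row in con.execute(f"PRAGMA table_info({quote_ident(table)})")]
    counts = {}
    for col in columns:
        c = quote_ident(col)
        sql = f"SELECT {c}, COUNT(*) FROM {quote_ident(table)} GROUP BY {c} ORDER BY {c}"
        counts[col] = con.execute(sql).fetchall()
    return counts


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE temp_data (personid INTEGER, message TEXT)")
con.executemany(
    "INSERT INTO temp_data VALUES (?, ?)",
    [(1, "msg1"), (2, "msg2"), (2, "msg3"), (3, "msg4"), (1, "msg2")],
)
print(distinct_counts_per_column(con, "temp_data"))
```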

Coalesce over Rows in MSSQL 2008

I'm trying to determine the best approach here in MSSQL 2008.
Here is my sample data
TransDate Id Active
-------------------------
1/18 1pm 5 1
1/18 2pm 5 0
1/18 3pm 5 Null
1/18 4pm 5 1
1/18 5pm 5 0
1/18 6pm 5 Null
Grouped by Id and ordered by TransDate, I want the last non-null value of the Active column and the MAX of TransDate:
SELECT MAX(TransDate) AS TransDate,
Id,
--LASTNonNull(Active) AS Active
Here would be the results:
TransDate Id Active
---------------------
1/18 6pm 5 0
It would be like a Coalesce but over the rows, instead of two values/columns.
There would be many other columns that would also have this similar method applied, so I really don't want to make a separate join for each of the columns.
Any ideas?
I'd probably use a correlated sub query.
SELECT MAX(TransDate) AS TransDate,
Id,
(SELECT TOP (1) Active
FROM T t2
WHERE t2.Id = t1.Id
AND Active IS NOT NULL
ORDER BY TransDate DESC) AS Active
FROM T t1
GROUP BY Id
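A sketch of the correlated-subquery approach in Python's sqlite3 (standing in for SQL Server; LIMIT 1 replaces TOP (1)), using the #temp sample data from this thread with dates stored as ISO-8601 text. The table and function names are hypothetical.

```python
import sqlite3

# Modelled on the #temp sample data in this thread.
ROWS = [
    ("2011-01-01", 5, 1), ("2011-01-02", 5, 0), ("2011-01-03", 5, None),
    ("2011-01-04", 5, 1), ("2011-01-05", 5, 0), ("2011-01-06", 5, None),
    ("2011-01-01", 6, 2), ("2011-01-02", 6, 3), ("2011-01-03", 6, None),
    ("2011-01-04", 6, 2), ("2011-01-05", 6, None),
]


def last_non_null_active(rows):
    """Per id: latest transdate plus the most recent non-null active value."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (transdate TEXT, id INTEGER, active INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT MAX(t1.transdate) AS transdate,
               t1.id,
               (SELECT t2.active              -- LIMIT 1 plays the role of TOP (1)
                FROM t AS t2
                WHERE t2.id = t1.id
                  AND t2.active IS NOT NULL
                ORDER BY t2.transdate DESC
                LIMIT 1) AS active
        FROM t AS t1
        GROUP BY t1.id
        ORDER BY t1.id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(last_non_null_active(ROWS))
```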
A way without a subquery:
SELECT
Id,
MAX(TransDate) AS TransDate,
CAST(RIGHT(MAX(CONVERT(CHAR(23),TransDate,121) + CAST(Active AS CHAR(1))),1) AS BIT) AS Active,
/*You can probably figure out a more efficient thing to
compare than the above depending on your data. e.g.*/
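/*DATEDIFF returns int, so seconds counted from 1950 overflow for dates after mid-January 2018;
counting from 2000 keeps the value in range until January 2068.*/
CAST(MAX(DATEDIFF(SECOND,'20000101',TransDate) * CAST(10 AS BIGINT) + Active)%10 AS BIT) AS Active2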
FROM T
GROUP BY Id
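The concatenation trick works because concatenating with NULL yields NULL (in SQL Server under the default CONCAT_NULL_YIELDS_NULL setting, and in SQLite), so MAX() skips rows where Active is NULL, and the last character of the largest date+value string is the latest non-null Active. A sketch in Python's sqlite3 as a stand-in for SQL Server; it assumes fixed-width, sortable date text (ISO-8601) and a single-character Active value. Names are hypothetical.

```python
import sqlite3

ROWS = [
    ("2011-01-01", 5, 1), ("2011-01-02", 5, 0), ("2011-01-03", 5, None),
    ("2011-01-04", 5, 1), ("2011-01-05", 5, 0), ("2011-01-06", 5, None),
    ("2011-01-01", 6, 2), ("2011-01-02", 6, 3), ("2011-01-03", 6, None),
    ("2011-01-04", 6, 2), ("2011-01-05", 6, None),
]


def last_non_null_by_concat(rows):
    """Plain GROUP BY: pick the last character of MAX(date || active)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (transdate TEXT, id INTEGER, active INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT id,
               MAX(transdate) AS max_date,
               -- NULL active makes the concatenation NULL, so MAX() ignores that row
               CAST(substr(MAX(transdate || active), -1) AS INTEGER) AS active
        FROM t
        GROUP BY id
        ORDER BY id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(last_non_null_by_concat(ROWS))
```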
Or, following the comments, would CROSS APPLY work better for you?
WITH T (TransDate, Id, Active, SomeOtherColumn) AS
(
select GETDATE(), 5, 1, 'A' UNION ALL
select 1+GETDATE(), 5, 0, 'B' UNION ALL
select 2+GETDATE(), 5, null, 'C' UNION ALL
select 3+GETDATE(), 5, 1, 'D' UNION ALL
select 4+GETDATE(), 5, 0, 'E' UNION ALL
select 5+GETDATE(), 5, null,'F'
),
T1 AS
(
SELECT MAX(TransDate) AS TransDate,
Id
FROM T
GROUP BY Id
)
SELECT T1.TransDate,
Id,
CA.Active AS Active,
CA.SomeOtherColumn AS SomeOtherColumn
FROM T1
CROSS APPLY (SELECT TOP (1) Active, SomeOtherColumn
FROM T t2
WHERE t2.Id = T1.Id
AND Active IS NOT NULL
ORDER BY TransDate DESC) CA
This example should help; it uses the analytic functions MAX() OVER and ROW_NUMBER() OVER:
create table tww( transdate datetime, id int, active bit)
insert tww select GETDATE(), 5, 1
insert tww select 1+GETDATE(), 5, 0
insert tww select 2+GETDATE(), 5, null
insert tww select 3+GETDATE(), 5, 1
insert tww select 4+GETDATE(), 5, 0
insert tww select 5+GETDATE(), 5, null
select maxDate as Transdate, id, Active
from (
select *,
max(transdate) over (partition by id) maxDate,
ROW_NUMBER() over (partition by id
order by case when active is not null then 0 else 1 end, transdate desc) rn
from tww
) x
where rn=1
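A sketch of the MAX() OVER / ROW_NUMBER() OVER approach in Python's sqlite3 (a stand-in for SQL Server; SQLite 3.25+ assumed). The CASE expression sorts non-null Active values first, so row 1 of each id is the latest non-null one, while MAX() OVER carries the overall latest date. Names are hypothetical.

```python
import sqlite3

ROWS = [
    ("2011-01-01", 5, 1), ("2011-01-02", 5, 0), ("2011-01-03", 5, None),
    ("2011-01-04", 5, 1), ("2011-01-05", 5, 0), ("2011-01-06", 5, None),
    ("2011-01-01", 6, 2), ("2011-01-02", 6, 3), ("2011-01-03", 6, None),
    ("2011-01-04", 6, 2), ("2011-01-05", 6, None),
]


def last_non_null_windowed(rows):
    """One row per id: overall max date plus the latest non-null active."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (transdate TEXT, id INTEGER, active INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT max_date AS transdate, id, active
        FROM (SELECT *,
                     MAX(transdate) OVER (PARTITION BY id) AS max_date,
                     ROW_NUMBER() OVER (
                         PARTITION BY id
                         ORDER BY CASE WHEN active IS NOT NULL THEN 0 ELSE 1 END,
                                  transdate DESC
                     ) AS rn
              FROM t) AS x
        WHERE rn = 1
        ORDER BY id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(last_non_null_windowed(ROWS))
```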
Another option, quite expensive, is to do it through XML. For educational purposes only:
select
ID = n.c.value('@id', 'int'),
trandate = n.c.value('(data/transdate)[1]', 'datetime'),
active = n.c.value('(data/active)[1]', 'bit')
from
(select xml=convert(xml,
(select id [@id],
( select *
from tww t
where t.id=tww.id
order by transdate desc
for xml path('data'), type)
from tww
group by id
for xml path('node'), root('root'), elements)
)) x cross apply xml.nodes('root/node') n(c)
It works because the generated XML nests each record as a child node of its ID, ordered by transdate descending, and FOR XML omits columns whose value is NULL. The first element found with the XPath (data/columnname)[1] is therefore the latest non-null value, similar to COALESCE.
You could use a subquery:
SELECT MAX(TransDate) AS TransDate
, Id
, (
SELECT TOP 1 t2.Active
FROM YourTable t2
WHERE t1.id = t2.id
and t2.Active is not null
ORDER BY
t2.TransDate desc
) AS Active
FROM YourTable t1
GROUP BY Id
I created a temp table named #temp to test my solution, and here is what I came up with:
transdate id active
1/1/2011 12:00:00 AM 5 1
1/2/2011 12:00:00 AM 5 0
1/3/2011 12:00:00 AM 5 null
1/4/2011 12:00:00 AM 5 1
1/5/2011 12:00:00 AM 5 0
1/6/2011 12:00:00 AM 5 null
1/1/2011 12:00:00 AM 6 2
1/2/2011 12:00:00 AM 6 3
1/3/2011 12:00:00 AM 6 null
1/4/2011 12:00:00 AM 6 2
1/5/2011 12:00:00 AM 6 null
This query...
select max(a.transdate) as transdate, a.id, (
select top (1) b.active
from #temp b
where b.active is not null
and b.id = a.id
order by b.transdate desc
) as active
from #temp a
group by a.id
Returns these results.
transdate id active
1/6/2011 12:00:00 AM 5 0
1/5/2011 12:00:00 AM 6 2
Assuming a table named "test1", how about using ROW_NUMBER, OVER and PARTITION BY?
SELECT transdate, id, active FROM
(SELECT transdate, ROW_NUMBER() OVER(PARTITION BY id ORDER BY transdate desc) AS rownumber, id, active
FROM test1
WHERE active is not null) a
WHERE a.rownumber = 1
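Because the NULL rows are filtered out before numbering, the TransDate this query returns is that of the last non-null row rather than MAX(TransDate) over the whole group, which the question asked for. A sketch in Python's sqlite3 (a stand-in for SQL Server; SQLite 3.25+ assumed) shows the difference on the #temp sample data; names are hypothetical.

```python
import sqlite3

ROWS = [
    ("2011-01-01", 5, 1), ("2011-01-02", 5, 0), ("2011-01-03", 5, None),
    ("2011-01-04", 5, 1), ("2011-01-05", 5, 0), ("2011-01-06", 5, None),
    ("2011-01-01", 6, 2), ("2011-01-02", 6, 3), ("2011-01-03", 6, None),
    ("2011-01-04", 6, 2), ("2011-01-05", 6, None),
]


def last_non_null_row(rows):
    """Latest row per id among rows whose active is not NULL."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (transdate TEXT, id INTEGER, active INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT transdate, id, active
        FROM (SELECT transdate, id, active,
                     ROW_NUMBER() OVER (PARTITION BY id ORDER BY transdate DESC) AS rownumber
              FROM t
              WHERE active IS NOT NULL) AS a   -- NULL rows removed before numbering
        WHERE a.rownumber = 1
        ORDER BY id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(last_non_null_row(ROWS))
```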

Union and order by

Consider a table like
tbl_ranks
--------------------------------
family_id | item_id | view_count
--------------------------------
1 10 101
1 11 112
1 13 109
2 21 101
2 22 112
2 23 109
3 30 101
3 31 112
3 33 109
4 40 101
4 51 112
4 63 109
5 80 101
5 81 112
5 88 109
I need to generate a result set with the top two(2) rows for a subset of family ids (say, 1,2,3 and 4) ordered by view count.
I'd like to do something like
select top 2 * from tbl_ranks where family_id = 1 order by view_count
union all
select top 2 * from tbl_ranks where family_id = 2 order by view_count
union all
select top 2 * from tbl_ranks where family_id = 3 order by view_count
union all
select top 2 * from tbl_ranks where family_id = 4 order by view_count
but, of course, order by isn't valid in a union all context in this manner. Any suggestions? I know I could run a set of 4 queries, store the results into a temp table and select the contents of that temp as the final result, but I'd rather avoid using a temp table if possible.
Note: in the real app, the number of records per family id is indeterminate, and the view_counts are also not fixed as they appear in the above example.
You can try something like this
CREATE TABLE #tbl_ranks(
family_id INT,
item_id INT,
view_count INT
)
INSERT INTO #tbl_ranks SELECT 1,10,101
INSERT INTO #tbl_ranks SELECT 1,11,112
INSERT INTO #tbl_ranks SELECT 1,13,109
INSERT INTO #tbl_ranks SELECT 2,21,101
INSERT INTO #tbl_ranks SELECT 2,22,112
INSERT INTO #tbl_ranks SELECT 2,23,109
INSERT INTO #tbl_ranks SELECT 3,30,101
INSERT INTO #tbl_ranks SELECT 3,31,112
INSERT INTO #tbl_ranks SELECT 3,33,109
INSERT INTO #tbl_ranks SELECT 4,40,101
INSERT INTO #tbl_ranks SELECT 4,51,112
INSERT INTO #tbl_ranks SELECT 4,63,109
INSERT INTO #tbl_ranks SELECT 5,80,101
INSERT INTO #tbl_ranks SELECT 5,81,112
INSERT INTO #tbl_ranks SELECT 5,88,109
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY family_id ORDER BY view_count DESC) MyOrder
FROM #tbl_ranks
) MyOrders
WHERE MyOrder <= 2
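The ROW_NUMBER() filter can be tried outside SQL Server too. This sketch uses Python's sqlite3 module as a stand-in (SQLite 3.25+ assumed); it also restricts the families to 1, 2, 3 and 4 as in the question, which the answer above does not do. Names are hypothetical.

```python
import sqlite3

ROWS = [(1, 10, 101), (1, 11, 112), (1, 13, 109), (2, 21, 101), (2, 22, 112),
        (2, 23, 109), (3, 30, 101), (3, 31, 112), (3, 33, 109), (4, 40, 101),
        (4, 51, 112), (4, 63, 109), (5, 80, 101), (5, 81, 112), (5, 88, 109)]


def top_two_per_family(rows):
    """Two highest view_count rows for each of families 1 to 4."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl_ranks (family_id INTEGER, item_id INTEGER, view_count INTEGER)")
    con.executemany("INSERT INTO tbl_ranks VALUES (?, ?, ?)", rows)
    query = """
        SELECT family_id, item_id, view_count
        FROM (SELECT *,
                     ROW_NUMBER() OVER (PARTITION BY family_id
                                        ORDER BY view_count DESC) AS my_order
              FROM tbl_ranks
              WHERE family_id IN (1, 2, 3, 4)) AS my_orders
        WHERE my_order <= 2
        ORDER BY family_id, my_order
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(top_two_per_family(ROWS))
```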
If you're using SQL Server 2005 or later, you can take advantage of analytic functions:
SELECT * FROM (
SELECT rank() OVER (PARTITION BY family_id ORDER BY view_count) AS RNK, * FROM ...
) AS ranked
WHERE RNK <= 2
ORDER BY ...
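One difference between the RANK() answer and the ROW_NUMBER() answers: with ties on view_count, RANK() gives tied rows the same rank, so WHERE RNK <= 2 can return more than two rows per family, while ROW_NUMBER() always stops at two (choosing arbitrarily among ties). A sketch using Python's sqlite3 as a stand-in for SQL Server (SQLite 3.25+ assumed) and a hypothetical family 9 with a three-way tie:

```python
import sqlite3


def top_two_by(rank_function, rows):
    """Rows ranked <= 2 within each family_id (ascending view_count)."""
    if rank_function not in ("RANK", "ROW_NUMBER"):  # only these names are pasted into the SQL
        raise ValueError(rank_function)
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl_ranks (family_id INTEGER, item_id INTEGER, view_count INTEGER)")
    con.executemany("INSERT INTO tbl_ranks VALUES (?, ?, ?)", rows)
    query = f"""
        SELECT family_id, item_id, view_count
        FROM (SELECT *,
                     {rank_function}() OVER (PARTITION BY family_id ORDER BY view_count) AS rnk
              FROM tbl_ranks) AS ranked
        WHERE rnk <= 2
        ORDER BY family_id, item_id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Hypothetical family 9: three items tie on view_count 100.
TIED = [(9, 1, 100), (9, 2, 100), (9, 3, 100), (9, 4, 120)]
print(len(top_two_by("RANK", TIED)))        # all three tied items share rank 1
print(len(top_two_by("ROW_NUMBER", TIED)))  # exactly two rows
```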
SELECT tro.*
FROM family
CROSS APPLY
(
SELECT TOP 2 *
FROM tbl_ranks tr
WHERE tr.family_id = family.id
ORDER BY
view_count DESC
) tro
WHERE family.id IN (1, 2, 3, 4)
If you don't have an actual family table, you can construct it using a set of unions or a recursive CTE:
WITH family AS
(
SELECT 1 AS id
UNION ALL
SELECT 2 AS id
UNION ALL
SELECT 3 AS id
UNION ALL
SELECT 4 AS id
)
SELECT tro.*
FROM family
CROSS APPLY
(
SELECT TOP 2 *
FROM tbl_ranks tr
WHERE tr.family_id = family.id
ORDER BY
view_count DESC
) tro
WHERE family.id IN (1, 2, 3, 4)
Make sure you have an index on tbl_ranks (family_id, view_count).
This will be efficient if you have lots of ranks per family, since analytic functions like ROW_NUMBER will not use the TOP method if used with PARTITION BY.
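SQLite has no CROSS APPLY, so this sketch (Python's sqlite3 as a stand-in for SQL Server) swaps in a different technique for the same per-family TOP 2: a correlated IN (... LIMIT 2) subquery. It also creates the (family_id, view_count) index suggested above. It assumes item_id identifies a row; names are hypothetical.

```python
import sqlite3

ROWS = [(1, 10, 101), (1, 11, 112), (1, 13, 109), (2, 21, 101), (2, 22, 112),
        (2, 23, 109), (3, 30, 101), (3, 31, 112), (3, 33, 109), (4, 40, 101),
        (4, 51, 112), (4, 63, 109), (5, 80, 101), (5, 81, 112), (5, 88, 109)]


def top_two_correlated(rows):
    """Per-family TOP 2 by view_count via a correlated LIMIT subquery."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl_ranks (family_id INTEGER, item_id INTEGER, view_count INTEGER)")
    con.execute("CREATE INDEX ix_family_views ON tbl_ranks (family_id, view_count)")
    con.executemany("INSERT INTO tbl_ranks VALUES (?, ?, ?)", rows)
    query = """
        SELECT tr.family_id, tr.item_id, tr.view_count
        FROM tbl_ranks AS tr
        WHERE tr.family_id IN (1, 2, 3, 4)
          AND tr.item_id IN (SELECT t2.item_id          -- top 2 items of tr's family
                             FROM tbl_ranks AS t2
                             WHERE t2.family_id = tr.family_id
                             ORDER BY t2.view_count DESC
                             LIMIT 2)
        ORDER BY tr.family_id, tr.view_count DESC
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(top_two_correlated(ROWS))
```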
Use:
SELECT *
FROM (select *,
ROW_NUMBER() OVER (PARTITION BY family_id ORDER BY view_count DESC) 'rank'
from tbl_ranks) x
WHERE x.rank <= 2
ORDER BY ...
The rationale is to assign a ranking, and then filter based on it.
You only have to tweak your suggested SQL a little to make it work the way you want. To combine TOP and ORDER BY in each branch, put the statement inside parentheses, select from it, and give it an alias (the alias isn't used here, but SQL Server requires one for a derived table).
With the table setup and INSERT statements from Adriaan Stander's answer, the following
SELECT * FROM (SELECT TOP 2 * FROM #tbl_ranks WHERE family_id = 1 ORDER BY view_count) AS dummy1 UNION ALL
SELECT * FROM (SELECT TOP 2 * FROM #tbl_ranks WHERE family_id = 2 ORDER BY view_count) AS dummy2 UNION ALL
SELECT * FROM (SELECT TOP 2 * FROM #tbl_ranks WHERE family_id = 3 ORDER BY view_count) AS dummy3 UNION ALL
SELECT * FROM (SELECT TOP 2 * FROM #tbl_ranks WHERE family_id = 4 ORDER BY view_count) AS dummy4
gives
family_id item_id view_count
1 10 101
1 13 109
2 21 101
2 23 109
3 30 101
3 33 109
4 40 101
4 63 109
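The same UNION ALL of parenthesised TOP queries can be generated from a list of family ids. In this sketch, Python's sqlite3 stands in for SQL Server and LIMIT 2 plays the role of TOP 2; the helper name is hypothetical, and the ids are passed as parameters rather than pasted into the SQL.

```python
import sqlite3

ROWS = [(1, 10, 101), (1, 11, 112), (1, 13, 109), (2, 21, 101), (2, 22, 112),
        (2, 23, 109), (3, 30, 101), (3, 31, 112), (3, 33, 109), (4, 40, 101),
        (4, 51, 112), (4, 63, 109), (5, 80, 101), (5, 81, 112), (5, 88, 109)]


def union_all_top_two(rows, family_ids):
    """UNION ALL of one 'TOP 2 ... ORDER BY view_count' derived table per family."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl_ranks (family_id INTEGER, item_id INTEGER, view_count INTEGER)")
    con.executemany("INSERT INTO tbl_ranks VALUES (?, ?, ?)", rows)
    # Each branch wraps its ORDER BY ... LIMIT in a named derived table, like dummy1..dummy4.
    branches = [
        "SELECT * FROM (SELECT * FROM tbl_ranks WHERE family_id = ? "
        f"ORDER BY view_count LIMIT 2) AS dummy{i}"
        for i in range(1, len(family_ids) + 1)
    ]
    query = " UNION ALL ".join(branches)
    result = con.execute(query, list(family_ids)).fetchall()
    con.close()
    return sorted(result)  # UNION ALL alone does not promise an output order


print(union_all_top_two(ROWS, [1, 2, 3, 4]))
```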