MSSQL: How to increment an int column grouped by another column?

Given the following table:
UserId | Idx
1 | 0
1 | 1
1 | 3
1 | 5
2 | 1
2 | 2
2 | 3
2 | 5
I want to update the Idx column so that it increments correctly (starting at 0, with no gaps) within each UserId:
UserId | Idx
1 | 0
1 | 1
1 | 2
1 | 3
2 | 0
2 | 1
2 | 2
2 | 3
I know it's possible with T-SQL (using a cursor), but is it also possible with a single statement?
Thank you

You can use a correlated subquery:
update t
set idx = coalesce((select count(*)
                    from t as t1
                    where t1.userid = t.userid and t1.idx < t.idx
                   ), 0
                  );

Use ROW_NUMBER() with PARTITION BY:
-- assumes tablex has a unique key column ID, which the question's table does not show
update T set Idx = A.Idx
from tablex T
inner join
(
select UserID, ID, ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Idx) - 1 as Idx
from tablex
) A on T.ID = A.ID

Use an updatable CTE:
with toupdate as (
    select t.*,
           row_number() over (partition by UserId order by idx) - 1 as new_idx
    from t
)
update toupdate
set idx = new_idx
where idx <> new_idx;  -- only touch rows whose value actually changes
This should be the fastest method for solving this problem.
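To check the renumbering before running any UPDATE, the ROW_NUMBER() expression can be previewed with a plain SELECT. The sketch below is only an illustration: it uses Python's sqlite3 module with an in-memory SQLite table as a stand-in for the SQL Server table (an assumption, and it needs SQLite 3.25+ for window functions), with the table name t and the sample rows taken from the question.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the SQL Server table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (UserId INTEGER, Idx INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 0), (1, 1), (1, 3), (1, 5), (2, 1), (2, 2), (2, 3), (2, 5)],
)

# Show the old and the proposed new value side by side; the table is not modified.
preview = conn.execute(
    """
    SELECT UserId,
           Idx,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Idx) - 1 AS new_idx
    FROM t
    ORDER BY UserId, Idx
    """
).fetchall()

for row in preview:
    print(row)
```

Each printed tuple is (UserId, old Idx, new_idx), so rows where the last two values differ are the ones an UPDATE would change.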

Related

Subquery with order by random does not work on database with 500.000.000 rows

I want to run an update query on an offline database with DB Browser for SQLite.
I tested my query on a few rows and it works perfectly there, but not on my database, which has more than 500.000.000 rows. It looks like the random subquery is not executed at all there, and the first rows of each group are taken.
The query:
UPDATE
table
SET typ = 3 WHERE id IN (
SELECT id FROM (
SELECT * FROM table ORDER BY RANDOM()
)
WHERE typ = 1 GROUP BY idg HAVING COUNT(idg) > 5
)
Sample data:
id |idg| typ
1 | 1 | 1
2 | 1 | 1
3 | 1 | 1
4 | 1 | 1
5 | 1 | 1
6 | 1 | 1
7 | 1 | 1
8 | 2 | 1
9 | 2 | 1
10 | 2 | 1
11 | 2 | 1
12 | 2 | 1
13 | 2 | 1
14 | 2 | 1
15 | 2 | 1
Is there any fix or workaround to execute my query successfully?
If your version of SQLite is 3.33.0+, you can use the UPDATE ... FROM... syntax, so that you can join to the table a query that uses window function ROW_NUMBER() to check if a specific idg has more than 5 rows and returns a random id:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY idg ORDER BY RANDOM()) rn
FROM tablename
WHERE typ = 1
)
UPDATE tablename AS t
SET typ = 3
FROM cte AS c
WHERE t.id = c.id AND c.rn = 6; -- rn = 6 makes sure that there are at least 6 rows
For SQLite 3.25.0+ use the operator IN with ROW_NUMBER() window function:
UPDATE tablename
SET typ = 3
WHERE id IN (
SELECT id
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY idg ORDER BY RANDOM()) rn
FROM tablename
WHERE typ = 1
)
WHERE rn = 6 -- rn = 6 makes sure that there are at least 6 rows
);
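As a small sanity check of the IN + ROW_NUMBER() variant, here is a sketch run through Python's sqlite3 module on an in-memory database (needs SQLite 3.25+). The table name tablename comes from the answer; the third group with only 3 rows is a hypothetical addition to show that small groups are left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER PRIMARY KEY, idg INTEGER, typ INTEGER)"
)

# idg -> number of rows; groups 1 and 2 match the question's sample data,
# group 3 is a hypothetical small group that must not be touched.
group_sizes = {1: 7, 2: 8, 3: 3}
rows = [(idg, 1) for idg, n in group_sizes.items() for _ in range(n)]
conn.executemany("INSERT INTO tablename (idg, typ) VALUES (?, ?)", rows)

# Number the rows of each idg in random order; the row numbered 6 only
# exists when the group has at least 6 rows with typ = 1.
conn.execute(
    """
    UPDATE tablename
    SET typ = 3
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY idg ORDER BY RANDOM()) AS rn
            FROM tablename
            WHERE typ = 1
        )
        WHERE rn = 6
    )
    """
)

# Count the changed rows per group.
changed = dict(
    conn.execute(
        "SELECT idg, COUNT(*) FROM tablename WHERE typ = 3 GROUP BY idg"
    ).fetchall()
)
print(changed)
```

Which id is picked in each large group varies from run to run, but the number of changed rows per group does not.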

SQL How to filter table with values having more than one unique value of another column

I have a data table Customers that looks like this:
ID | Sequence No |
1 | 1 |
1 | 2 |
1 | 3 |
2 | 1 |
2 | 1 |
2 | 1 |
3 | 1 |
3 | 2 |
I would like to filter the table so that only IDs with more than 1 distinct count of Sequence No remain.
Expected output:
ID | Sequence No |
1 | 1 |
1 | 2 |
1 | 3 |
3 | 1 |
3 | 2 |
I tried
select ID, Sequence No
from Customers
where count(distinct Sequence No) > 1
order by ID
but I'm getting an error. How can I solve this?
You can get the desired result with the query below. It is similar to what you were trying.
Sample table and data:
Declare @Data table
(Id int, [Sequence No] int)
Insert into @Data
values
(1 , 1 ),
(1 , 2 ),
(1 , 3 ),
(2 , 1 ),
(2 , 1 ),
(2 , 1 ),
(3 , 1 ),
(3 , 2 )
Query:
Select * from @Data
where ID in(
select ID
from @Data
Group by ID
Having count(distinct [Sequence No]) > 1
)
Using analytic functions, we can try:
WITH cte AS (
SELECT *, MIN([Sequence No]) OVER (PARTITION BY ID) min_seq,
MAX([Sequence No]) OVER (PARTITION BY ID) max_seq
FROM Customers
)
SELECT ID, [Sequence No]
FROM cte
WHERE min_seq <> max_seq
ORDER BY ID, [Sequence No];
We are checking for a distinct count of sequence number by asserting that the minimum and maximum sequence numbers are not the same for a given ID. The above query could benefit from the following index:
CREATE INDEX idx ON Customers (ID, [Sequence No]);
This would let the min and max values be looked up faster.
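To compare the two answers on the question's sample data, the sketch below runs both through Python's sqlite3 module (in-memory, SQLite 3.25+ for the window functions). The column is named seq_no here instead of [Sequence No] purely to keep the example short; that rename is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (ID INTEGER, seq_no INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 1), (2, 1), (3, 1), (3, 2)],
)

# Approach 1: keep IDs whose group has more than one distinct seq_no.
having_rows = conn.execute(
    """
    SELECT ID, seq_no
    FROM Customers
    WHERE ID IN (
        SELECT ID
        FROM Customers
        GROUP BY ID
        HAVING COUNT(DISTINCT seq_no) > 1
    )
    ORDER BY ID, seq_no
    """
).fetchall()

# Approach 2: keep rows whose ID has different minimum and maximum seq_no.
window_rows = conn.execute(
    """
    WITH cte AS (
        SELECT ID, seq_no,
               MIN(seq_no) OVER (PARTITION BY ID) AS min_seq,
               MAX(seq_no) OVER (PARTITION BY ID) AS max_seq
        FROM Customers
    )
    SELECT ID, seq_no
    FROM cte
    WHERE min_seq <> max_seq
    ORDER BY ID, seq_no
    """
).fetchall()

print(having_rows)
print(window_rows)
```

Both queries drop ID 2, whose three rows all share the same sequence number.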

SQL - Deleting duplicate columns only if another column matches [duplicate]

I have the following table (TBL_VIDEO) with duplicate column entries in "TIMESTAMP", and I want to remove them only if the "CAMERA" number matches.
BEFORE:
ANALYSIS_ID | TIMESTAMP | EMOTION | CAMERA
-------------------------------------------
1 | 5 | HAPPY | 1
2 | 10 | SAD | 1
3 | 10 | SAD | 1
4 | 5 | HAPPY | 2
5 | 15 | ANGRY | 2
6 | 15 | HAPPY | 2
AFTER:
ANALYSIS_ID | TIMESTAMP | EMOTION | CAMERA
-------------------------------------------
1 | 5 | HAPPY | 1
2 | 10 | SAD | 1
4 | 5 | HAPPY | 2
5 | 15 | ANGRY | 2
I have attempted this statement, but the rows weren't deleted as expected. I appreciate any help producing a correct SQL statement. Thanks in advance!
delete y
from TBL_VIDEO y
where exists (select 1 from TBL_VIDEO y2 where y.TIMESTAMP = y2.TIMESTAMP and y2.ANALYSIS_ID < y.ANALYSIS_ID, y.CAMERA = y.CAMERA, y2.CAMERA = y2.CAMERA);
try this:
delete f2 from (
select row_number() over(partition by TIMESTAMP, CAMERA order by ANALYSIS_ID) rang
from yourtable f1
) f2 where f2.rang>1
Other solution :
delete f1 from yourtable f1
where exists
(
select * from yourtable f2
where f2.TIMESTAMP=f1.TIMESTAMP and f2.CAMERA=f1.CAMERA and f1.ANALYSIS_ID>f2.ANALYSIS_ID
)
Use row_number() to find the duplicates and delete them:
delete t1 from
(select *, row_number() over(partition by TIMESTAMP, CAMERA order by ANALYSIS_ID) as rn from TBL_VIDEO
) t1 where rn > 1
;WITH cte
AS
(
select ANALYSIS_ID,
ROW_NUMBER() over(partition by TIMESTAMP, CAMERA order by ANALYSIS_ID) rnk
from TBL_VIDEO
)
DELETE FROM cte WHERE cte.rnk > 1
You can use subquery :
select v.*
from tbl_video v
where analysis_id = (select min(v1.analysis_id)
from tbl_video v1
where v1.timestamp = v.timestamp and
v1.camera = v.camera
);
Alternatively, an analytic function combined with the top (1) with ties clause is also useful:
select top (1) with ties v.*
from tbl_video v
order by row_number() over (partition by v.timestamp, v.camera order by v.analysis_id);
So, your delete version would remove every row that is not the first of its (timestamp, camera) group:
delete v
from tbl_video v
where analysis_id > (select min(v1.analysis_id)
from tbl_video v1
where v1.timestamp = v.timestamp and
v1.camera = v.camera
);
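For a quick check against the BEFORE/AFTER tables in the question, here is a sketch of the EXISTS-based delete run through Python's sqlite3 module on an in-memory table. SQLite stands in for SQL Server here (an assumption), so the statement uses SQLite's DELETE syntax rather than T-SQL's `delete alias from ...` form; the alias name older is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE TBL_VIDEO (
        ANALYSIS_ID INTEGER PRIMARY KEY,
        TIMESTAMP   INTEGER,
        EMOTION     TEXT,
        CAMERA      INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO TBL_VIDEO VALUES (?, ?, ?, ?)",
    [
        (1, 5, "HAPPY", 1),
        (2, 10, "SAD", 1),
        (3, 10, "SAD", 1),
        (4, 5, "HAPPY", 2),
        (5, 15, "ANGRY", 2),
        (6, 15, "HAPPY", 2),
    ],
)

# Delete a row when an earlier row (smaller ANALYSIS_ID) exists with the
# same TIMESTAMP and the same CAMERA; the earliest row of each pair survives.
conn.execute(
    """
    DELETE FROM TBL_VIDEO
    WHERE EXISTS (
        SELECT 1
        FROM TBL_VIDEO AS older
        WHERE older.TIMESTAMP = TBL_VIDEO.TIMESTAMP
          AND older.CAMERA = TBL_VIDEO.CAMERA
          AND older.ANALYSIS_ID < TBL_VIDEO.ANALYSIS_ID
    )
    """
)

remaining_ids = [
    r[0]
    for r in conn.execute("SELECT ANALYSIS_ID FROM TBL_VIDEO ORDER BY ANALYSIS_ID")
]
print(remaining_ids)
```

Rows 1 and 4 share a timestamp but not a camera, so both are kept.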

SQL Query using Partition By

I have the following table, named JobTitle:
JobID LanaguageID
-----------------
1 1
1 2
1 3
2 1
2 2
3 4
4 5
5 2
I am selecting all records from the table, but for JobIDs that occur more than once I select only one record (the first row).
Now I am passing LanguageID as a parameter to a stored procedure, and for duplicate JobIDs I want to select the row for that LanguageID, along with the other records.
If I pass LanguageID = 1, the output should be as follows:
JobID LanaguageID
-----------------
1 1
2 1
3 4
4 5
5 2
I have tried using following query.
with CTE_RN as
(
SELECT ROW_NUMBER() OVER(PARTITION BY JobTitle.JobID ORDER BY JobTitle.JobTitle) AS RN
FROM JobTitle
INNER JOIN JobTitle_Lang
ON JobTitle.JobTitleID = JobTitle_Lang.JobTitleID
)
But I am unable to use a WHERE clause in the above query.
Should a different approach be followed? Otherwise, how can I modify the query to get the desired output?
with CTE_RN as
(
SELECT
JobID, LanaguageID,
ROW_NUMBER() OVER(PARTITION BY JobTitle.JobID ORDER BY JobTitle.JobTitle) AS RN
FROM JobTitle
INNER JOIN JobTitle_Lang ON JobTitle.JobTitleID = JobTitle_Lang.JobTitleID
)
select *
from CTE_RN
where RN = 1 or LanaguageID = @LanguageID
Update:
Simplified a bit (join removed), but you'll get the idea:
declare @LanguageID int = 2
;with cte_rn as
(
select
JobID, LanguageID,
row_number() over(
partition by JobTitle.JobID
order by
case when LanguageID = @LanguageID then 0 else 1 end,
LanguageID
) as rn
from JobTitle
)
select *
from cte_rn
where rn = 1
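The CASE-based ordering can be tried against the question's data with a short sketch using Python's sqlite3 module (in-memory, SQLite 3.25+). The helper name pick_rows and the named parameter :lang are hypothetical; the column is spelled LanguageID as in the simplified query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE JobTitle (JobID INTEGER, LanguageID INTEGER)")
conn.executemany(
    "INSERT INTO JobTitle VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 4), (4, 5), (5, 2)],
)


def pick_rows(language_id):
    """Return one row per JobID, preferring the requested language."""
    return conn.execute(
        """
        WITH cte_rn AS (
            SELECT JobID, LanguageID,
                   ROW_NUMBER() OVER (
                       PARTITION BY JobID
                       ORDER BY CASE WHEN LanguageID = :lang THEN 0 ELSE 1 END,
                                LanguageID
                   ) AS rn
            FROM JobTitle
        )
        SELECT JobID, LanguageID
        FROM cte_rn
        WHERE rn = 1
        ORDER BY JobID
        """,
        {"lang": language_id},
    ).fetchall()


print(pick_rows(1))
print(pick_rows(2))
```

Within each JobID the requested language sorts first when it exists; otherwise the lowest LanguageID is used as the fallback.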
SELECT b.[JobID], b.[LanaguageID]
FROM
(SELECT a.[JobID], a.[LanaguageID],
ROW_NUMBER() OVER(PARTITION BY a.[JobID] ORDER BY a.[LanaguageID]) AS [row]
FROM [JobTitle] a) b
WHERE b.[row] = 1
Result
| JOBID | LANAGUAGEID |
--------|-------------|
| 1 | 1 |
| 2 | 1 |
| 3 | 4 |
| 4 | 5 |
| 5 | 2 |

Grouping SQL Results based on order

I have a table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers so that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
Assuming ID is the clustered index, the plan for this has one scan against YourTable and avoids any sort operations.
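The flag-plus-running-sum idea can be checked on small inputs with the sketch below, which runs the same query shape through Python's sqlite3 module (in-memory, SQLite 3.25+). It is only an illustration: the helper assign_groups is hypothetical, IDs are generated as 1..n, and CASE replaces IIF so the query also works on SQLite builds without IIF.

```python
import sqlite3


def assign_groups(row_numbers):
    """Return the group number for each RowNumber, in ID order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, RowNumber INTEGER)"
    )
    # IDs are assigned sequentially, starting at 1.
    conn.executemany(
        "INSERT INTO YourTable (ID, RowNumber) VALUES (?, ?)",
        list(enumerate(row_numbers, start=1)),
    )
    rows = conn.execute(
        """
        WITH T1 AS (
            SELECT ID, RowNumber,
                   LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
            FROM YourTable
        ), T2 AS (
            -- flag = 1 on the first row and whenever RowNumber drops
            SELECT ID,
                   CASE WHEN PrevRowNumber IS NULL OR PrevRowNumber > RowNumber
                        THEN 1 ELSE 0 END AS NewGroup
            FROM T1
        )
        SELECT ID,
               SUM(NewGroup) OVER (ORDER BY ID
                                   ROWS BETWEEN UNBOUNDED PRECEDING
                                   AND CURRENT ROW) AS Grp
        FROM T2
        ORDER BY ID
        """
    ).fetchall()
    conn.close()
    return [grp for _, grp in rows]


print(assign_groups([1, 2, 3, 1, 2, 1, 2, 3, 4]))
print(assign_groups([1, 1, 2, 2, 3, 4, 1, 2, 4, 6]))
```

The second call uses the non-sequential RowNumbers from the clarified requirements: repeated values stay in the same group, and only a drop starts a new one.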
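If the IDs are truly sequential and RowNumber increases by exactly 1 within each group (which the additional info says may not hold), you can do: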
select t.*,
(id - rowNumber) as grp
from t
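You can also use a recursive CTE (this assumes the first row has ID = 1 and that every group starts with RowNumber = 1):
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
OPTION (MAXRECURSION 0) -- the default limit is 100 recursion levels; 0 removes it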
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
-- grp = ID of the most recent row with RowNumber = 1, i.e. the start of the current group
select *, (select max(ID) from [Your Table] where ID <= t.ID and RowNumber = 1) as grp
from [Your Table] t
) t
order by ID
This should work on SQL Server 2005. You could also use rank() instead if you don't care about consecutive numbers.