SQL Query using Partition By

I have the following table, named JobTitle:
JobID LanaguageID
-----------------
1 1
1 2
1 3
2 1
2 2
3 4
4 5
5 2
I am selecting all records from the table, but for any JobID that appears more than once I select only one record (the first row).
Now I am passing a LanguageID as a parameter to a stored procedure, and for a duplicated JobID I want to select the row for that LanguageID, along with the other records.
If I pass LanguageID 1, the output should be as follows:
JobID LanaguageID
-----------------
1 1
2 1
3 4
4 5
5 2
I have tried the following query:
with CTE_RN as
(
SELECT ROW_NUMBER() OVER(PARTITION BY JobTitle.JobID ORDER BY JobTitle.JobTitle) AS RN
FROM JobTitle
INNER JOIN JobTitle_Lang
ON JobTitle.JobTitleID = JobTitle_Lang.JobTitleID
)
But I am unable to use a WHERE clause in the above query.
Should a different approach be followed, or how can I modify the query to get the desired output?

with CTE_RN as
(
SELECT
JobID, LanaguageID,
ROW_NUMBER() OVER(PARTITION BY JobTitle.JobID ORDER BY JobTitle.JobTitle) AS RN
FROM JobTitle
INNER JOIN JobTitle_Lang ON JobTitle.JobTitleID = JobTitle_Lang.JobTitleID
)
select *
from CTE_RN
where RN = 1 or LanaguageID = @LanguageID
update
simplified a bit (join removed), but you'll get the idea:
declare @LanguageID int = 2
;with cte_rn as
(
    select
        JobID, LanguageID,
        row_number() over(
            partition by JobTitle.JobID
            order by
                -- the requested language sorts first, so it gets rn = 1 when present
                case when LanguageID = @LanguageID then 0 else 1 end,
                LanguageID
        ) as rn
    from JobTitle
)
select *
from cte_rn
where rn = 1
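As a rough check of the prioritised ordering above, here is a minimal sketch that runs the same query shape in SQLite through Python's `sqlite3` module instead of SQL Server. The helper name `pick_rows`, the named parameter `:lang`, and the in-memory table are assumptions made for illustration, and the bundled SQLite must be 3.25 or newer for window functions.

```python
import sqlite3

def pick_rows(language_id):
    """Return one row per JobID, preferring the requested language if it exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE JobTitle (JobID INTEGER, LanguageID INTEGER)")
    conn.executemany(
        "INSERT INTO JobTitle VALUES (?, ?)",
        [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 4), (4, 5), (5, 2)],
    )
    rows = conn.execute(
        """
        WITH cte_rn AS (
            SELECT JobID, LanguageID,
                   ROW_NUMBER() OVER (
                       PARTITION BY JobID
                       ORDER BY CASE WHEN LanguageID = :lang THEN 0 ELSE 1 END,
                                LanguageID
                   ) AS rn
            FROM JobTitle
        )
        SELECT JobID, LanguageID FROM cte_rn WHERE rn = 1 ORDER BY JobID
        """,
        {"lang": language_id},
    ).fetchall()
    conn.close()
    return rows

print(pick_rows(1))  # [(1, 1), (2, 1), (3, 4), (4, 5), (5, 2)]
print(pick_rows(2))  # [(1, 2), (2, 2), (3, 4), (4, 5), (5, 2)]
```

Because the CASE expression only changes the sort order inside each partition, a JobID that has no row for the requested language still returns its lowest LanguageID.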

SELECT b.[JobID], b.[LanaguageID]
FROM
(SELECT a.[JobID], a.[LanaguageID],
ROW_NUMBER() OVER(PARTITION BY a.[JobID] ORDER BY a.[LanaguageID]) AS [row]
FROM [JobTitle] a) b
WHERE b.[row] = 1
Result
| JOBID | LANAGUAGEID |
--------|-------------|
| 1 | 1 |
| 2 | 1 |
| 3 | 4 |
| 4 | 5 |
| 5 | 2 |

Related

Subquery with order by random does not work on database with 500.000.000 rows

I want to run an update query on an offline database with DB Browser for SQLite.
I tested my query on a few rows and it works perfectly there, but not on my database, which has 500.000.000+ rows. It looks like the random subquery is not executed at all there, and the first rows of each group are taken.
The query:
UPDATE
table
SET typ = 3 WHERE id IN (
SELECT id FROM (
SELECT * FROM table ORDER BY RANDOM()
)
WHERE typ = 1 GROUP BY idg HAVING COUNT(idg) > 5
)
Sample data:
id |idg| typ
1 | 1 | 1
2 | 1 | 1
3 | 1 | 1
4 | 1 | 1
5 | 1 | 1
6 | 1 | 1
7 | 1 | 1
8 | 2 | 1
9 | 2 | 1
10 | 2 | 1
11 | 2 | 1
12 | 2 | 1
13 | 2 | 1
14 | 2 | 1
15 | 2 | 1
Is there any fix or workaround to execute my query successfully ?
If your version of SQLite is 3.33.0+, you can use the UPDATE ... FROM ... syntax to join the table to a query that uses the window function ROW_NUMBER() to number the rows of each idg in random order. Keeping only the row numbered 6 picks one random id per idg, and only for groups that have more than 5 rows:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY idg ORDER BY RANDOM()) rn
FROM tablename
WHERE typ = 1
)
UPDATE tablename AS t
SET typ = 3
FROM cte AS c
WHERE t.id = c.id AND c.rn = 6; -- rn = 6 makes sure that there are at least 6 rows
For SQLite 3.25.0+ use the operator IN with ROW_NUMBER() window function:
UPDATE tablename
SET typ = 3
WHERE id IN (
SELECT id
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY idg ORDER BY RANDOM()) rn
FROM tablename
WHERE typ = 1
)
WHERE rn = 6 -- rn = 6 makes sure that there are at least 6 rows
);
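The `IN` variant can be tried on a small table with Python's `sqlite3` module. This is only a sketch: the third group (idg 3, with three rows) is a hypothetical addition to the sample data, included to show that small groups are left alone, and the bundled SQLite is assumed to be 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, idg INTEGER, typ INTEGER)")

# idg 1 has 7 rows and idg 2 has 8 rows (as in the question);
# idg 3 has only 3 rows and is a hypothetical extra group.
sample = (
    [(i, 1, 1) for i in range(1, 8)]
    + [(i, 2, 1) for i in range(8, 16)]
    + [(i, 3, 1) for i in range(16, 19)]
)
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", sample)

# Number the rows of each idg in random order; a row with rn = 6 exists
# only when the group has at least 6 rows with typ = 1.
conn.execute(
    """
    UPDATE tablename
    SET typ = 3
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id, ROW_NUMBER() OVER (PARTITION BY idg ORDER BY RANDOM()) AS rn
            FROM tablename
            WHERE typ = 1
        )
        WHERE rn = 6
    )
    """
)

# Count how many rows were switched to typ = 3 in each group.
updated_per_group = dict(
    conn.execute("SELECT idg, SUM(typ = 3) FROM tablename GROUP BY idg")
)
print(updated_per_group)  # {1: 1, 2: 1, 3: 0}
```

Which id is updated in each large group changes from run to run, but the per-group counts stay the same.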

SQL How to filter table with values having more than one unique value of another column

I have data table Customers that looks like this:
ID | Sequence No |
1 | 1 |
1 | 2 |
1 | 3 |
2 | 1 |
2 | 1 |
2 | 1 |
3 | 1 |
3 | 2 |
I would like to filter the table so that only IDs with more than 1 distinct count of Sequence No remain.
Expected output:
ID | Sequence No |
1 | 1 |
1 | 2 |
1 | 3 |
3 | 1 |
3 | 2 |
I tried
select ID, Sequence No
from Customers
where count(distinct Sequence No) > 1
order by ID
but I'm getting an error. How can I solve this?
You can get the desired result by using the below query. This is similar to what you were trying -
Sample Table & Data
Declare @Data table
(Id int, [Sequence No] int)
Insert into @Data
values
(1 , 1 ),
(1 , 2 ),
(1 , 3 ),
(2 , 1 ),
(2 , 1 ),
(2 , 1 ),
(3 , 1 ),
(3 , 2 )
Query
Select * from @Data
where ID in(
    select ID
    from @Data
    Group by ID
    Having count(distinct [Sequence No]) > 1
)
Using analytic functions, we can try:
WITH cte AS (
SELECT *, MIN([Sequence No]) OVER (PARTITION BY ID) min_seq,
MAX([Sequence No]) OVER (PARTITION BY ID) max_seq
FROM Customers
)
SELECT ID, [Sequence No]
FROM cte
WHERE min_seq <> max_seq
ORDER BY ID, [Sequence No];
We are checking for a distinct count of sequence number by asserting that the minimum and maximum sequence numbers are not the same for a given ID. The above query could benefit from the following index:
CREATE INDEX idx ON Customers (ID, [Sequence No]);
This would let the min and max values be looked up faster.
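To compare the two answers side by side, the sketch below runs both the GROUP BY / HAVING filter and the MIN/MAX window filter against the sample data in SQLite via Python's `sqlite3` module. The double-quoted column name replaces the T-SQL bracket syntax, and the variable names are chosen only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Customers (ID INTEGER, "Sequence No" INTEGER)')
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 1), (2, 1), (3, 1), (3, 2)],
)

# Approach 1: keep IDs whose group has more than one distinct sequence number.
group_by_rows = conn.execute(
    """
    SELECT ID, "Sequence No"
    FROM Customers
    WHERE ID IN (
        SELECT ID FROM Customers
        GROUP BY ID
        HAVING COUNT(DISTINCT "Sequence No") > 1
    )
    ORDER BY ID, "Sequence No"
    """
).fetchall()

# Approach 2: a group has more than one distinct value exactly when min <> max.
window_rows = conn.execute(
    """
    WITH cte AS (
        SELECT ID, "Sequence No",
               MIN("Sequence No") OVER (PARTITION BY ID) AS min_seq,
               MAX("Sequence No") OVER (PARTITION BY ID) AS max_seq
        FROM Customers
    )
    SELECT ID, "Sequence No"
    FROM cte
    WHERE min_seq <> max_seq
    ORDER BY ID, "Sequence No"
    """
).fetchall()

print(group_by_rows)  # [(1, 1), (1, 2), (1, 3), (3, 1), (3, 2)]
print(window_rows)    # same rows
```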

MSSQL: How to increment a int-column grouped by another column?

Given the following table:
UserId | Idx
1 | 0
1 | 1
1 | 3
1 | 5
2 | 1
2 | 2
2 | 3
2 | 5
I want to update the Idx column so that it increments correctly within each UserId group:
UserId | Idx
1 | 0
1 | 1
1 | 2
1 | 3
2 | 0
2 | 1
2 | 2
2 | 3
I know it's possible with T-SQL (with a cursor), but is it also possible with a single statement?
Thank you
You can use correlated subquery :
update t
set idx = coalesce((select count(*)
                    from t as t1
                    where t1.userid = t.userid and t1.idx < t.idx
                   ), 0
                  );
Use ROW_NUMBER() with Partition
update tablex set Idx=A.Idx
from tablex T
inner join
(
    -- ID is assumed to be a unique key on tablex; order by Idx so the existing order is kept
    select UserID, ID, ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Idx)-1 Idx
    from tablex
) A on T.ID=A.ID
Use an updatable CTE:
with toupdate as (
    select t.*,
           row_number() over (partition by UserId order by idx) - 1 as new_idx
    from t
)
update toupdate
set idx = new_idx
where idx <> new_idx;  -- only touch rows whose value actually changes
This should be the fastest method for solving this problem.
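The numbering that `ROW_NUMBER() - 1` produces can be checked with a small sketch in SQLite via Python's `sqlite3` module. Updatable CTEs are a SQL Server feature, so this sketch swaps in a different technique: it snapshots the new values into a temporary table and then updates from it. The surrogate key `id` and the temp table name `renumbered` are hypothetical and do not appear in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `id` is a hypothetical surrogate key used to match rows back to the table.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, UserId INTEGER, Idx INTEGER)")
conn.executemany(
    "INSERT INTO t (UserId, Idx) VALUES (?, ?)",
    [(1, 0), (1, 1), (1, 3), (1, 5), (2, 1), (2, 2), (2, 3), (2, 5)],
)

# Snapshot the zero-based position of every row within its UserId group.
conn.execute(
    """
    CREATE TEMP TABLE renumbered AS
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Idx) - 1 AS new_idx
    FROM t
    """
)

# Apply the snapshot, touching only rows whose value changes.
conn.execute(
    """
    UPDATE t
    SET Idx = (SELECT new_idx FROM renumbered AS r WHERE r.id = t.id)
    WHERE Idx <> (SELECT new_idx FROM renumbered AS r WHERE r.id = t.id)
    """
)

result = conn.execute("SELECT UserId, Idx FROM t ORDER BY UserId, Idx").fetchall()
print(result)
# [(1, 0), (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2), (2, 3)]
```

Taking the snapshot first means the new values are computed from the original Idx values and do not depend on the order in which rows are updated.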

SQL group by and where on each group

I have a table with columns like sourceId (guid), state (1:Deactivated, 2:Activated, 3:Dead), modifiedDate.
I am writing a query to group by sourceId and see if ALL the records in a group have the state as 2 (activated) and also get the MAX of modifiedDate of the rows which have state as 2 (activated) in each group.
result table should be something like sourceId, IsAllActivated, MaxModifiedForActivatedRecords.
I tried a lot of options like Partition By, Cross over etc. which are giving me either one of the column and not both. Options which have self joins were costly, so looking for any other efficient way of forming the query.
Data :
SourceId | State | modifiedDate
s1 | 1 | 01/01
s1 | 2 | 01/02
s2 | 3 | 02/03
s2 | 3 | 03/03
s1 | 3 | 10/10
Output:
sourceId | IsAllActivated | MaxModifiedForActivatedRecords
s1 | 0 | 02/03
s2 | 1 | 03/03
What I tried:
SELECT
[SourceID]
,CASE
WHEN COUNT(DISTINCT State) = 1 AND
SUM(DISTINCT State) = 3
THEN 1
ELSE 0
END AS IsAllActivated
FROM ThreadActivation
GROUP BY SourceID
SELECT
[SourceID]
,MAX(modifiedDate) AS MaxModifiedForActivatedRecords
FROM ThreadActivation
GROUP BY SourceID
HAVING State = 3
I am able to get them separately, but not together in a single query.
I tried ranking with row number :
WITH ThreadActivationTransaction AS (
select
*
,ROW_NUMBER() over(PARTITION BY SourceId order by modifiedDate desc) AS rk
from ThreadActivation)
select
[sourceID]
,CASE
WHEN COUNT(DISTINCT State) = 1 AND SUM(DISTINCT State) = 3
THEN 1
ELSE 0
END AS IsAllActivated
,[SourceId]
from ThreadActivation s
GROUP by SourceId --where s.rk =1
None of these gave me a breakthrough.
You can do this with aggregation and case:
select sourceId,
(case when max(state) = min(state) and max(state) = 2
then 1 else 0
end) as IsAllActivated,
max(case when state = 2 then modifiedDate end) as MaxModifiedForActivatedRecords
from t
group by sourceId;
This assumes that state is not NULL. The logic is only slightly more complicated if that is possible.
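The conditional aggregation can be exercised with a short sketch in SQLite through Python's `sqlite3` module. The rows for s1 and s2 are taken from the question; the s3 rows are hypothetical, added so that one group is fully activated. Dates are kept as the `MM/DD` strings from the question, which is an assumption that only works here because zero-padded strings compare in date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (sourceId TEXT, state INTEGER, modifiedDate TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("s1", 1, "01/01"),
        ("s1", 2, "01/02"),
        ("s2", 3, "02/03"),
        ("s2", 3, "03/03"),
        ("s1", 3, "10/10"),
        ("s3", 2, "04/04"),  # hypothetical: a group where every row is activated
        ("s3", 2, "05/05"),
    ],
)

summary = conn.execute(
    """
    SELECT sourceId,
           CASE WHEN MAX(state) = MIN(state) AND MAX(state) = 2
                THEN 1 ELSE 0
           END AS IsAllActivated,
           -- rows with another state yield NULL, which MAX ignores
           MAX(CASE WHEN state = 2 THEN modifiedDate END) AS MaxModifiedForActivatedRecords
    FROM t
    GROUP BY sourceId
    ORDER BY sourceId
    """
).fetchall()

print(summary)
# [('s1', 0, '01/02'), ('s2', 0, None), ('s3', 1, '05/05')]
```

A group with no activated rows, such as s2 here, gets NULL (`None` in Python) for the date column rather than being dropped.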

Grouping SQL Results based on order

I have a table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers So that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
Assuming ID is the clustered index, the plan for this has one scan against YourTable and avoids any sort operations.
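The flag-and-running-sum idea can be tried on the clarified example ("1,1,2,2,3,4" followed by "1,2,4,6") with a sketch in SQLite via Python's `sqlite3` module. `CASE` stands in for `IIF` so the sketch does not depend on a recent SQLite; window functions still require 3.25 or newer. The table contents are built from the sequences quoted above, with IDs assumed to run from 1 upward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, RowNumber INTEGER, Data TEXT)")

# Two groups from the clarified requirements; IDs 1..10 are assumed sequential.
row_numbers = [1, 1, 2, 2, 3, 4, 1, 2, 4, 6]
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, 'Data')",
    [(i, rn) for i, rn in enumerate(row_numbers, start=1)],
)

rows = conn.execute(
    """
    WITH T1 AS (
        SELECT ID, RowNumber,
               LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
        FROM YourTable
    ), T2 AS (
        SELECT ID, RowNumber,
               -- flag the first row and every row where the number drops
               CASE WHEN PrevRowNumber IS NULL OR PrevRowNumber > RowNumber
                    THEN 1 ELSE 0 END AS NewGroup
        FROM T1
    )
    SELECT ID, RowNumber,
           SUM(NewGroup) OVER (ORDER BY ID
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
    FROM T2
    ORDER BY ID
    """
).fetchall()

groups = [grp for (_id, _rn, grp) in rows]
print(groups)  # [1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
```

Repeated values such as "1,1" and "2,2" do not start a new group because the flag only fires when the previous RowNumber is strictly greater than the current one.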
If the ids are truly sequential and RowNumber always increases by exactly 1 within a group, you can do:
select t.*,
(id - rowNumber) as grp
from t
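A quick sketch of what `id - rowNumber` yields, run in SQLite via Python's `sqlite3` module, shows both where it works and where it stops working. The helper name `id_minus_rownumber` is hypothetical, and ids are assumed to start at 1 and increase by 1.

```python
import sqlite3

def id_minus_rownumber(row_numbers):
    """Compute id - rowNumber for rows whose ids are 1, 2, 3, ... in order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, rowNumber INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        list(enumerate(row_numbers, start=1)),
    )
    result = [
        grp
        for (grp,) in conn.execute("SELECT id - rowNumber AS grp FROM t ORDER BY id")
    ]
    conn.close()
    return result

# Data from the question: the difference is constant within each group.
contiguous = id_minus_rownumber([1, 2, 3, 1, 2, 1, 2, 3, 4])
print(contiguous)  # [0, 0, 0, 3, 3, 5, 5, 5, 5]

# A single group with gaps ("1,2,4,6"): the difference is no longer constant.
gapped = id_minus_rownumber([1, 2, 4, 6])
print(gapped)  # [0, 0, -1, -2]
```

The second call illustrates why this shortcut does not fit the clarified requirements, where row numbers inside a group may repeat or skip values.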
Also you can use recursive CTE
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
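How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
    -- grp is the ID of the row that starts the current group (the latest RowNumber = 1 at or before this row)
    select *, (select max(ID) from [Your Table] where ID <= t.ID and RowNumber = 1) as grp
    from [Your Table] t
) t
order by ID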
This should work on SQL 2005. You could also use rank() instead if you don't care about consecutive numbers.