SQL group by and where on each group

I have a table with columns like sourceId (guid), state (1:Deactivated, 2:Activated, 3:Dead), modifiedDate.
I am writing a query to group by sourceId and see if ALL the records in a group have the state as 2 (activated) and also get the MAX of modifiedDate of the rows which have state as 2 (activated) in each group.
The result table should be something like sourceId, IsAllActivated, MaxModifiedForActivatedRecords.
I tried a lot of options like Partition By, Cross over etc., but each of them gives me only one of the two columns, not both. The options that use self joins were costly, so I am looking for a more efficient way of forming the query.
Data :
SourceId | State | modifiedDate
s1 | 1 | 01/01
s1 | 2 | 01/02
s2 | 3 | 02/03
s2 | 3 | 03/03
s1 | 3 | 10/10
Output:
sourceId | IsAllActivated | MaxModifiedForActivatedRecords
s1 | 0 | 02/03
s2 | 1 | 03/03
What I have tried:
SELECT
[SourceID]
,CASE
WHEN COUNT(DISTINCT State) = 1 AND
SUM(DISTINCT State) = 3
THEN 1
ELSE 0
END AS IsAllActivated
FROM ThreadActivation
GROUP BY SourceID
SELECT
[SourceID]
,MAX(modifiedDate) AS MaxModifiedForActivatedRecords
FROM ThreadActivation
GROUP BY SourceID
HAVING State = 3
I am able to get them separately, but not together in a single query.
I tried ranking with row number :
WITH ThreadActivationTransaction AS (
select
*
,ROW_NUMBER() over(PARTITION BY SourceId order by modifiedDate desc) AS rk
from ThreadActivation)
select
[sourceID]
,CASE
WHEN COUNT(DISTINCT State) = 1 AND SUM(DISTINCT State) = 3
THEN 1
ELSE 0
END AS IsAllActivated
,[SourceId]
from ThreadActivation s
GROUP by SourceId --where s.rk =1
All these were not giving me a break through.

You can do this with aggregation and case:
select sourceId,
(case when max(state) = min(state) and max(state) = 2
then 1 else 0
end) as IsAllActivated,
max(case when state = 2 then modifiedDate end) as MaxModifiedForActivatedRecords
from t
group by sourceId;
This assumes that state is not NULL. The logic is only slightly more complicated if that is possible.
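The answer leaves the NULL case open. One possible way to handle it, sketched here under assumptions, is to compare the group's row count with the number of rows whose state is 2, because MIN and MAX skip NULLs and would report a group with states 2 and NULL as fully activated. The sketch runs the query through Python's built-in sqlite3 module rather than SQL Server; the table name t, the ISO date strings, and the extra groups s2 and s3 are made-up illustration data, not the asker's.

```python
import sqlite3

# In-memory SQLite database standing in for the real table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (sourceId TEXT, state INTEGER, modifiedDate TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("s1", 1, "2020-01-01"),
        ("s1", 2, "2020-01-02"),
        ("s1", 3, "2020-10-10"),
        ("s2", 2, "2020-02-03"),
        ("s2", 2, "2020-03-03"),
        ("s3", 2, "2020-05-05"),
        ("s3", None, "2020-06-06"),  # NULL state: group must not count as all activated
    ],
)

query = """
SELECT sourceId,
       -- every row in the group, including NULL-state rows, must have state = 2
       CASE WHEN COUNT(*) = SUM(CASE WHEN state = 2 THEN 1 ELSE 0 END)
            THEN 1 ELSE 0
       END AS IsAllActivated,
       -- CASE yields NULL for other states, and MAX ignores NULLs
       MAX(CASE WHEN state = 2 THEN modifiedDate END) AS MaxModifiedForActivatedRecords
FROM t
GROUP BY sourceId
ORDER BY sourceId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Both columns come from a single pass over the table, with no self join or window function.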

Related

Select N rows in aggregate functions SQL Server

I have a table that looks like this:
+--------+----------+--------+------------+-------+
| ID | CHANNEL | VENDOR | num_PERIOD | SALES |
+--------+----------+--------+------------+-------+
| 000001 | Business | Shop | 1 | 40 |
| 000001 | Business | Shop | 2 | 60 |
| 000001 | Business | Shop | 3 | NULL |
+--------+----------+--------+------------+-------+
With many combinations of ID, CHANNEL and VENDOR, and sales records for each of them over time (num_PERIOD).
The idea is to obtain a new column that returns the number of NULLs in the SALES column, but only within the first 111 rows according to the num_PERIOD column.
I have been trying something like this:
SELECT ID,
CHANNEL,
VENDOR,
sum(CASE
WHEN SALES IS NULL THEN 1
ELSE 0
END) OVER (PARTITION BY ID,
CHANNEL,
VENDOR
ORDER BY num_PERIOD ROWS BETWEEN UNBOUNDED PRECEDING AND 111 FOLLOWING) AS NULL_SALES_SET
FROM TABLE
GROUP BY ID,
CHANNEL,
VENDOR
But I'm not obtaining what I'm looking for.
So I want to obtain a table similar to:
+--------+--------------+--------+----------------+
| ID | CHANNEL | VENDOR | NULL_SALES_SET |
+--------+--------------+--------+----------------+
| 000001 | Business | Shop | 1 |
| 000002 | Business | Market | 0 |
| 000002 | Non Business | Shop | 3 |
+--------+--------------+--------+----------------+
The difficulty comes when selecting these first 111 rows per ID, CHANNEL AND VENDOR ordered by num_PERIOD.
Use a CTE (Common Table Expression) with the ROW_NUMBER windowed function and you should be set:
;WITH MyCTE AS
(
SELECT
id,
channel,
vendor,
sales,
ROW_NUMBER() OVER (PARTITION BY id, channel, vendor ORDER BY num_period) AS row_num
FROM
MyTable
)
SELECT
id,
channel,
vendor,
SUM(CASE WHEN sales IS NULL THEN 1 ELSE 0 END) AS null_sales_set
FROM
MyCTE
WHERE
row_num <= 111
GROUP BY
id, channel, vendor
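To see how the ROW_NUMBER filter behaves when num_period has gaps, here is a small sketch that runs the same query shape through Python's sqlite3 module (SQLite 3.25 or later is assumed for window functions). The sample rows and the limit of 2 instead of 111 are made up for illustration.

```python
import sqlite3

FIRST_N = 2  # stand-in for 111, kept small so the effect is visible (assumption)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (id TEXT, channel TEXT, vendor TEXT, "
    "num_period INTEGER, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)",
    [
        ("000001", "Business", "Shop", 1, 40),
        ("000001", "Business", "Shop", 2, None),
        ("000001", "Business", "Shop", 3, None),
        # periods with gaps: the first two rows are periods 5 and 9
        ("000002", "Business", "Market", 5, None),
        ("000002", "Business", "Market", 9, None),
        ("000002", "Business", "Market", 12, 10),
    ],
)

query = """
WITH MyCTE AS (
    SELECT id, channel, vendor, sales,
           ROW_NUMBER() OVER (PARTITION BY id, channel, vendor
                              ORDER BY num_period) AS row_num
    FROM MyTable
)
SELECT id, channel, vendor,
       SUM(CASE WHEN sales IS NULL THEN 1 ELSE 0 END) AS null_sales_set
FROM MyCTE
WHERE row_num <= ?          -- keep only the first N rows of each partition
GROUP BY id, channel, vendor
ORDER BY id, channel, vendor
"""
rows = conn.execute(query, (FIRST_N,)).fetchall()
for row in rows:
    print(row)
```

A plain `num_period <= 2` filter would drop every row of the second group here, which is why the row number is used instead of the period value.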
Do you have to use the windowing function?
SELECT ID
, CHANNEL
, VENDOR
, NULL_SALES_SET = SUM(CASE WHEN SALES IS NULL THEN 1 ELSE 0 END)
FROM Table
WHERE num_PERIOD <= 111
GROUP BY ID, CHANNEL, VENDOR
Or are you looking for the first 111 num_PERIOD values allowing for gaps in the num_PERIOD column?
SELECT t.ID
, t.CHANNEL
, t.VENDOR
, NULL_SALES_SET = SUM(CASE WHEN t.SALES IS NULL THEN 1 ELSE 0 END)
FROM Table t
INNER JOIN ( SELECT i.ID
, i.CHANNEL
, i.VENDOR
, i.num_PERIOD
, rowNum = ROW_NUMBER() OVER (PARTITION BY i.ID, i.CHANNEL, i.VENDOR ORDER BY i.num_PERIOD)
FROM Table i ) l
ON t.ID = l.ID
AND t.CHANNEL = l.CHANNEL
AND t.VENDOR = l.VENDOR
AND t.num_PERIOD = l.num_PERIOD
WHERE l.rowNum <= 111
GROUP BY t.ID, t.CHANNEL, t.VENDOR
Edit: Not sure how I overlooked it, but it is necessary to JOIN on the num_PERIOD column.
Edit: Add the number of distinct num_PERIOD per ID, Channel, Vendor without affecting the NULL_SALES_SET
SELECT t.ID
, t.CHANNEL
, t.VENDOR
-- Counts the NULL Sales when the num_PERIOD is in the
-- first 111 num_PERIODs
, NULL_SALES_SET = SUM(CASE WHEN l.rowNum IS NOT NULL AND t.SALES IS NULL
THEN 1
ELSE 0 END)
-- Counts the distinct num_PERIOD values
, PERIOD_COUNT = COUNT(DISTINCT t.num_PERIOD)
FROM Table t
LEFT OUTER JOIN ( SELECT i.ID
, i.CHANNEL
, i.VENDOR
, i.num_PERIOD
, rowNum = ROW_NUMBER() OVER (PARTITION BY i.ID,
i.CHANNEL,
i.VENDOR
ORDER BY i.num_PERIOD)
FROM Table i ) l
ON t.ID = l.ID
AND t.CHANNEL = l.CHANNEL
AND t.VENDOR = l.VENDOR
AND t.num_PERIOD = l.num_PERIOD
AND l.rowNum <= 111
GROUP BY t.ID, t.CHANNEL, t.VENDOR

How can i check the order of column values(by date) for every unique id?

I have this table, Activity:
| ID | Date of activity | activity |
|----|---------------------|----------|
| 1 | 2016-05-01T13:45:03 | a |
| 1 | 2016-05-02T13:45:03 | b |
| 1 | 2016-05-03T13:45:03 | a |
| 1 | 2016-05-04T13:45:03 | b |
| 2 | 2016-05-01T13:45:03 | b |
| 2 | 2016-05-02T13:45:03 | b |
and this table:
| id | Right order |
|----|-------------|
| 1 | yes |
| 2 | no |
How can I check, for every ID, whether the order of the activities is similar to this order, for example?
a b a b a b ..
Of course, I'll check according to the activity date.
In SQL Server 2012+ you could use common table expression with lag(), and then the min() of a case expression that follows your logic like so:
;with cte as (
select *
, prev_activity = lag(activity) over (partition by id order by date_of_activity)
from t
)
select id
, right_order = min(case
when activity = 'a' and isnull(prev_activity,'b')<>'b' then 'no'
when activity = 'b' and isnull(prev_activity,'b')<>'a' then 'no'
else 'yes'
end)
from cte
group by id
rextester demo: http://rextester.com/NQQF78056
returns:
+----+-------------+
| id | right_order |
+----+-------------+
| 1 | yes |
| 2 | no |
+----+-------------+
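The lag() approach can be tried without a SQL Server instance. The sketch below is an approximation run through Python's sqlite3 module (SQLite 3.25 or later assumed): it replaces the T-SQL isnull() with SQLite's IFNULL() and loads the rows from the question into a table named t.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date_of_activity TEXT, activity TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2016-05-01T13:45:03", "a"),
        (1, "2016-05-02T13:45:03", "b"),
        (1, "2016-05-03T13:45:03", "a"),
        (1, "2016-05-04T13:45:03", "b"),
        (2, "2016-05-01T13:45:03", "b"),
        (2, "2016-05-02T13:45:03", "b"),
    ],
)

query = """
WITH cte AS (
    SELECT id, activity,
           LAG(activity) OVER (PARTITION BY id
                               ORDER BY date_of_activity) AS prev_activity
    FROM t
)
SELECT id,
       -- 'no' sorts before 'yes', so MIN returns 'no' if any row breaks the pattern
       MIN(CASE
             WHEN activity = 'a' AND IFNULL(prev_activity, 'b') <> 'b' THEN 'no'
             WHEN activity = 'b' AND IFNULL(prev_activity, 'b') <> 'a' THEN 'no'
             ELSE 'yes'
           END) AS right_order
FROM cte
GROUP BY id
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Treating a missing previous activity as 'b' is what forces each sequence to start with 'a'.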
Prior to SQL Server 2012 you can use outer apply() to get the previous activity instead of lag() like so:
select id
, right_order = min(case
when activity = 'a' and isnull(prev_activity,'b')<>'b' then 'no'
when activity = 'b' and isnull(prev_activity,'b')<>'a' then 'no'
else 'yes'
end)
from t
outer apply (
select top 1 prev_activity = i.activity
from t as i
where i.id = t.id
and i.date_of_activity < t.date_of_activity
order by i.date_of_activity desc
) x
group by id
EDITED - Allows for variable number of Patterns per ID
Perhaps another approach
Example
Declare @Pat varchar(max)='a b'
Declare @Cnt int = 2
Select ID
,RightOrder = case when rtrim(replicate(@Pat+' ',Hits/@Cnt)) = (Select Stuff((Select ' ' +activity From t Where id=A.id order by date_of_activity For XML Path ('')),1,1,'') ) then 'Yes' else 'No' end
From (Select ID,hits=count(*) from t group by id) A
Returns
ID RightOrder
1 Yes
2 No
select id,
case when sum(flag)=0 and cnt_per_id%2=0
and max(case when rnum=1 then activity end) = 'a'
and max(case when rnum=2 then activity end) = 'b'
and min_activity = 'a' and max_activity = 'b'
then 'yes' else 'no' end as RightOrder
from (select t.*
,row_number() over(partition by id order by activitydate) as rnum
,count(*) over(partition by id) as cnt_per_id
,min(activity) over(partition by id) as min_activity
,max(activity) over(partition by id) as max_activity
,case when lag(activity) over(partition by id order by activitydate)=activity then 1 else 0 end as flag
from tbl t
) t
group by id,cnt_per_id,max_activity,min_activity
Based on the explanation the following logic has to be implemented for rightorder.
Check if the number of rows per id are even (Remove this condition if there can be an odd number of rows like a,b,a or a,b,a,b,a and so on)
First row contains a and second b, min activity is a and max activity is b for an id.
Sum of flags (set using lag) should be 0

SQL Query using Partition By

I have following table name JobTitle
JobID LanaguageID
-----------------
1 1
1 2
1 3
2 1
2 2
3 4
4 5
5 2
I am selecting all records from table except duplicate JobID's for which count > 1. I am selecting only one record/first row from the duplicate JobID's.
Now I am passing LanguageID as a parameter to a stored procedure, and I want to select the duplicate JobID row for that LanguageID along with the other records.
If I have passed languageID as 1 then output should come as follows
JobID LanaguageID
-----------------
1 1
2 1
3 4
4 5
5 2
I have tried using following query.
with CTE_RN as
(
SELECT ROW_NUMBER() OVER(PARTITION BY JobTitle.JobID ORDER BY JobTitle.JobTitle) AS RN
FROM JobTitle
INNER JOIN JobTitle_Lang
ON JobTitle.JobTitleID = JobTitle_Lang.JobTitleID
)
But I am unable to use WHERE clause in the above query.
Should a different approach be followed? Or else, how can I modify the query to get the desired output?
with CTE_RN as
(
SELECT
JobID, LanaguageID,
ROW_NUMBER() OVER(PARTITION BY JobTitle.JobID ORDER BY JobTitle.JobTitle) AS RN
FROM JobTitle
INNER JOIN JobTitle_Lang ON JobTitle.JobTitleID = JobTitle_Lang.JobTitleID
)
select JobID, LanaguageID
from CTE_RN
where RN = 1 or LanaguageID = @LanguageID
update
simplified a bit (join removed), but you'll get the idea:
declare @LanguageID int = 2
;with cte_rn as
(
select
JobID, LanguageID,
row_number() over(
partition by JobTitle.JobID
order by
case when LanguageID = @LanguageID then 0 else 1 end,
LanguageID
) as rn
from JobTitle
)
select *
from cte_rn
where rn = 1
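The ordering trick above (rows matching the requested language sort first, then the lowest LanguageID) can be checked on the question's data. This is a rough sketch through Python's sqlite3 module (SQLite 3.25 or later assumed); the helper name pick_titles and the named parameter :lang are hypothetical stand-ins for the stored procedure parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE JobTitle (JobID INTEGER, LanguageID INTEGER)")
conn.executemany(
    "INSERT INTO JobTitle VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 4), (4, 5), (5, 2)],
)

QUERY = """
WITH cte_rn AS (
    SELECT JobID, LanguageID,
           ROW_NUMBER() OVER (
               PARTITION BY JobID
               ORDER BY CASE WHEN LanguageID = :lang THEN 0 ELSE 1 END,
                        LanguageID
           ) AS rn
    FROM JobTitle
)
SELECT JobID, LanguageID
FROM cte_rn
WHERE rn = 1
ORDER BY JobID
"""


def pick_titles(language_id):
    """Return one (JobID, LanguageID) row per job, preferring language_id."""
    return conn.execute(QUERY, {"lang": language_id}).fetchall()


print(pick_titles(1))
print(pick_titles(3))
```

Jobs that have no row in the requested language fall back to their lowest LanguageID, which matches the expected output in the question.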
SELECT b.[JobID], b.[LanaguageID]
FROM
(SELECT a.[JobID], a.[LanaguageID],
ROW_NUMBER() OVER(PARTITION BY a.[JobID] ORDER BY a.[LanaguageID]) AS [row]
FROM [JobTitle] a) b
WHERE b.[row] = 1
Result
| JOBID | LANAGUAGEID |
--------|-------------|
| 1 | 1 |
| 2 | 1 |
| 3 | 4 |
| 4 | 5 |
| 5 | 2 |

Grouping SQL Results based on order

I have table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers So that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
Assuming ID is the clustered index the plan for this has one scan against YourTable and avoids any sort operations.
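The flag-and-running-sum idea can be exercised on the irregular sequences from the clarified requirements. The sketch below is an approximation run through Python's sqlite3 module (SQLite 3.25 or later assumed); it uses a CASE expression in place of IIF, and the third group "1,2" is added purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, RowNumber INTEGER, Data TEXT)"
)
# Groups "1,1,2,2,3,4" and "1,2,4,6" from the question, plus "1,2" (assumption).
row_numbers = [1, 1, 2, 2, 3, 4, 1, 2, 4, 6, 1, 2]
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(i, rn, "Data") for i, rn in enumerate(row_numbers, start=1)],
)

query = """
WITH T1 AS (
    SELECT ID, RowNumber,
           LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
    FROM YourTable
), T2 AS (
    SELECT ID, RowNumber,
           -- a new group starts at the first row or when the number drops
           CASE WHEN PrevRowNumber IS NULL OR PrevRowNumber > RowNumber
                THEN 1 ELSE 0 END AS NewGroup
    FROM T1
)
SELECT ID, RowNumber,
       SUM(NewGroup) OVER (ORDER BY ID
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
ORDER BY ID
"""
rows = conn.execute(query).fetchall()
groups = [grp for _id, _rn, grp in rows]
print(groups)
```

Repeated values such as "1,1" stay in the same group because only a strictly lower number starts a new one.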
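If the ids are truly sequential and RowNumber increases by exactly 1 within each group (which the additional info in the question does not guarantee), you can do: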
select t.*,
(id - rowNumber) as grp
from t
Also you can use recursive CTE
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
-- grp is the ID of the row that starts the current group (the latest row with RowNumber = 1)
select *, (select max(ID) from [Your Table] where ID <= t.ID and RowNumber = 1) as grp
from [Your Table] t
) t
order by ID
This should work on SQL 2005. You could also use rank() instead if you don't care about consecutive numbers.

Conditional Aggregate Query

I have a table like this:
OrderID | PhaseID | Timestamp
1 | 1 | 1/1
1 | 2 | 1/2
1 | 3 | 1/3
1 | 2 | 1/4
1 | 4 | 1/5
I'm trying to get a query to return the most recent timestamp for each orderphase combination without being followed by a lesser phaseid. Something like this:
OrderID | PhaseID | MaxTimestampWithoutBeingFollowedByLesserPhaseID
1 | 1 | 1/1
1 | 2 | 1/4
1 | 3 | NULL
1 | 4 | 1/5
I keep running around in circles and coming up with this problem of a conditional aggregate query.
Can anybody figure out the query or give me some pointers?
With a as (
Select OrderID, PhaseID, MaxTimestamp=max([Timestamp])
From orderphase op
Where not exists(select 1 from orderphase where OrderID = op.OrderID
and PhaseID < op.PhaseID
and [Timestamp] > op.[Timestamp])
Group by OrderID, PhaseID
)
Select distinct o.Orderid, o.PhaseID, MaxTimestamp=a.MaxTimestamp
From orderphase o
Left join a on a.OrderID = o.OrderID and a.PhaseID=o.PhaseID
EDIT ref a.MaxTimestamp
Rank phases by time stamps for every order.
Join every row with its successor based on the rank.
Mark the rows where PhaseID is followed by a lesser PhaseID.
Aggregate the last result set picking max time stamps conditionally using MAX(CASE ...) to omit rows marked as ones being followed by lesser PhaseIDs.
Here's a sample implementation:
;
WITH ranked AS (
SELECT
*,
rnk = ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY [Timestamp])
FROM atable
),
marked AS (
SELECT
r1.OrderID,
r1.PhaseID,
r1.[Timestamp],
IsFollowedByLesserPhaseID = CASE WHEN r2.PhaseID IS NULL THEN 0 ELSE 1 END
FROM ranked r1
LEFT JOIN ranked r2 ON r1.OrderID = r2.OrderID
AND r1.rnk = r2.rnk - 1
AND r1.PhaseID > r2.PhaseID
)
SELECT
OrderID,
PhaseID,
MaxTimestampWithoutBeingFollowedByLesserPhaseID = MAX(
CASE IsFollowedByLesserPhaseID WHEN 0 THEN [Timestamp] END
)
FROM marked
GROUP BY
OrderID,
PhaseID
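As a variation on the implementation above, not the answerer's own code, the self join on the rank can be swapped for LEAD(), which is available in SQL Server 2012 and later. The sketch below runs that variant through Python's sqlite3 module (SQLite 3.25 or later assumed) on the question's data, with the dates rewritten as ISO strings for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (OrderID INTEGER, PhaseID INTEGER, Timestamp TEXT)")
conn.executemany(
    "INSERT INTO atable VALUES (?, ?, ?)",
    [
        (1, 1, "2020-01-01"),
        (1, 2, "2020-01-02"),
        (1, 3, "2020-01-03"),
        (1, 2, "2020-01-04"),
        (1, 4, "2020-01-05"),
    ],
)

query = """
WITH marked AS (
    SELECT OrderID, PhaseID, Timestamp,
           -- phase of the next row in time order within the same order
           LEAD(PhaseID) OVER (PARTITION BY OrderID
                               ORDER BY Timestamp) AS NextPhaseID
    FROM atable
)
SELECT OrderID, PhaseID,
       -- rows followed by a lesser phase contribute NULL and are ignored by MAX
       MAX(CASE WHEN NextPhaseID IS NULL OR NextPhaseID >= PhaseID
                THEN Timestamp END) AS MaxTimestampWithoutBeingFollowedByLesserPhaseID
FROM marked
GROUP BY OrderID, PhaseID
ORDER BY OrderID, PhaseID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Like the ranked self join, this looks only at the immediate successor of each row, whereas the NOT EXISTS answer rejects a row if any later row has a lesser PhaseID. The two readings agree on the sample data but can differ on other sequences.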