I have a table event with 3 columns and would like to select two consecutive rows of the same case ID that meet certain criteria (rules), as follows. I have 5k+ different case IDs to select from based on these criteria; below is just an example with 2 case IDs. I have part of the code, but I got stuck because I don't know how to select both rows when a condition is met.
Rules:
If D1 is followed by D3 THEN select both rows
Else if D1 is followed by D4 THEN select both rows
Else if D2 is followed by D1 THEN select both rows
Else if D2 is followed by D3 THEN select both rows
Else if D3 is followed by D2 THEN select both rows
Else if D3 is followed by D1 THEN select both rows
Else do not select
Table event:
caseID D Timestamp
-----------------------------------
1 D1 T1
1 D2 T2
1 D3 T3
1 D1 T4
1 D3 T5
1 D2 T6
1 D1 T7
1 D2 T8
1 D4 T9
2 D2 T1
2 D1 T2
2 D2 T3
2 D3 T4
2 D1 T5
2 D4 T6
2 D5 T7
Expected output:
caseID D Timestamp
----------------------------------
1 D2 T2
1 D3 T3
1 D1 T4
1 D3 T5
1 D2 T6
1 D1 T7
2 D2 T1
2 D1 T2
2 D2 T3
2 D3 T4
2 D1 T5
2 D4 T6
Code I might try:
SELECT caseID, D, Timestamp
FROM event e1
INNER JOIN event e2 ON e1.caseID = e2.caseID
WHERE
CASE #D
WHEN e1.D = D1 AND e2.D = D3 THEN ?
Here's one option using lead and lag with case:
select caseid, d, timestamp
from (
select *, lead(d) over (partition by caseId order by timestamp) lead,
lag(d) over (partition by caseId order by timestamp) lag
from event
) t
where 1 = case
when d = 'D1' and lead in ('D3','D4') then 1
when d = 'D2' and lead in ('D1','D3') then 1
when d = 'D3' and lead in ('D2','D1') then 1
when d = 'D1' and lag in ('D2', 'D3') then 1
when d = 'D2' and lag in ('D3') then 1
when d = 'D3' and lag in ('D2','D1') then 1
when d = 'D4' and lag in ('D1') then 1
else 0
end
order by caseid, timestamp
It could be consolidated, but I wanted to be as explicit as possible in defining your criteria.
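To check the LEAD/LAG logic against the sample data, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module, assuming a SQLite build of 3.25 or newer (the first version with window functions). It is a consolidated variant rather than the exact query above: the six rules live in a small rules CTE, and a row is kept if it is the first element of a rule pair (checked with LEAD) or the second element (checked with LAG). The names rules, next_d and prev_d are my own.

```python
import sqlite3

# Sample data from the question; 'T1'..'T9' sort correctly as text.
ROWS = [
    (1, "D1", "T1"), (1, "D2", "T2"), (1, "D3", "T3"), (1, "D1", "T4"),
    (1, "D3", "T5"), (1, "D2", "T6"), (1, "D1", "T7"), (1, "D2", "T8"),
    (1, "D4", "T9"),
    (2, "D2", "T1"), (2, "D1", "T2"), (2, "D2", "T3"), (2, "D3", "T4"),
    (2, "D1", "T5"), (2, "D4", "T6"), (2, "D5", "T7"),
]

QUERY = """
WITH rules(first_d, second_d) AS (      -- each rule: first_d followed by second_d
    VALUES ('D1','D3'), ('D1','D4'), ('D2','D1'),
           ('D2','D3'), ('D3','D2'), ('D3','D1')
),
t AS (
    SELECT caseID, D, Timestamp,
           LEAD(D) OVER (PARTITION BY caseID ORDER BY Timestamp) AS next_d,
           LAG(D)  OVER (PARTITION BY caseID ORDER BY Timestamp) AS prev_d
    FROM event
)
SELECT caseID, D, Timestamp
FROM t
-- keep the row if it starts a matching pair or finishes one
WHERE EXISTS (SELECT 1 FROM rules r WHERE r.first_d = t.D AND r.second_d = t.next_d)
   OR EXISTS (SELECT 1 FROM rules r WHERE r.first_d = t.prev_d AND r.second_d = t.D)
ORDER BY caseID, Timestamp
"""


def select_pairs(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE event (caseID INTEGER, D TEXT, Timestamp TEXT)")
    con.executemany("INSERT INTO event VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


selected = select_pairs(ROWS)
for row in selected:
    print(*row)
```

Keeping the rules as data means adding a new pair is a one-line change instead of two new CASE branches.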
Since SQL Server 2008 doesn't support LAG and LEAD, you can emulate them with correlated subqueries.
SELECT caseID,
       D,
       Timestamp
FROM (
    select *,
        -- next D in the same case (emulates LEAD)
        (select TOP 1 D
         FROM event tt
         WHERE t1.caseID = tt.caseID
           and t1.Timestamp < tt.Timestamp
         ORDER BY tt.Timestamp) nextD,
        -- previous D in the same case (emulates LAG)
        (select TOP 1 D
         FROM event tt
         WHERE t1.caseID = tt.caseID
           and t1.Timestamp > tt.Timestamp
         ORDER BY tt.Timestamp desc) prevD
    from event t1
) t1
WHERE (d = 'D1' and nextD in ('D3','D4'))
   OR (d = 'D2' and nextD in ('D1','D3'))
   OR (d = 'D3' and nextD in ('D2','D1'))
   OR (d = 'D1' and prevD in ('D2','D3'))
   OR (d = 'D2' and prevD in ('D3'))
   OR (d = 'D3' and prevD in ('D2','D1'))
   OR (d = 'D4' and prevD in ('D1'))
ORDER BY caseID, Timestamp
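The correlated-subquery approach needs no window functions at all, so it can be tried on almost any engine. A minimal sketch using Python's sqlite3 module purely as a convenient test bed: SQLite has no TOP, so TOP 1 is written as LIMIT 1 here, and the WHERE clause is the same rule list with each condition parenthesized.

```python
import sqlite3

# Sample data from the question; 'T1'..'T9' sort correctly as text.
ROWS = [
    (1, "D1", "T1"), (1, "D2", "T2"), (1, "D3", "T3"), (1, "D1", "T4"),
    (1, "D3", "T5"), (1, "D2", "T6"), (1, "D1", "T7"), (1, "D2", "T8"),
    (1, "D4", "T9"),
    (2, "D2", "T1"), (2, "D1", "T2"), (2, "D2", "T3"), (2, "D3", "T4"),
    (2, "D1", "T5"), (2, "D4", "T6"), (2, "D5", "T7"),
]

QUERY = """
SELECT caseID, D, Timestamp
FROM (
    SELECT t1.*,
           -- next D in the same case (LEAD emulation)
           (SELECT tt.D FROM event tt
             WHERE tt.caseID = t1.caseID AND tt.Timestamp > t1.Timestamp
             ORDER BY tt.Timestamp LIMIT 1) AS nextD,
           -- previous D in the same case (LAG emulation)
           (SELECT tt.D FROM event tt
             WHERE tt.caseID = t1.caseID AND tt.Timestamp < t1.Timestamp
             ORDER BY tt.Timestamp DESC LIMIT 1) AS prevD
    FROM event t1
) t
WHERE (D = 'D1' AND nextD IN ('D3', 'D4'))
   OR (D = 'D2' AND nextD IN ('D1', 'D3'))
   OR (D = 'D3' AND nextD IN ('D2', 'D1'))
   OR (D = 'D1' AND prevD IN ('D2', 'D3'))
   OR (D = 'D2' AND prevD = 'D3')
   OR (D = 'D3' AND prevD IN ('D2', 'D1'))
   OR (D = 'D4' AND prevD = 'D1')
ORDER BY caseID, Timestamp
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE event (caseID INTEGER, D TEXT, Timestamp TEXT)")
con.executemany("INSERT INTO event VALUES (?, ?, ?)", ROWS)
selected = con.execute(QUERY).fetchall()
con.close()

for row in selected:
    print(*row)
```

Each row triggers two lookups, so on the real 5k+ cases an index on (caseID, Timestamp) would matter far more than it does here.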
I have a table like below:
Region Country Manufacturer Brand Period Spend
R1 C1 M1 B1 2016 5
R1 C1 M1 B1 2017 10
R1 C1 M1 B1 2017 20
R1 C1 M1 B2 2016 15
R1 C1 M1 B3 2017 20
R1 C2 M1 B1 2017 5
R1 C2 M2 B4 2017 25
R1 C2 M2 B5 2017 30
R2 C3 M1 B1 2017 35
R2 C3 M2 B4 2017 40
R2 C3 M2 B5 2017 45
...
I wrote the query below to aggregate them:
SELECT [Region]
,[Country]
,[Manufacturer]
,[Brand]
,Period
,SUM([Spend]) AS [Spend]
FROM myTable
GROUP BY [Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]
ORDER BY 1,2,3,4
which yields something like this:
Region Country Manufacturer Brand Period Spend
R1 C1 M1 B1 2016 5
R1 C1 M1 B1 2017 30 -- this row is an aggregate from raw table above
R1 C1 M1 B2 2016 15
R1 C1 M1 B3 2017 20
R1 C2 M1 B1 2017 5
R1 C2 M2 B4 2017 25
R1 C2 M2 B5 2017 30
R2 C3 M1 B1 2017 35
R2 C3 M2 B4 2017 40
R2 C3 M2 B5 2017 45
I'd like to add another column to the above table that shows the DISTINCT COUNT of Brand grouped by Region, Country, Manufacturer and Period, so the final table would look like this:
Region Country Manufacturer Brand Period Spend UniqBrandCount
R1 C1 M1 B1 2016 5 2 -- two brands by R1, C1, M1 in 2016
R1 C1 M1 B1 2017 30 2
R1 C1 M1 B2 2016 15 2 -- same as first row's result
R1 C1 M1 B3 2017 20 2
R1 C2 M1 B1 2017 5 1
R1 C2 M2 B4 2017 25 2
R1 C2 M2 B5 2017 30 2
R2 C3 M1 B1 2017 35 1
R2 C3 M2 B4 2017 40 2
R2 C3 M2 B5 2017 45 2
I know how to get the final result in three steps.
Run this query (Query #1):
SELECT [Region]
,[Country]
,[Manufacturer]
,[Period]
,COUNT(DISTINCT [Brand]) AS [BrandCount]
INTO Temp1
FROM myTable
GROUP BY [Region]
,[Country]
,[Manufacturer]
,[Period]
Run this query (Query #2):
SELECT [Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]
,SUM([Spend]) AS [Spend]
INTO Temp2
FROM myTable
GROUP BY [Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]
Then LEFT JOIN Temp2 to Temp1 to bring in [BrandCount] from the latter:
SELECT a.*
,b.[BrandCount]
FROM Temp2 AS a
LEFT JOIN Temp1 AS b ON a.[Region] = b.[Region]
AND a.[Country] = b.[Country]
AND a.[Manufacturer] = b.[Manufacturer]
AND a.[Period] = b.[Period]
I'm pretty sure there is a more efficient way to do this. Is there? Thank you in advance for your suggestions/answers!
Borrowing heavily from this question: https://dba.stackexchange.com/questions/89031/using-distinct-in-window-function-with-over
COUNT(DISTINCT ...) isn't allowed together with OVER in SQL Server, so DENSE_RANK is needed instead. Ranking the brands in ascending and then descending order, adding the two ranks, and subtracting 1 gives the distinct count.
Your sum function can also be rewritten using PARTITION BY logic. This way you can use different grouping levels for each aggregation:
SELECT
[Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]
,dense_rank() OVER
(PARTITION BY
[Region]
,[Country]
,[Manufacturer]
,[Period] Order by Brand)
+ dense_rank() OVER
(PARTITION BY
[Region]
,[Country]
,[Manufacturer]
,[Period] Order by Brand Desc)
- 1
AS [BrandCount]
,SUM([Spend]) OVER
(PARTITION BY
[Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]) as [Spend]
from
myTable
ORDER BY 1,2,3,4
You may then need to reduce the number of rows in your output, as this syntax gives the same number of rows as myTable, but with the aggregation totals appearing on each row they apply to:
R1 C1 M1 B1 2016 2 5
R1 C1 M1 B1 2017 2 30 --dup1
R1 C1 M1 B1 2017 2 30 --dup1
R1 C1 M1 B2 2016 2 15
R1 C1 M1 B3 2017 2 20
R1 C2 M1 B1 2017 1 5
R1 C2 M2 B4 2017 2 25
R1 C2 M2 B5 2017 2 30
R2 C3 M1 B1 2017 1 35
R2 C3 M2 B4 2017 2 40
R2 C3 M2 B5 2017 2 45
Selecting distinct rows from this output gives you what you need.
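A minimal sketch that runs the double dense_rank query, followed by SELECT DISTINCT, on the question's sample rows through Python's sqlite3 module. It assumes SQLite 3.25+ for window functions and integer Period values as shown in the question; the window query is wrapped in a derived table (my own alias w) so the DISTINCT clearly applies after the window functions.

```python
import sqlite3

# Sample rows from the question: (Region, Country, Manufacturer, Brand, Period, Spend).
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5), ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20), ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20), ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25), ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35), ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]

QUERY = """
SELECT DISTINCT Region, Country, Manufacturer, Brand, Period, BrandCount, Spend
FROM (
    SELECT Region, Country, Manufacturer, Brand, Period,
           -- ascending rank + descending rank - 1 = distinct brands in the partition
           DENSE_RANK() OVER (PARTITION BY Region, Country, Manufacturer, Period ORDER BY Brand)
         + DENSE_RANK() OVER (PARTITION BY Region, Country, Manufacturer, Period ORDER BY Brand DESC)
         - 1 AS BrandCount,
           -- total spend per brand/period, repeated on every raw row
           SUM(Spend) OVER (PARTITION BY Region, Country, Manufacturer, Brand, Period) AS Spend
    FROM myTable
) AS w
ORDER BY Region, Country, Manufacturer, Brand, Period
"""


def run(query):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE myTable (Region TEXT, Country TEXT, Manufacturer TEXT,"
        " Brand TEXT, Period INTEGER, Spend INTEGER)"
    )
    con.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(query).fetchall()
    con.close()
    return rows


result = run(QUERY)
for row in result:
    print(*row)
```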
How the dense_rank trick works
Consider this data:
Col1 Col2
B 1
B 1
B 3
B 5
B 7
B 9
dense_rank() ranks data according to the number of distinct items before the current one, plus 1. So:
1->1, 3->2, 5->3, 7->4, 9->5.
In reverse order (using desc) this yields the reverse pattern:
1->5, 3->4, 5->3, 7->2, 9->1.
Adding these ranks together gives the same value:
1+5 = 2+4 = 3+3 = 4+2 = 5+1 = 6
Put into words:
(number of distinct items before + 1) + (number of distinct items after + 1)
= number of distinct OTHER items before AND after + 2
= Total number of distinct items + 1
So to get the total number of distinct items, add the ascending and descending dense_ranks together and subtract 1.
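The same arithmetic can be checked in a few lines of plain Python. The helper dense_rank below is a hypothetical illustration of what DENSE_RANK() computes over a single partition, not a library function.

```python
def dense_rank(values, reverse=False):
    """Rank = 1 + number of distinct values strictly before this one in the chosen order."""
    ordered = sorted(set(values), reverse=reverse)
    rank_of = {v: i + 1 for i, v in enumerate(ordered)}
    return [rank_of[v] for v in values]


col2 = [1, 1, 3, 5, 7, 9]                  # Col2 from the example above
asc = dense_rank(col2)                     # [1, 1, 2, 3, 4, 5]
desc = dense_rank(col2, reverse=True)      # [5, 5, 4, 3, 2, 1]
sums = [a + d for a, d in zip(asc, desc)]  # every entry is 6
distinct_count = sums[0] - 1               # 5 distinct values: 1, 3, 5, 7, 9

print(asc, desc, sums, distinct_count)
```

Because every row of a partition gets the same sum, any row can be used to read off the count, which is why the SQL version can put it on each row.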
The window-functions tag on your question suggests you already have a pretty good idea.
For the DISTINCT COUNT of Brand grouped by Region, Country, Manufacturer and Period, you can write:
Select Region
,Country
,Manufacturer
,Brand
,Period
,Spend
,DENSE_RANK() Over (Partition By Region, Country, Manufacturer, Period Order By Brand asc)
+ DENSE_RANK() Over (Partition By Region, Country, Manufacturer, Period Order By Brand desc)
-1 UniqBrandCount
From myTable T1
Order By 1,2,3,4
The double dense_rank idea means that you need two sorts (assuming no index exists that provides the sort order). Assuming no NULL brands (as that idea also does), you can use a single dense_rank and a windowed MAX instead:
WITH T1
AS (SELECT *,
DENSE_RANK() OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period] ORDER BY Brand) AS [dr]
FROM myTable),
T2
AS (SELECT *,
MAX([dr]) OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period]) AS UniqBrandCount
FROM T1)
SELECT [Region],
[Country],
[Manufacturer],
[Brand],
Period,
SUM([Spend]) AS [Spend],
MAX(UniqBrandCount) AS UniqBrandCount
FROM T2
GROUP BY [Region],
[Country],
[Manufacturer],
[Brand],
[Period]
ORDER BY [Region],
[Country],
[Manufacturer],
[Period],
Brand
The above has some inevitable spooling (it isn't possible to do this in a 100% streaming manner) but only a single sort.
Strangely, the final ORDER BY clause is needed to keep the number of sorts down to one (or zero if a suitable index exists).
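A minimal sketch that runs the single dense_rank + windowed MAX query through Python's sqlite3 module on the question's sample rows, assuming SQLite 3.25+ and, as noted above, no NULL brands. SQLite's plan says nothing about SQL Server's sorts or spools; this only checks that the results come out right.

```python
import sqlite3

# Sample rows from the question: (Region, Country, Manufacturer, Brand, Period, Spend).
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5), ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20), ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20), ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25), ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35), ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]

QUERY = """
WITH T1 AS (
    -- one ascending dense rank per brand within each group
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY Region, Country, Manufacturer, Period ORDER BY Brand) AS dr
    FROM myTable
),
T2 AS (
    -- the highest rank in the group equals the number of distinct (non-NULL) brands
    SELECT *,
           MAX(dr) OVER (PARTITION BY Region, Country, Manufacturer, Period) AS UniqBrandCount
    FROM T1
)
SELECT Region, Country, Manufacturer, Brand, Period,
       SUM(Spend) AS Spend,
       MAX(UniqBrandCount) AS UniqBrandCount
FROM T2
GROUP BY Region, Country, Manufacturer, Brand, Period
ORDER BY Region, Country, Manufacturer, Period, Brand
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE myTable (Region TEXT, Country TEXT, Manufacturer TEXT,"
    " Brand TEXT, Period INTEGER, Spend INTEGER)"
)
con.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)
result = con.execute(QUERY).fetchall()
con.close()

for row in result:
    print(*row)
```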
I created a function abcd_insert() which inserts data into table abcd, which has 8 columns. The code inside the function looks similar to this:
BEGIN
INSERT INTO abcd
VALUES
(
x ,
y ,
select sum(count) from (select count(*) from a where a1 = x and a2 = y and a3 = 1 union all select count(*) from b where b1 = x and b2 = y and b3 = 1 ) as n1,
select sum(count) from (select count(*) from a where a1 = x and a2 = y and a3 = 2 union all select count(*) from b where b1 = x and b2 = y and b3 = 2 ) as n2 ,
select sum(count) from (select count(*) from a where a1 = x and a2 = y and a3 = 3 union all select count(*) from b where b1 = x and b2 = y and b3 = 3 ) as n3 ,
select sum(count) from (select count(*) from a where a1 = x and a2 = y and a3 = 4 union all select count(*) from b where b1 = x and b2 = y and b3 = 4 ) as n4 ,
select sum(count) from (select count(*) from a where a1 = x and a2 = y and a3 = 5 union all select count(*) from b where b1 = x and b2 = y and b3 = 5 ) as n5 ,
SELECT sum(q1) from
(SELECT CASE WHEN COUNT(1) > 0 THEN 1 ELSE 0 END as q1 FROM p1 where p11 = x and p12 = y union all
SELECT CASE WHEN COUNT(1) > 0 THEN 1 ELSE 0 END as q1 FROM p2 where p21 = x and p22 = y ) as q1
);
END;
'x' and 'y' are my input parameters whose values will be passed to the function abcd_insert(). a,b,p1 and p2 are tables within the same schema.
When I pass 'x' and 'y' to the function at run time, it throws an error.
Can someone please tell me what I am doing wrong here?
I think you'd better specify the column names in your INSERT statement:
Insert into abcd ("column1",...,"column8") values ...
And please post the error, so that others can help.
With your sample code, you need parentheses around the subqueries so that their results are used as values, e.g.:
t=# create table t (i int, e int);
CREATE TABLE
t=# create or replace function f(x int) returns void as $$
begin
insert into t values (x, (select 1 where x > 0));
end;
$$ language plpgsql;
CREATE FUNCTION
t=# select f(1);
f
---
(1 row)
t=# select * from t;
i | e
---+---
1 | 1
(1 row)
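A minimal sketch of the parenthesized version of the original INSERT, using Python's sqlite3 module as a stand-in for PostgreSQL (a subquery used as a value must be wrapped in its own parentheses in both). The abcd table here is a hypothetical, trimmed-down version with only x, y, one count column n1 and the p1/p2 flag sum q; insert_abcd is a hypothetical Python stand-in for abcd_insert(), and the sample rows are made up purely for illustration.

```python
import sqlite3

# Hypothetical, trimmed-down schema mirroring the question's tables.
SCHEMA = """
CREATE TABLE a (a1 INTEGER, a2 INTEGER, a3 INTEGER);
CREATE TABLE b (b1 INTEGER, b2 INTEGER, b3 INTEGER);
CREATE TABLE p1 (p11 INTEGER, p12 INTEGER);
CREATE TABLE p2 (p21 INTEGER, p22 INTEGER);
CREATE TABLE abcd (x INTEGER, y INTEGER, n1 INTEGER, q INTEGER);
"""

INSERT = """
INSERT INTO abcd (x, y, n1, q)
VALUES (
    :x,
    :y,
    -- each scalar subquery sits inside its own parentheses
    (SELECT SUM(cnt) FROM (
         SELECT COUNT(*) AS cnt FROM a WHERE a1 = :x AND a2 = :y AND a3 = 1
         UNION ALL
         SELECT COUNT(*) FROM b WHERE b1 = :x AND b2 = :y AND b3 = 1
     ) AS counts),
    (SELECT SUM(q1) FROM (
         SELECT CASE WHEN COUNT(1) > 0 THEN 1 ELSE 0 END AS q1 FROM p1 WHERE p11 = :x AND p12 = :y
         UNION ALL
         SELECT CASE WHEN COUNT(1) > 0 THEN 1 ELSE 0 END FROM p2 WHERE p21 = :x AND p22 = :y
     ) AS flags)
)
"""


def insert_abcd(con, x, y):
    """Hypothetical stand-in for abcd_insert(x, y)."""
    con.execute(INSERT, {"x": x, "y": y})


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
con.executemany("INSERT INTO a VALUES (?, ?, ?)", [(1, 2, 1), (1, 2, 1), (1, 2, 2), (9, 2, 1)])
con.execute("INSERT INTO b VALUES (1, 2, 1)")
con.execute("INSERT INTO p1 VALUES (1, 2)")  # p2 is left empty on purpose
insert_abcd(con, 1, 2)
row = con.execute("SELECT x, y, n1, q FROM abcd").fetchone()
print(row)  # (1, 2, 3, 1)
```

Two rows in a plus one row in b give n1 = 3; p1 has a match and p2 does not, so q = 1 + 0 = 1.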
I have a table like:
1 a a1 a2
1 b b1 b2
1 c c1 c2
1 d d1 d2
2 a a1 a2
2 b b1 b2
2 c c1 c2
3........
3........
........
........
n x x1 x2
n y y1 y2
n z z1 z2
From this, I want to get a specified number of rows (say 2) for each number (1, 2, 3, 4, ... n).
Result:
1 a a1 a2
1 b b1 b2
2 a a1 a2
2 b b1 b2
.........
.........
n x x1 x2
n y y1 y2
I tried GROUP BY with string_agg(), but I can't limit it to a specified number of rows.
How can I go about it?
This can be done using a window function:
select nr, col1, col2, col3
from (
select nr, col1, col2, col3,
row_number() over (partition by nr order by col1) as rn
from the_table
) t
where rn <= 2;
If you want to influence which rows are returned, adjust the ORDER BY that defines the ordering of the rows in the window function.
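A minimal sketch of the ROW_NUMBER() approach in Python's sqlite3 module (SQLite 3.25+), using the answer's hypothetical names the_table, nr, col1, col2 and col3, a few rows shaped like the question's, and 3 standing in for n. The per-group limit is passed as a query parameter, and an outer ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Rows shaped like the question's table; 3 stands in for n.
ROWS = [
    (1, "a", "a1", "a2"), (1, "b", "b1", "b2"), (1, "c", "c1", "c2"), (1, "d", "d1", "d2"),
    (2, "a", "a1", "a2"), (2, "b", "b1", "b2"), (2, "c", "c1", "c2"),
    (3, "x", "x1", "x2"), (3, "y", "y1", "y2"), (3, "z", "z1", "z2"),
]

QUERY = """
SELECT nr, col1, col2, col3
FROM (
    SELECT nr, col1, col2, col3,
           -- number the rows 1, 2, 3, ... within each nr
           ROW_NUMBER() OVER (PARTITION BY nr ORDER BY col1) AS rn
    FROM the_table
) AS t
WHERE rn <= :n
ORDER BY nr, rn
"""


def first_rows_per_group(rows, n):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE the_table (nr INTEGER, col1 TEXT, col2 TEXT, col3 TEXT)")
    con.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY, {"n": n}).fetchall()
    con.close()
    return result


top2 = first_rows_per_group(ROWS, 2)
for row in top2:
    print(*row)
```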
I have a SQL table containing some dummy data.
I want to take that dummy data and update each row's TYPE column.
My table tbl:
ID D1 M1 C1 QTY TYPE
1 D1 M1 C1 1 Y
2 D1 M2 C1 2 Y
3 D1 M3 C1 3 Y
4 D1 M1 C1 1 Y
5 D2 M1 C1 1 Y
6 D2 M2 C1 2 Y
7 D2 M3 C2 3 Y
8 D2 M1 C1 1 Y
9 D2 M2 C1 2 Y
10 D3 M1 C1 1 Y
11 D3 M2 C1 2 Y
12 D3 M3 C1 3 Y
13 D3 M1 C1 1 Y
14 D3 M2 C1 2 Y
15 D3 M3 C1 3 Y
16 D3 M1 C2 1 Y
Grouping on column D1:
I have N records. I need to split them into groups of 3 records; any leftover records should be set to "No", the rest to "Yes".
Ex:
Condition 1: if I have 4 records, I make one group of 3 records, so the remaining last record should be set to "No".
Condition 2: if I have 5 records, I make one group of 3 records, so the remaining last 2 records should be set to "No".
Condition 3: if I have 7 records, I make two groups of 3 records, so the remaining last record should be set to "No".
My expected result is:
ID D1 M1 C1 QTY TYPE
1 D1 M1 C1 1 YES
2 D1 M2 C1 2 YES
3 D1 M3 C1 3 YES
4 D1 M1 C1 1 NO
5 D2 M1 C1 1 YES
6 D2 M2 C1 2 YES
7 D2 M3 C2 3 YES
8 D2 M1 C1 1 NO
9 D2 M2 C1 2 NO
10 D3 M1 C1 1 YES
11 D3 M2 C1 2 YES
12 D3 M3 C1 3 YES
13 D3 M1 C1 1 YES
14 D3 M2 C1 2 YES
15 D3 M3 C1 3 YES
16 D3 M1 C2 1 NO
Please tell me a solution.
Is this what you are looking for?
with toupdate as (
select t.*, row_number() over (partition by d1 order by id) as seqnum,
count(*) over (partition by d1) as cnt
from tbl t
)
update toupdate
set type = (case when seqnum <= 3*(cnt /3) then 'YES' else 'NO' end);
You can also run similar logic as a select:
select t.*, (case when seqnum <= 3*(cnt /3) then 'YES' else 'NO' end) as new_type
from (
select t.*, row_number() over (partition by d1 order by id) as seqnum,
count(*) over (partition by d1) as cnt
from tbl t
) t;
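A minimal sketch of the update in Python's sqlite3 module, assuming SQLite 3.25+. SQLite can't update through a CTE the way SQL Server can, so here the CTE (ranked, my own name) only computes seqnum and cnt, and a correlated subquery in the SET clause looks up each row's value. It partitions by D1 only, which is what the expected output implies.

```python
import sqlite3

# (ID, D1, M1, C1, QTY) from the question; TYPE starts as 'Y' for every row.
ROWS = [
    (1, "D1", "M1", "C1", 1), (2, "D1", "M2", "C1", 2), (3, "D1", "M3", "C1", 3),
    (4, "D1", "M1", "C1", 1), (5, "D2", "M1", "C1", 1), (6, "D2", "M2", "C1", 2),
    (7, "D2", "M3", "C2", 3), (8, "D2", "M1", "C1", 1), (9, "D2", "M2", "C1", 2),
    (10, "D3", "M1", "C1", 1), (11, "D3", "M2", "C1", 2), (12, "D3", "M3", "C1", 3),
    (13, "D3", "M1", "C1", 1), (14, "D3", "M2", "C1", 2), (15, "D3", "M3", "C1", 3),
    (16, "D3", "M1", "C2", 1),
]

UPDATE = """
WITH ranked AS (
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY D1 ORDER BY ID) AS seqnum,  -- position in group
           COUNT(*)     OVER (PARTITION BY D1)             AS cnt      -- group size
    FROM tbl
)
UPDATE tbl
SET TYPE = (
    -- integer division: 3 * (cnt / 3) rows fall into complete groups of 3
    SELECT CASE WHEN r.seqnum <= 3 * (r.cnt / 3) THEN 'YES' ELSE 'NO' END
    FROM ranked AS r
    WHERE r.ID = tbl.ID
)
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE tbl (ID INTEGER PRIMARY KEY, D1 TEXT, M1 TEXT, C1 TEXT,"
    " QTY INTEGER, TYPE TEXT)"
)
con.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?, 'Y')", ROWS)
con.execute(UPDATE)
types = dict(con.execute("SELECT ID, TYPE FROM tbl ORDER BY ID").fetchall())
con.close()

print(types)
```

With 4, 5 and 7 rows in the D1, D2 and D3 groups, 3, 3 and 6 rows respectively end up in complete groups, which matches the three conditions in the question.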