aggregation according to different conditions on same column

I have a table #tbl like the one below. I need a query that, for each id, displays the AVG(val) of that id's cid if there are more than 3 records available for the cid, and the AVG(val) of all records if there are fewer than 3 records for the cid.
Please suggest.
create table #tbl (id int, cid int, val float)
insert into #tbl
values(1,100,20),(2,100,30),(3,100,25),(4,100,31),(5,100,50),
(6,200,30),(7,200,30),(8,300,90)

Your description is not clear, but I believe you need windowed functions:
WITH cte AS (
SELECT *, COUNT(*) OVER(PARTITION BY cid) AS cnt
FROM #tbl
)
SELECT id, (SELECT AVG(val) FROM cte) AS Av
FROM cte
WHERE cnt <=3
UNION ALL
SELECT id, AVG(val) OVER(PARTITION BY cid) AS Av
FROM cte
WHERE cnt > 3
ORDER BY id;
EDIT:
SELECT id,
CASE WHEN COUNT(*) OVER(PARTITION BY cid) <= 3 THEN AVG(val) OVER()
ELSE AVG(val) OVER(PARTITION BY cid)
END AS Av
FROM #tbl
ORDER BY id;

You can try the following. First calculate the average for each cid depending on its number of occurrences, then join each cid back to the table to display every id. (Note that this version uses >= 3, so a cid with exactly 3 records gets its own average.)
;WITH CidAverages AS
(
SELECT
T.cid,
Average = CASE
WHEN COUNT(1) >= 3 THEN AVG(T.val)
ELSE (SELECT AVG(Val) FROM #tbl) END
FROM
#tbl AS T
GROUP BY
T.cid
)
SELECT
T.*,
C.Average
FROM
#tbl AS T
INNER JOIN CidAverages AS C ON T.cid = C.cid

Given the clarifications in the comments, I think this is the intention:
create table #tbl (id int, cid int, val float)
insert into #tbl
values(1,100,20),(2,100,30),(3,100,25),(4,100,31),(5,100,50),
(6,200,30),(7,200,30),(8,300,90);
select distinct
cid
, case
when count(*) over (partition by cid) > 3 then avg(val) over (partition by cid)
else avg(val) over ()
end as avg
from #tbl;
http://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=fdf4c4457220ec64132de7452a034976
cid avg
100 31.2
200 38.25
300 38.25
A query like this could produce a poor query plan when run at scale, so I'd want to test it on a larger data set and tune it before using it.
The description doesn't say what should happen with exactly 3 records: it mentions 'more than 3' and 'less than 3'. This code uses 'more than 3' to pick the per-cid average and treats 'less than 3' as 'less than or equal to 3'.
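A minimal, portable sketch of the same windowed idea with the boundary spelled out, assuming an engine with window functions (SQL Server 2012+ and SQLite 3.25+ should both accept it) and a scratch database; the table name cid_vals is hypothetical. The test is > 3, so a cid with exactly 3 records falls back to the overall average, matching the interpretation above; switch to >= 3 for the other reading.

```sql
-- cid_vals is a hypothetical scratch table holding the sample data from the question
CREATE TABLE cid_vals (id INT, cid INT, val FLOAT);
INSERT INTO cid_vals (id, cid, val) VALUES
  (1, 100, 20), (2, 100, 30), (3, 100, 25), (4, 100, 31), (5, 100, 50),
  (6, 200, 30), (7, 200, 30), (8, 300, 90);

SELECT id,
       cid,
       CASE
         -- more than 3 rows for this cid: average over that cid only
         WHEN COUNT(*) OVER (PARTITION BY cid) > 3 THEN AVG(val) OVER (PARTITION BY cid)
         -- 3 or fewer rows: fall back to the average over the whole table
         ELSE AVG(val) OVER ()
       END AS avg_val
FROM cid_vals
ORDER BY id;
```

With the sample data, ids 1 to 5 (cid 100) get 31.2 and ids 6 to 8 get 38.25, the same values as the output above.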

Related

Count number of cases where visitor rank is higher on one page than on another

I want to count the number of fullvisitorIDs whose rank on /page_y is higher than their rank on /page_x. In this case the result would be 1 (only 111).
fullvisitorID  rank  page
111            1     /page_x
111            2     /page_y
222            1     /page_x
222            2     /page_x
333            2     /page_x
333            1     /page_y

Consider the approach below:
select count(*) from (
select distinct fullvisitorID
from your_table
qualify max(if(page='/page_y',rank,null)) over win > max(if(page='/page_x',rank,null)) over win
window win as (partition by fullvisitorID)
)
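SQL Server and several other engines have no QUALIFY clause. Under that assumption, a sketch of the same per-visitor comparison with GROUP BY and HAVING: a conditional MAX picks out each visitor's rank on each page, and only visitors whose /page_y rank is higher are kept. The table name visits is a hypothetical stand-in; the rows are the sample from the question.

```sql
-- visits is a hypothetical stand-in for the real table, loaded with the question's sample rows
CREATE TABLE visits (fullvisitorID INT, [rank] INT, page VARCHAR(20));
INSERT INTO visits (fullvisitorID, [rank], page) VALUES
  (111, 1, '/page_x'), (111, 2, '/page_y'),
  (222, 1, '/page_x'), (222, 2, '/page_x'),
  (333, 2, '/page_x'), (333, 1, '/page_y');

SELECT COUNT(*) AS cnt
FROM (
  -- one row per visitor whose highest /page_y rank beats their highest /page_x rank
  SELECT fullvisitorID
  FROM visits
  GROUP BY fullvisitorID
  HAVING MAX(CASE WHEN page = '/page_y' THEN [rank] END)
       > MAX(CASE WHEN page = '/page_x' THEN [rank] END)
) AS v;
```

A visitor missing one of the two pages gets NULL on that side, so the comparison is unknown and the visitor is not counted (222 here). For the sample the query returns 1.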

SELECT COUNTIF(page = '/page_y') cnt FROM (
SELECT * FROM sample_table WHERE page IN ('/page_x', '/page_y')
QUALIFY ROW_NUMBER() OVER (PARTITION BY fullvisitorID ORDER BY rank DESC) = 1
);

For the count, you can use COUNT and GROUP BY:
SELECT fullvisitorID, COUNT(fullvisitorID), Page FROM your_table t1
WHERE rank = (SELECT MAX(t2.rank) FROM your_table t2 WHERE t2.fullvisitorID = t1.fullvisitorID)
GROUP BY fullvisitorID, Page

You can apply a SELF JOIN of the table to itself, matching on the "fullvisitorID" field, and then require:
the first copy (t1) to have "/page_y" rows,
the second copy (t2) to have "/page_x" rows,
the rank in the first copy to be higher than the rank in the second copy.
SELECT *
FROM tab t1
INNER JOIN tab t2
ON t1.fullvisitorID = t2.fullvisitorID
AND t1.page = '/page_y'
AND t2.page = '/page_x'
AND t1.rank > t2.rank
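The self join above returns matching row pairs rather than a number. A sketch that turns it into the count the question asks for, using COUNT(DISTINCT ...) so a visitor with several matching pairs is counted once; tab is the placeholder name from the answer, filled here with the question's sample rows in a scratch database.

```sql
-- tab is the placeholder table name from the answer above, loaded with the question's sample rows
CREATE TABLE tab (fullvisitorID INT, [rank] INT, page VARCHAR(20));
INSERT INTO tab (fullvisitorID, [rank], page) VALUES
  (111, 1, '/page_x'), (111, 2, '/page_y'),
  (222, 1, '/page_x'), (222, 2, '/page_x'),
  (333, 2, '/page_x'), (333, 1, '/page_y');

SELECT COUNT(DISTINCT t1.fullvisitorID) AS cnt  -- each visitor counted once
FROM tab t1
INNER JOIN tab t2
  ON  t1.fullvisitorID = t2.fullvisitorID
  AND t1.page = '/page_y'      -- t1 supplies the /page_y rows
  AND t2.page = '/page_x'      -- t2 supplies the /page_x rows
  AND t1.[rank] > t2.[rank];   -- /page_y ranked higher
```

It returns 1 for the sample. Note that this counts a visitor when any of its /page_y rows outranks any of its /page_x rows, which can differ from comparing the highest rank per page when a visitor has several rows for the same page.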

Table separation approach:
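CREATE TABLE #t1 (fullvisitorID INT, [rank] INT, [page] VARCHAR(MAX)) -- rows where page = '/page_x'
CREATE TABLE #t2 (fullvisitorID INT, [rank] INT, [page] VARCHAR(MAX)) -- rows where page = '/page_y'
-- '=' rather than LIKE: in a LIKE pattern '_' matches any single character
INSERT INTO #t1 SELECT fullvisitorID, [rank], [page] FROM #test t WHERE t.[page] = '/page_x'
INSERT INTO #t2 SELECT fullvisitorID, [rank], [page] FROM #test t WHERE t.[page] = '/page_y'
-- COUNT(DISTINCT ...) so a visitor with several matching row pairs is counted once
SELECT COUNT(DISTINCT [#t1].fullvisitorID) FROM #t1 INNER JOIN #t2 ON [#t1].fullvisitorID = [#t2].fullvisitorID WHERE [#t1].[rank] < [#t2].[rank]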

How to get minimum 3 records per a group?

I have 3 columns in the SalesCart table, as follows:
I need to get a minimum of 3 records per item, as follows:
How can I do that?

I think you can simply use ROW_NUMBER():
CREATE TABLE #testtable
(
ItemCode NVARCHAR(30),
Customer VARCHAR(10),
Amount INT
)
INSERT INTO #testtable
VALUES
('A-001','A', 25000)
,('A-001','B', 15000)
,('A-001','C', 12000)
,('A-001','D', 12500)
,('A-001','E', 20000)
,('A-002','C', 3000)
,('A-002','X', 2250)
,('A-002','Y', 3750)
,('A-002','D', 3100)
select *
from #testtable
select *
from
(
select *, ROW_number() over (PARTITION BY ItemCode ORDER BY ItemCode ) as Number
from #testtable
) t
where t.Number < 4

You can also try this; change the value of @top to return more or fewer rows per group.
DECLARE @top INT;
SET @top = 3;
;WITH grp AS
(
SELECT ItemCode, Customer, Amount,
rn = ROW_NUMBER() OVER
(PARTITION BY ItemCode ORDER BY ItemCode DESC)
FROM itemTable
)
SELECT ItemCode, Customer, Amount
FROM grp
WHERE rn <= @top
ORDER BY ItemCode DESC;
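Both answers above order by ItemCode inside a partition on ItemCode, so all rows in a partition tie and the engine is free to return any 3 of them. A sketch with a deterministic order, assuming (since the expected output is not shown) that the wanted rows are those with the largest Amount per item, with Customer as a tie-breaker; sales_cart is a hypothetical table name holding the sample rows from the first answer.

```sql
-- sales_cart is a hypothetical table name; the rows are the sample from the first answer
CREATE TABLE sales_cart (ItemCode VARCHAR(30), Customer VARCHAR(10), Amount INT);
INSERT INTO sales_cart (ItemCode, Customer, Amount) VALUES
  ('A-001', 'A', 25000), ('A-001', 'B', 15000), ('A-001', 'C', 12000),
  ('A-001', 'D', 12500), ('A-001', 'E', 20000),
  ('A-002', 'C', 3000),  ('A-002', 'X', 2250),  ('A-002', 'Y', 3750),
  ('A-002', 'D', 3100);

SELECT ItemCode, Customer, Amount
FROM (
  SELECT ItemCode, Customer, Amount,
         -- number the rows inside each item, highest Amount first, Customer breaks ties
         ROW_NUMBER() OVER (PARTITION BY ItemCode
                            ORDER BY Amount DESC, Customer) AS rn
  FROM sales_cart
) AS ranked
WHERE rn <= 3          -- keep up to 3 rows per item
ORDER BY ItemCode, rn;
```

For the sample, A-001 returns customers A, E and B, and A-002 returns Y, D and C. Use ORDER BY Amount ASC instead if the smallest amounts are wanted.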

2 rows differences

I would like to get 2 consecutive rows from an SQL table.
One of the columns stores a UNIX datestamp, and between the 2 rows the only difference is this value.
For example:
        id_int   dt_int
row 1   8211721  509794233
row 2   8211722  509794233
Edit: I need only those rows where dt_int is the same.

Do you want both lines to be shown?
A solution could be this:
with foo as
(
select
*
from (values (8211721),(8211722),(8211728),(8211740),(8211741)) a(id_int)
)
select
id_int
from
(
select
id_int
,id_int-isnull(lag(id_int,1) over (order by id_int) ,id_int-6) prev
,isnull(lead(id_int,1) over (order by id_int) ,id_int+6)-id_int nxt
from foo
) a
where prev<=5 or nxt<=5
We use LEAD and LAG to find the differences between rows, and keep the rows where the difference to the row before or after is less than or equal to 5. The ISNULL defaults (±6) make a missing neighbour at either end fail that test.
If you are on SQL Server 2008 R2, LAG and LEAD are not available (they were introduced in SQL Server 2012). You could use ROW_NUMBER instead:
with foo as
(
select
*
from (values (8211721),(8211722),(8211728),(8211740),(8211741)) a(id_int)
)
, rownums as
(
select
id_int
,row_number() over (order by id_int) rn
from foo
)
select
id_int
from
(
select
cur.id_int
,cur.id_int-prev.id_int prev
,nxt.id_int-cur.id_int nxt
from rownums cur
left join rownums prev
on cur.rn-1=prev.rn
left join rownums nxt
on cur.rn+1=nxt.rn
) a
where isnull(prev,6)<=5 or isnull(nxt,6)<=5
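The edited requirement (keep rows whose dt_int equals the neighbouring row's dt_int) fits the same LAG/LEAD pattern by comparing dt_int instead of id_int. A sketch assuming rows are ordered by id_int and LAG/LEAD are available (SQL Server 2012+); stamps is a hypothetical table name, and the rows are the sample data used in another answer to this question.

```sql
-- stamps is a hypothetical table name; the rows are sample data from another answer
CREATE TABLE stamps (id_int INT, dt_int INT);
INSERT INTO stamps (id_int, dt_int) VALUES
  (8211721, 509794233), (8211722, 509794233), (8211723, 509794235),
  (8211724, 509794236), (8211729, 509794237), (8211731, 509794238);

SELECT id_int, dt_int
FROM (
  SELECT id_int, dt_int,
         LAG(dt_int)  OVER (ORDER BY id_int) AS prev_dt,  -- previous row's dt_int (NULL for the first row)
         LEAD(dt_int) OVER (ORDER BY id_int) AS next_dt   -- next row's dt_int (NULL for the last row)
  FROM stamps
) AS n
WHERE dt_int = prev_dt OR dt_int = next_dt  -- same dt_int as a neighbour
ORDER BY id_int;
```

Only 8211721 and 8211722 are returned. The first and last rows have a NULL neighbour on one side, and a comparison with NULL is never true, so no ISNULL default is needed here.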

Assuming:
the LEAD() analytic function is available;
ID_INT is the column to sort by to determine table order;
you may need to partition by some value, e.g. lead(ID_int) over (partition by SomeKeySuchAsOrderNumber order by ID_int asc), so that orders and dates don't get mixed together.
WITH CTE AS (
SELECT A.*
, lead(ID_int) over (/* add PARTITION BY <key> here if needed */ ORDER BY id_Int asc) - id_int as ID_INT_DIFF
FROM your_table AS A)
SELECT *
FROM CTE
WHERE ID_INT_DIFF < 5;

You can try this. It should work on SQL Server 2005 and above (it relies on a CTE and ROW_NUMBER, which SQL Server 2000 does not have). I don't have a more recent SQL Server to write it on today.
create table #t (id_int int, dt_int int)
INSERT #t SELECT 8211721 , 509794233
INSERT #t SELECT 8211722 , 509794233
INSERT #t SELECT 8211723 , 509794235
INSERT #t SELECT 8211724 , 509794236
INSERT #t SELECT 8211729 , 509794237
INSERT #t SELECT 8211731 , 509794238
;with cte_t as
(SELECT
ROW_NUMBER() OVER (ORDER BY id_int) id
,id_int
,dt_int
FROM #t),
cte_diff as
( SELECT
id_int
,dt_int
,(SELECT b.dt_int FROM cte_t b WHERE b.id = a.id + 1) dt_int1 -- dt_int of the next row by id
,dt_int - (SELECT b.dt_int FROM cte_t b WHERE b.id = a.id + 1) Difference
FROM cte_t a
)
SELECT DISTINCT id_int , dt_int FROM #t a
WHERE
EXISTS(SELECT 1 FROM cte_diff b where b.Difference =0 and a.dt_int = b.dt_int)

Conditional Selection of Rows Using TSQL SQL Server (2008 R2)

I've been staring at this for hours and hours and can't come up with an "elegant" set-based way of getting the result set I need...
Here's my sample data (my real data could be 1,000,000+ rows)...
CREATE TABLE #t (ID int,ID1 nvarchar(15),[DATE] date,PERIOD int,[TYPE] nchar(1));
INSERT INTO #t (ID,ID1,[DATE],PERIOD,[TYPE])
VALUES
(1,N'NUM1','2016-01-01',1,N'B'),
(2,N'NUM1','2016-01-01',2,N'A'),
(3,N'NUM1','2016-01-01',3,N'A'),
(4,N'NUM1','2016-01-01',4,N'B'),
(5,N'NUM1','2016-01-01',4,N'A'),
(6,N'NUM1','2016-01-01',5,N'A'),
(7,N'NUM1','2016-01-02',1,N'A'),
(8,N'NUM1','2016-01-02',2,N'A'),
(9,N'NUM1','2016-01-02',3,N'A'),
(10,N'NUM1','2016-01-02',4,N'A'),
(11,N'NUM1','2016-01-02',5,N'A'),
(12,N'NUM2','2016-01-01',1,N'A'),
(13,N'NUM2','2016-01-01',1,N'B'),
(14,N'NUM2','2016-01-01',2,N'A'),
(15,N'NUM2','2016-01-01',3,N'A'),
(16,N'NUM2','2016-01-01',4,N'B'),
(17,N'NUM2','2016-01-01',4,N'A'),
(18,N'NUM2','2016-01-01',5,N'A'),
(19,N'NUM2','2016-01-02',1,N'A'),
(20,N'NUM2','2016-01-02',2,N'B'),
(21,N'NUM2','2016-01-02',3,N'A'),
(22,N'NUM2','2016-01-02',4,N'A'),
(23,N'NUM2','2016-01-02',4,N'B'),
(24,N'NUM2','2016-01-02',5,N'A');
Here is the result set I'm trying to get...
1,'NUM1','2016-01-01',1,'B'
2,'NUM1','2016-01-01',2,'A'
3,'NUM1','2016-01-01',3,'A'
5,'NUM1','2016-01-01',4,'A'
6,'NUM1','2016-01-01',5,'A'
7,'NUM1','2016-01-02',1,'A'
8,'NUM1','2016-01-02',2,'A'
9,'NUM1','2016-01-02',3,'A'
10,'NUM1','2016-01-02',4,'A'
11,'NUM1','2016-01-02',5,'A'
12,'NUM2','2016-01-01',1,'A'
14,'NUM2','2016-01-01',2,'A'
15,'NUM2','2016-01-01',3,'A'
17,'NUM2','2016-01-01',4,'A'
18,'NUM2','2016-01-01',5,'A'
19,'NUM2','2016-01-02',1,'A'
20,'NUM2','2016-01-02',2,'B'
21,'NUM2','2016-01-02',3,'A'
22,'NUM2','2016-01-02',4,'A'
24,'NUM2','2016-01-02',5,'A'
Simply put, each day has 5 periods, and they can be of type A or B. I need to get the A types, but if there are no A types, I need to get the B types. (It sounds so simple when I write it out, but my brain will not come up with something suitable.)
Pleeeeeease put me out of my misery..

You can use ROW_NUMBER for this:
SELECT ID, ID1, [DATE], PERIOD, [TYPE]
FROM (
SELECT ID, ID1, [DATE], PERIOD, [TYPE],
ROW_NUMBER() OVER (PARTITION BY ID1, [DATE], PERIOD
ORDER BY [TYPE]) AS rn
FROM #t) AS t
WHERE t.rn = 1
Using ORDER BY [TYPE] in the OVER clause of ROW_NUMBER places 'A' records on top of 'B' records. If there is no 'A' record for a given ID1, [DATE], PERIOD combination, the 'B' record is assigned rn = 1 instead.
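ORDER BY [TYPE] works because 'A' sorts before 'B'. If other types could appear, or the preference is not alphabetical, a variant can spell the preference out with a CASE and add ID as a tie-breaker so the pick is deterministic. A sketch on a subset of the sample rows; periods is a hypothetical table name.

```sql
-- periods is a hypothetical table name; the rows are a subset of the question's sample
CREATE TABLE periods (ID INT, ID1 VARCHAR(15), [DATE] DATE, PERIOD INT, [TYPE] CHAR(1));
INSERT INTO periods (ID, ID1, [DATE], PERIOD, [TYPE]) VALUES
  (1,  'NUM1', '2016-01-01', 1, 'B'),
  (4,  'NUM1', '2016-01-01', 4, 'B'),
  (5,  'NUM1', '2016-01-01', 4, 'A'),
  (12, 'NUM2', '2016-01-01', 1, 'A'),
  (13, 'NUM2', '2016-01-01', 1, 'B'),
  (20, 'NUM2', '2016-01-02', 2, 'B');

SELECT ID, ID1, [DATE], PERIOD, [TYPE]
FROM (
  SELECT ID, ID1, [DATE], PERIOD, [TYPE],
         ROW_NUMBER() OVER (
           PARTITION BY ID1, [DATE], PERIOD
           ORDER BY CASE [TYPE] WHEN 'A' THEN 0 ELSE 1 END,  -- 'A' first, any other type after
                    ID                                        -- deterministic tie-break
         ) AS rn
  FROM periods
) AS t
WHERE rn = 1
ORDER BY ID;
```

On this subset it returns IDs 1, 5, 12 and 20: the B row for periods with no A row, and the A row otherwise.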

Your desired output contradicts the statement that "I need to get the A types. but if there are no A types, I need to get the B types... ". Every date in the data has one or more 'A' types, so by that statement the output should include only the 'A' types. But if the statement is correct, then this should work:
Select d.[DATE], t.Id, t.ID1, t.PERIOD, t.[TYPE]
from (select distinct [date] from #t) d
left join #t t
on t.[date] = d.[date]
and t.type = case when exists
(select * from #t
where [date] = d.[Date]
and type = 'A') then 'A'
else 'B' End

I've just come up with this:
SELECT * FROM #t WHERE [TYPE]='A'
UNION ALL
SELECT * FROM #t t1 WHERE [TYPE]='B' AND NOT EXISTS (SELECT ID FROM #t WHERE ID1=t1.ID1 AND [TYPE]='A' AND [DATE]=t1.[DATE] AND Period=t1.Period)
ORDER BY ID;
which gives me what I need.

SQL Cumulative Count

I have a table with departments, and I need to count how many people are in each department. This is easily done with:
SELECT DEPT,
COUNT(*) as 'Total'
FROM SR
GROUP BY DEPT;
Now I also need a cumulative count, as below:
I have found some SQL for calculating a running total, but not for a case like this one. Could you give me some advice, please?

Here's a way to do it with a CTE instead of a cursor:
WITH Base AS
(
SELECT ROW_NUMBER() OVER (ORDER BY [Count] DESC) RowNum,
[Dept],
[Count]
FROM SR
)
SELECT SR.Dept, SR.Count, SUM(SR2.[Count]) Total
FROM Base SR
INNER JOIN Base SR2
ON SR2.RowNum <= SR.RowNum
GROUP BY SR.Dept, SR.Count
ORDER BY SR.[Count] DESC
Note that this is ordering by descending Count like your sample result does. If there's some other column that's not shown that should be used for ordering just replace Count in each of the ORDER BY clauses.

I think you can use a temporary table or table variable for this, and use the solution from here:
create table #Temp (rn int identity(1, 1) primary key, dept varchar(128), Total int)
insert into #Temp (dept, Total)
select
dept, count(*) as Total
from SR
group by dept
order by count(*) desc -- identity values follow this order, so the running total does too
;with cte as (
select T.dept, T.Total, T.Total as Cumulative, T.rn
from #Temp as T
where T.rn = 1
union all
select T.dept, T.Total, T.Total + C.Cumulative as Cumulative, T.rn
from cte as C
inner join #Temp as T on T.rn = C.rn + 1
)
select C.dept, C.Total, C.Cumulative
from cte as C
order by C.rn
option (maxrecursion 0)
There are other solutions, but I think this one is the fastest on SQL Server 2008.

If you can add an identity column to the table, the solution is easier (the windowed SUM with ORDER BY and ROWS requires SQL Server 2012 or later):
create table #SQLCumulativeCount
(
id int identity(1,1),
Dept varchar(100),
Count int
)
insert into #SQLCumulativeCount (Dept,Count) values ('PMO',106)
insert into #SQLCumulativeCount (Dept,Count) values ('Finance',64)
insert into #SQLCumulativeCount (Dept,Count) values ('Operations',41)
insert into #SQLCumulativeCount (Dept,Count) values ('Infrastructure',22)
insert into #SQLCumulativeCount (Dept,Count) values ('HR',21)
select *,
sum(Count) over(order by id rows unbounded preceding) as Cumulative
from #SQLCumulativeCount
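If the counts have to be computed from the raw table (as in the question) and no identity column can be added, a sketch can aggregate first and then run the windowed SUM over the aggregated rows, ordered by the count with DEPT as a tie-breaker. This assumes SQL Server 2012+ (or another engine with ordered window aggregates); sr_sample and its rows are made up for illustration.

```sql
-- sr_sample is a hypothetical, made-up table: one row per person, DEPT is their department
CREATE TABLE sr_sample (person_id INT, DEPT VARCHAR(100));
INSERT INTO sr_sample (person_id, DEPT) VALUES
  (1, 'PMO'), (2, 'PMO'), (3, 'PMO'),
  (4, 'Finance'), (5, 'Finance'),
  (6, 'HR');

WITH dept_counts AS (
  SELECT DEPT, COUNT(*) AS Total      -- head count per department
  FROM sr_sample
  GROUP BY DEPT
)
SELECT DEPT,
       Total,
       -- running total in descending head-count order, DEPT breaks ties
       SUM(Total) OVER (ORDER BY Total DESC, DEPT
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Cumulative
FROM dept_counts
ORDER BY Total DESC, DEPT;
```

Here PMO (3 people) gets a cumulative 3, Finance (2) gets 5 and HR (1) gets 6.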
