Generate group of data based on sum of values - sql

How to calculate the sum and total files.
here is my table
**FileId** **FileSize(MB)**
1 5
2 4
3 1
4 6
5 8
6 1
7 7
8 2
Expected result
BatchNo StartId EndId BatchSize
1 1 3 10
2 4 4 6
3 5 6 9
4 7 8 9
If File Size >= 10 then start new batch
also file count per batch is >= 10 then start new batch
StartId and EndId based on FileId
and BatchNo Is AutoIncrement

You can use recursive query like this
with rdata as
(
select row_number() over (order by fileId) rn, * from data
), rcte as
(
select 1 no, 1 gr, fileSize fileSizeSum , *
from rdata where fileid = 1
union all
select case when fileSizeSum + d.fileSize > 10 or r.no = 10 then 1 else r.no + 1 end gr,
case when fileSizeSum + d.fileSize > 10 or r.no = 10 then r.gr + 1 else r.gr end gr,
case when fileSizeSum + d.fileSize > 10 or r.no = 10 then d.fileSize else d.fileSize + fileSizeSum end fileSizeSum,
d.*
from rcte r
join rdata d on r.rn + 1 = d.rn
)
select r.gr,
min(fileId),
max(fileId),
max(fileSizeSum)
from rcte r
group by r.gr
dbfiddle

Here is another solution, with different approach:
INSERT INTO #batchdetails (FileID, FileSizeTotal, GroupID)
SELECT FileID, (
SELECT SUM(filesize) FROM #filedetails f2
WHERE f1.fileid >= f2.fileid ) AS FileSizeTotal,
1+CONVERT(INT,(
SELECT SUM(filesize) FROM #filedetails f2
WHERE f1.fileid >= f2.fileid
)/(#filesizepergroup+0.1)) AS GroupID
FROM #filedetails f1
SELECT DISTINCT
BatchDetails.GroupID AS BatchNo,
MIN(BatchDetails.FileID) OVER (PARTITION BY BatchDetails.GroupID ORDER BY BatchDetails.GroupID) AS StartID,
MAX(BatchDetails.FileID) OVER (PARTITION BY BatchDetails.GroupID ORDER BY BatchDetails.GroupID) AS EndID,
BatchSizeGroup.BatchSize
FROM #batchdetails BatchDetails
INNER JOIN (
SELECT GroupID, (GroupFileSizeTotal - LAG(GroupFileSizeTotal,1,0) OVER (ORDER BY GroupID)) AS BatchSize
FROM (
SELECT DISTINCT
GroupID,
MAX(FileSizeTotal) OVER (PARTITION BY GroupID ORDER BY GroupID) AS GroupFileSizeTotal
FROM #batchdetails
GROUP BY GroupID, FileSizeTotal
)A
)BatchSizeGroup ON BatchDetails.GroupID = BatchSizeGroup.GroupID
GROUP BY BatchDetails.GroupID, BatchDetails.FileID, BatchSizeGroup.BatchSize
Demo is Here : dbfiddle

Related

Selecting top most row in Bigquery based on conditions

I have a huge table, where sometimes 1 product ID has multiple specifications. I want to select the newest but unfortunately, I don't have the date information. please consider this example dataset
Row ID Type Sn Sn_Ind
1 3 SLN SL20 20
2 1 SL SL 0
3 2 SL SL 0
4 1 M SL21 10
5 3 M SL21 10
6 1 SLN SL20 20
I used the below query to somehow group the products in give them row numbers like
with cleanedMasterData as(
SELECT *
FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Sn DESC, Sn_Ind DESC) AS rn
FROM `project.dataset.table`
)
-- where rn = 1
)
select * from cleanedMasterData
Please find below the example table after cleaning
Row ID Type Sn Sn_Ind rn
1 1 SL SL 0 1
2 1 M SL21 10 2
3 1 SLN SL20 20 3
4 2 SL SL 0 1
5 3 M SL21 10 1
6 3 SLN SL20 20 2
but if you see for ID 2 and 3, I can easily select the top row with where rn = 1
but for ID 1, my preferred row would be 2 because that is the newest.
My question here is how do I prioritise a value in column so that I can get the desired solution like :
Row ID Type Sn Sn_Ind rn
1 1 M SL21 10 1
2 2 SL SL 0 1
3 3 M SL21 10 1
As the values are fixed in Sn column - for ex SL, SL20, SL19, SL21 etc - If somehow I can give weightage to these values and create a new temp column with weightage and sort based on it, then?
Thank you for your support in advance!!
Consider below
SELECT *
FROM `project.dataset.table`
WHERE TRUE
QUALIFY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY IF(Sn = 'SL', 0, 1) DESC, Sn DESC) = 1
If applied to sample data in your question - output is
It wasn't difficult, I tried a few things and it worked out. If anyone can optimize the below solution even more that would be awesome.
first the dataset
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 ID, 'SLN' Type, 'SL20' Sn, 20 Sn_Ind UNION ALL
SELECT 1 , 'SL' , 'SL' , 0 UNION ALL
SELECT 2 , 'SL' , 'SL' , 0 UNION ALL
SELECT 1 , 'M' , 'SL21' , 10 UNION ALL
SELECT 3 , 'M' , 'SL21' , 10 UNION ALL
SELECT 1 , 'SLN' , 'SL20' , 20
)
with weightage as(
SELECT
*,
MAX(CASE Sn WHEN 'SL' THEN 0 ELSE 1 END) OVER (PARTITION BY Sn) AS weightt,
FROM
`project.dataset.table`
ORDER BY
weightt DESC, Sn DESC
), main as (
select * EXCEPT(rn, weightt)
from (
select * ,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY weightt DESC, Sn DESC) AS rn
from weightage )
where rn = 1
)
select * from main
after this, I can get the desired result
Row ID Type Sn Sn_Ind
1 1 M SL21 10
2 2 SL SL 0
3 3 M SL21 10

sql numbering the partition of Numbers

I have a set of numbers like this
ID
===
1
2
3
1
2
1
1
2
3
4
5
...
I want to select a new row that increase when fetch next 1 like this
ID number
=== ========
1 1
2 1
3 1
1 2
2 2
1 3
1 4
2 4
3 4
4 4
5 4
Any suggestion ?
Assuming that you have a column o which specify the ordering then you can use a self-join like this:
select d1.o, d1.id, count(*)
from data d1
join data d2 on d1.o >= d2.o and d2.id = 1
group by d1.o, d1.id
DBFiddle DEMO
You can solve this with use of cte and window functions, as follows:
DECLARE #t TABLE (ID INT);
INSERT INTO #t VALUES (1),(2),(3),(1),(2),(1),(1),(2),(3),(4),(5);
WITH cte AS(
SELECT ID, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) rn
FROM #t
),
cte1 AS(
SELECT ID, rn, ROW_NUMBER() OVER (ORDER BY rn) rn2
FROM cte
WHERE ID = 1
)
SELECT c.ID, MAX(rn2) OVER (ORDER BY c.rn) rn
FROM cte c
LEFT JOIN cte1 c1 ON c1.rn = c.rn
ORDER BY c.rn

Aggregate within a group of unchanged values

I have sample data:
RowId TypeId Value
1 1 34
2 1 53
3 1 34
4 2 43
5 2 65
6 16 54
7 16 34
8 1 45
9 6 43
10 6 34
11 16 64
12 16 63
I want to count row for each type (The Value does not matter to me), but only for... neighbor TypeId
TypeId Count
1 3
2 2
16 2
1 1
6 2
16 2
How to achieve this result?
This should give you COUNT of rows within a group of unchanged values:
SELECT TypeId, grp, COUNT(*) FROM (
SELECT RowId, TypeId , Value, gap, SUM(gap) over (ORDER BY RowId ) grp
FROM (SELECT RowId, TypeId , Value,
CASE WHEN TypeId = lag(TypeId) over (ORDER BY RowId )
THEN 0
ELSE 1
END gap
FROM dummy
) t
) tt
GROUP BY TypeId, grp;
If you prefer WITH over endless sub-query inclusions:
WITH dummy_with_groups AS (
SELECT RowId, TypeId , Value, SUM(gap) OVER (ORDER BY RowId) grp
FROM (SELECT RowId, TypeId , Value,
CASE WHEN TypeId = lag(TypeId) OVER (ORDER BY RowId)
THEN 0 ELSE 1 END gap
FROM dummy) t
)
SELECT TypeId, COUNT(*) as Result
FROM dummy_with_groups
GROUP BY TypeId, grp;
http://www.sqlfiddle.com/#!6/f16e9/34
Check this fiddle demo. I have renamed your columns a little.
WITH myCTE AS
(SELECT row_id,
type_id,
ROW_NUMBER () OVER (PARTITION BY type_id ORDER BY row_id)
AS cnt,
CASE LEAD (type_id) OVER (ORDER BY row_id)
WHEN type_id THEN 0
ELSE 1
END
AS show
FROM dummy),
innerQuery AS
(SELECT row_id, type_id, cnt
FROM myCTE
WHERE show = 1)
SELECT iq1.type_id, iq1.cnt - ISNULL (iq2.cnt, 0) CNT
FROM innerQuery iq1
LEFT OUTER JOIN innerQuery iq2
ON iq1.type_id = iq2.type_id
AND EXISTS
(SELECT 1
FROM innerQuery iq3
WHERE iq3.type_id = iq1.type_id
AND iq3.row_id < iq1.row_id
HAVING MAX (iq3.row_id) = iq2.row_id)
The output is exactly as expected.

How to get average of the 'middle' values in a group?

I have a table that has values and group ids (simplified example). I need to get the average for each group of the middle 3 values. So, if there are 1, 2, or 3 values it's just the average. But if there are 4 values, it would exclude the highest, 5 values the highest and lowest, etc. I was thinking some sort of window function, but I'm not sure if it's possible.
http://www.sqlfiddle.com/#!11/af5e0/1
For this data:
TEST_ID TEST_VALUE GROUP_ID
1 5 1
2 10 1
3 15 1
4 25 2
5 35 2
6 5 2
7 15 2
8 25 3
9 45 3
10 55 3
11 15 3
12 5 3
13 25 3
14 45 4
I'd like
GROUP_ID AVG
1 10
2 15
3 21.6
4 45
Another option using analytic functions;
SELECT group_id,
avg( test_value )
FROM (
select t.*,
row_number() over (partition by group_id order by test_value ) rn,
count(*) over (partition by group_id ) cnt
from test t
) alias
where
cnt <= 3
or
rn between floor( cnt / 2 )-1 and ceil( cnt/ 2 ) +1
group by group_id
;
Demo --> http://www.sqlfiddle.com/#!11/af5e0/59
I'm not familiar with the Postgres syntax on windowed functions, but I was able to solve your problem in SQL Server with this SQL Fiddle. Maybe you'll be able to easily migrate this into Postgres-compatible code. Hope it helps!
A quick primer on how I worked it.
Order the test scores for each group
Get a count of items in each group
Use that as a subquery and select only the middle 3 items (that's the where clause in the outer query)
Get the average for each group
--
select
group_id,
avg(test_value)
from (
select
t.group_id,
convert(decimal,t.test_value) as test_value,
row_number() over (
partition by t.group_id
order by t.test_value
) as ord,
g.gc
from
test t
inner join (
select group_id, count(*) as gc
from test
group by group_id
) g
on t.group_id = g.group_id
) a
where
ord >= case when gc <= 3 then 1 when gc % 2 = 1 then gc / 2 else (gc - 1) / 2 end
and ord <= case when gc <= 3 then 3 when gc % 2 = 1 then (gc / 2) + 2 else ((gc - 1) / 2) + 2 end
group by
group_id
with cte as (
select
*,
row_number() over(partition by group_id order by test_value) as rn,
count(*) over(partition by group_id) as cnt
from test
)
select
group_id, avg(test_value)
from cte
where
cnt <= 3 or
(rn >= cnt / 2 - 1 and rn <= cnt / 2 + 1)
group by group_id
order by group_id
sql fiddle demo
in the cte, we need to get count of elements over each group_id by window function + calculate row_number inside each group_id. Then, if this count > 3 then we need to get middle of the group by dividing count by 2 and then get +1 and -1 element. If count <= 3, then we should just take all elements.
This works:
SELECT A.group_id, avg(A.test_value) AS avg_mid3 FROM
(SELECT group_id,
test_value,
row_number() OVER (PARTITION BY group_id ORDER BY test_value) AS position
FROM test) A
JOIN
(SELECT group_id,
CASE
WHEN count(*) < 4 THEN 1
WHEN count(*) % 2 = 0 THEN (count(*)/2 - 1)
ELSE (count(*) / 2)
END AS position_start,
CASE
WHEN count(*) < 4 THEN count(*)
WHEN count(*) % 2 = 0 THEN (count(*)/2 + 1)
ELSE (count(*) / 2 + 2)
END AS position_end
FROM test GROUP BY group_id) B
ON A.group_id=B.group_id
AND A.position >= B.position_start
AND A.position <= B.position_end
GROUP BY A.group_id
Fiddle link: http://www.sqlfiddle.com/#!11/af5e0/56
If you need to calculate the average values ​​for groups then you can do this:
SELECT CASE WHEN NUMBER_FIRST_GROUP <> 0
THEN SUM_FIRST_GROUP / NUMBER_FIRST_GROUP
ELSE NULL
END AS AVG_FIRST_GROUP,
CASE WHEN NUMBER_SECOND_GROUP <> 0
THEN SUM_SECOND_GROUP / NUMBER_SECOND_GROUP
ELSE NULL
END AS AVG_SECOND_GROUP,
CASE WHEN NUMBER_THIRD_GROUP <> 0
THEN SUM_THIRD_GROUP / NUMBER_THIRD_GROUP
ELSE NULL
END AS AVG_THIRD_GROUP,
CASE WHEN NUMBER_FOURTH_GROUP <> 0
THEN SUM_FOURTH_GROUP / NUMBER_FOURTH_GROUP
ELSE NULL
END AS AVG_FOURTH_GROUP
FROM (
SELECT
SUM(CASE WHEN GROUP_ID = 1 THEN 1 ELSE 0 END) AS NUMBER_FIRST_GROUP,
SUM(CASE WHEN GROUP_ID = 1 THEN TEST_VALUE ELSE 0 END) AS SUM_FIRST_GROUP,
SUM(CASE WHEN GROUP_ID = 2 THEN 1 ELSE 0 END) AS NUMBER_SECOND_GROUP,
SUM(CASE WHEN GROUP_ID = 2 THEN TEST_VALUE ELSE 0 END) AS SUM_SECOND_GROUP,
SUM(CASE WHEN GROUP_ID = 3 THEN 1 ELSE 0 END) AS NUMBER_THIRD_GROUP,
SUM(CASE WHEN GROUP_ID = 3 THEN TEST_VALUE ELSE 0 END) AS SUM_THIRD_GROUP,
SUM(CASE WHEN GROUP_ID = 4 THEN 1 ELSE 0 END) AS NUMBER_FOURTH_GROUP,
SUM(CASE WHEN GROUP_ID = 4 THEN TEST_VALUE ELSE 0 END) AS SUM_FOURTH_GROUP
FROM TEST
) AS FOO

How to write a cross tab query with median

Here's my table
ST NUM
1 1
1 2
1 2
2 1
2 2
2 2
3 2
3 8
I want to return a query where it returns the median of NUM for each ST
ST NUM
1 2
2 2
3 5
I already have a median function
SELECT
CONVERT(DECIMAL(10,2), (
(CONVERT (DECIMAL(10,2),
(SELECT MAX(num) FROM
(SELECT TOP 50 PERCENT num FROM dbo.t ORDER BY num ASC) AS H1)
+
(SELECT MIN(sortTime) FROM
(SELECT TOP 50 PERCENT num FROM dbo.t ORDER BY num DESC) AS H2)
))) / 2) AS Median
Any tips for how to do this?
try this
With
MedianResult
as
(
Select
ST,NUM ,
Row_Number() OVER(Partition by ST Order by NUM) as A,
Row_Number() OVER(Partition by ST Order by NUM desc) as B
from **YourTableName**
)
Select ST, Avg(NUM) as Median
From MedianResult
Where Abs(A-B)<=1
Group by ST