Distribute sequential SQL results evenly based on count - sql

I have SQL results that I need to break into item ranges so that the count is distributed evenly across a number of tasks. What is a good way to do this?
My data looks like this.
+------+-------+----------+
| Item | Count | ItmGroup |
+------+-------+----------+
| 1A | 100 | 1 |
| 1B | 25 | 1 |
| 1C | 2 | 1 |
| 1D | 6 | 1 |
| 2A | 88 | 2 |
| 2B | 10 | 2 |
| 2C | 122 | 2 |
| 2D | 12 | 2 |
| 3A | 4 | 3 |
| 3B | 103 | 3 |
| 3C | 1 | 3 |
| 3D | 22 | 3 |
| 4A | 55 | 4 |
| 4B | 42 | 4 |
| 4C | 100 | 4 |
| 4D | 1 | 4 |
+------+-------+----------+
Item = the item code.
Count = in this context, it measures the popularity of the item. This can be used to RANK items if need be.
ItmGroup = a parent value for the Item column. Each Item is contained in a group.
What differentiates this from other similar questions I've viewed is that the ranges I need to determine cannot be taken out of the order they show in this table. An item range can cross over ItmGroups (for example, from 1A to 3B), but the items must remain in alphanumeric order by Item.
The expected result would be item ranges that evenly distribute the total count.
+--------+--------+----------+
| FrItem | ToItem | TotCount |
+--------+--------+----------+
| 1A | 2D | 134 |
| 3A | 3D | 130 |
(etc)

Provided you're happy with a rough estimate, this will split the data into two groups.
The first group will always have as many records as possible, but no more than half of the total count (and group 2 will have the rest).
WITH
cumulative AS
(
SELECT
*,
SUM([Count]) OVER (ORDER BY Item) AS cumulativeCount,
SUM([Count]) OVER () AS totalCount
FROM
yourData
)
SELECT
MIN(item) AS frItem,
MAX(item) AS toItem,
SUM([Count]) AS TotCount
FROM
cumulative
GROUP BY
CASE WHEN cumulativeCount <= totalCount / 2 THEN 0 ELSE 1 END
ORDER BY
CASE WHEN cumulativeCount <= totalCount / 2 THEN 0 ELSE 1 END
To split the data into 5 portions, it's similar...
GROUP BY
CASE WHEN cumulativeCount <= totalCount * 1/5 THEN 0
WHEN cumulativeCount <= totalCount * 2/5 THEN 1
WHEN cumulativeCount <= totalCount * 3/5 THEN 2
WHEN cumulativeCount <= totalCount * 4/5 THEN 3
ELSE 4 END
Depending on your data, this isn't necessarily ideal:
Item | Count | GroupAsDefinedAbove | IdealGroup
-----+-------+---------------------+-----------
1A | 4 | 1 | 1
2A | 5 | 2 | 1
3A | 8 | 2 | 2
If you want something that can get the two groups as close in size as possible, that's a lot more complex.
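To make the difference concrete, here is a minimal Python sketch (not SQL, and not part of the original answer) that replays the three-row example above. `threshold_split` mirrors the CASE expression for two groups; `closest_split` is a hypothetical brute-force helper that tries every cut point, which is only one possible reading of "as close in size as possible". Groups are numbered from 0 here, as in the query.

```python
from itertools import accumulate


def threshold_split(rows):
    """Mirror the CASE expression: a row goes to group 0 while the running
    total is at most half of the grand total, otherwise to group 1."""
    total = sum(count for _, count in rows)
    running = accumulate(count for _, count in rows)
    # 2 * cum <= total is the integer-safe form of cum <= total / 2.
    return [0 if 2 * cum <= total else 1 for cum in running]


def closest_split(rows):
    """Brute force: try every cut point and keep the one whose two halves
    differ least in total count (assumes at least two rows)."""
    total = sum(count for _, count in rows)
    running = list(accumulate(count for _, count in rows))
    # cut = number of rows placed in the first group.
    best = min(range(1, len(rows)),
               key=lambda cut: abs(total - 2 * running[cut - 1]))
    return [0 if i < best else 1 for i in range(len(rows))]


rows = [("1A", 4), ("2A", 5), ("3A", 8)]
print(threshold_split(rows))  # [0, 1, 1]
print(closest_split(rows))    # [0, 0, 1]
```

The brute-force version only handles two groups; extending it to N groups is where the extra complexity comes in.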

Same as the accepted answer, except that it declares a batch count and adds a BatchNo column to the select statement in the cumulativeCte, which prevents a remainder group.
DECLARE @BatchCount NUMERIC(4,2) = 5.00;
WITH
cumulativeCte AS
(
SELECT
*,
SUM(r.[Count]) OVER (ORDER BY r.Item) AS cumulativeCount,
SUM(r.[Count]) OVER () AS totalCount
,CEILING(SUM(r.[Count]) OVER (ORDER BY r.Item ASC) / (SUM(r.[Count]) OVER () / @BatchCount)) AS BatchNo
FROM
records r
)
SELECT
MIN(c.Item) AS frItem,
MAX(c.Item) AS toItem,
SUM(c.[Count]) AS TotCount,
c.BatchNo
FROM
cumulativeCte c
GROUP BY
c.BatchNo
ORDER BY
c.BatchNo
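As a rough cross-check of the BatchNo arithmetic, the following Python sketch applies the same ceiling formula to the sample data from the question. The function name `batch_ranges` is made up for illustration, and the ceiling is computed with integers to sidestep floating-point rounding, so treat it as an approximation of the query rather than a replacement for it.

```python
from itertools import accumulate, groupby


def batch_ranges(rows, batch_count):
    """Return (frItem, toItem, TotCount, BatchNo) per batch, where
    BatchNo = ceil(cumulativeCount * batch_count / totalCount)."""
    total = sum(count for _, count in rows)
    running = accumulate(count for _, count in rows)
    numbered = [
        # Integer ceiling division.
        ((cum * batch_count + total - 1) // total, item, count)
        for cum, (item, count) in zip(running, rows)
    ]
    result = []
    # Rows are already in Item order, so each batch is a consecutive run.
    for batch_no, members in groupby(numbered, key=lambda r: r[0]):
        members = list(members)
        result.append((members[0][1], members[-1][1],
                       sum(m[2] for m in members), batch_no))
    return result


rows = [("1A", 100), ("1B", 25), ("1C", 2), ("1D", 6),
        ("2A", 88), ("2B", 10), ("2C", 122), ("2D", 12),
        ("3A", 4), ("3B", 103), ("3C", 1), ("3D", 22),
        ("4A", 55), ("4B", 42), ("4C", 100), ("4D", 1)]

for line in batch_ranges(rows, 5):
    print(line)
# ('1A', '1D', 133, 1)
# ('2A', '2B', 98, 2)
# ('2C', '3A', 138, 3)
# ('3B', '4A', 181, 4)
# ('4B', '4D', 143, 5)
```

The batches are not equal in size because a row is never split: each batch simply ends at the last item whose running total still rounds up to that batch number.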

Related

How to order id's using subtotal from another column in PostgreSQL

I have a table returned by a select query. Example :
id | day | count |
-- | ------ | ----- |
1 | 71 | 3 |
1 | 70 | 2 |
1 |Subtotal| 5 |
2 | 70 | 5 |
2 | 71 | 2 |
2 | 69 | 2 |
2 |Subtotal| 9 |
3 | 69 | 1 |
3 | 70 | 1 |
3 |Subtotal| 2 |
The day column contains text values (so varchar).
Subtotal is the sum of the counts for an id (e.g. id 2 has a subtotal of 5 + 2 + 2 = 9).
I now want to order this table so that the ids with the lowest subtotal come first, and within each id the rows are ordered by day with the subtotal at the end (like before).
Expected output:
id | day | count |
-- | ------ | ----- |
3 | 69 | 1 |
3 | 70 | 1 |
3 |Subtotal| 2 |
1 | 70 | 2 |
1 | 71 | 3 |
1 |Subtotal| 5 |
2 | 69 | 2 |
2 | 70 | 5 |
2 | 71 | 2 |
2 |Subtotal| 9 |
I can't figure out how to order based on the subtotal only.
I've tried multiple ORDER BY expressions (e.g. ORDER BY day = 'Subtotal' and a mix of others) and window functions, but none are helping. Cheers!
Not sure if it's directly applicable to your source query (since you haven't included it) however the ordering you require on the sample data can be done with:
order by Max(count) over(partition by id), day
Note: ordering by day works with your sample data, but as it's a string it will not honour numeric ordering. This should really be ordered by the source of the numerical value. Again, since we don't have your actual query I can't suggest anything more applicable, but I'm sure you can substitute the correct column/expression.
I just created a table with 3 columns and tried to reproduce your expected result. I expected ordering by day to be a problem, with the subtotal row always ending up on top, but this seems to be a working solution.
create table test
(
id int,
day varchar(15),
count int
);
insert into test
values
(1,'71',3),
(1,'70',2),
(2,'70',5),
(2,'71',2),
(2,'69',2),
(3,'69',1),
(3,'70',1);
select id, day, count
from
(
select id, day, sum(count) as count
from test
group by id, rollup(day)
) as t
order by Max(count) over(partition by id), day
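The ordering idea can be checked outside the database with a small Python sketch. It assumes the rollup row carries a NULL day (None here) and that NULLs sort last, which is PostgreSQL's default for ascending order; the helper name `with_subtotals_sorted` is invented for this example.

```python
def with_subtotals_sorted(rows):
    """rows: (id, day, count). Adds one subtotal row per id, then sorts by
    the largest count within each id and by day, with the subtotal last."""
    subtotal = {}
    for id_, _, count in rows:
        subtotal[id_] = subtotal.get(id_, 0) + count
    # group by id, rollup(day) adds a row per id whose day is NULL.
    all_rows = rows + [(id_, None, total) for id_, total in subtotal.items()]
    # MAX(count) OVER (PARTITION BY id) is the subtotal, since counts are positive.
    biggest = {}
    for id_, _, count in all_rows:
        biggest[id_] = max(biggest.get(id_, 0), count)
    return sorted(all_rows,
                  key=lambda r: (biggest[r[0]], r[1] is None, r[1] or ""))


rows = [(1, "71", 3), (1, "70", 2), (2, "70", 5), (2, "71", 2),
        (2, "69", 2), (3, "69", 1), (3, "70", 1)]

for line in with_subtotals_sorted(rows):
    print(line)
# (3, '69', 1), (3, '70', 1), (3, None, 2),
# (1, '70', 2), (1, '71', 3), (1, None, 5),
# (2, '69', 2), (2, '70', 5), (2, '71', 2), (2, None, 9)
```

If two ids could share the same subtotal, adding id as a second sort key (in the sketch and in the ORDER BY) would keep their rows from interleaving.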

Partition By - Sum all values Excluding Maximum Value

I have data as follows
+----+------+--------+
| ID | Code | Weight |
+----+------+--------+
| 1 | M | 200 |
| 1 | 2A | 50 |
| 1 | 2B | 50 |
| 2 | | 350 |
| 2 | M | 350 |
| 2 | 3A | 120 |
| 2 | 3B | 120 |
| 3 | 5A | 100 |
| 4 | | 200 |
| 4 | | 100 |
+----+------+--------+
For ID 1 the max weight is 200. From that max value I want to subtract the sum of all the other weights for ID 1.
There might be a case where 2 rows contain the max value for the same ID. For example, ID 2 has 2 rows containing the max value, 350. In that scenario I still want to sum all values except the max value, but I would set the weight to 0 for 1 of the 2 rows containing the max value. That row would be the one where Code is NULL/blank.
Where there is only 1 row for an ID, the row would be kept as is.
Another scenario is one where only one row contains the max weight but its Code is NULL/blank. In that case we would simply do what we did for ID 1: sum all values except the max value and subtract that from the row containing the max value.
Desired Output
+----+------+--------+---------------+
| ID | Code | Weight | Actual Weight |
+----+------+--------+---------------+
| 1 | M | 200 | 100 |
| 1 | 2A | 50 | 50 |
| 1 | 2B | 50 | 50 |
| 2 | | 350 | 0 |
| 2 | M | 350 | 110 |
| 2 | 3A | 120 | 120 |
| 2 | 3B | 120 | 120 |
| 3 | 5A | 100 | 100 |
| 4 | | 200 | 100 |
| 4 | | 100 | 100 |
+----+------+--------+---------------+
I want to create column Actual Weight as shown above. I can't find a way to apply partition by excluding max value and create column Actual Weight.
Use dense_rank() to identify the rows with the max weight (dr = 1 means a row has the max weight).
Use row_number() to tell the max-weight row with a blank Code apart from the others.
with cte as
(
select *,
dr = dense_rank() over (partition by ID order by [Weight] desc),
rn = row_number() over (partition by ID order by [Weight] desc, Code desc)
from tbl
)
select *,
ActWeight = case when dr = 1 and rn <> 1
then 0
when dr = 1 and rn = 1
then [Weight]
- sum(case when dr <> 1 then [Weight] else 0 end) over (partition by ID)
else [Weight]
end
from cte
Hmmm . . . I think you just want window functions and conditional logic:
select t.*,
(case when 1 = row_number() over (partition by id order by weight desc, (case when code <> '' then 1 else 2 end))
then weight - sum(case when weight <> max_weight then weight else 0 end) over (partition by id)
when weight = max_weight
then 0
else weight
end) as actual_weight
from (select t.*,
max(weight) over (partition by id) as max_weight
from t
) t
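For anyone who wants to verify the rules against the desired output, here is a plain Python sketch of the same logic. It is an illustration under the assumptions stated in the question (a blank Code is represented as an empty string, and a non-blank Code wins when two rows tie on the max weight); `actual_weights` is a made-up name.

```python
def actual_weights(rows):
    """rows: (id, code, weight). Returns the actual weight of every row,
    in input order."""
    by_id = {}
    for idx, (id_, _code, _weight) in enumerate(rows):
        by_id.setdefault(id_, []).append(idx)

    result = [0] * len(rows)
    for indexes in by_id.values():
        top = max(rows[i][2] for i in indexes)
        others = sum(rows[i][2] for i in indexes if rows[i][2] != top)
        top_rows = [i for i in indexes if rows[i][2] == top]
        # The max-weight row that keeps a value: prefer a non-blank code.
        keeper = max(top_rows, key=lambda i: rows[i][1] or "")
        for i in indexes:
            if i == keeper:
                result[i] = top - others   # max minus all non-max weights
            elif rows[i][2] == top:
                result[i] = 0              # duplicate max row is zeroed
            else:
                result[i] = rows[i][2]     # other rows stay as they are
    return result


rows = [(1, "M", 200), (1, "2A", 50), (1, "2B", 50),
        (2, "", 350), (2, "M", 350), (2, "3A", 120), (2, "3B", 120),
        (3, "5A", 100),
        (4, "", 200), (4, "", 100)]

print(actual_weights(rows))
# [100, 50, 50, 0, 110, 120, 120, 100, 100, 100]
```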

How to count how many times a specific value appeared on each columns and group by range

I'm new to Postgres and I have a question:
I have a table with 100 columns. I need to look at the values in every column and count how many times they appear, so I can group them by the range they fall into.
I have a table like this (100 columns):
+------+------+------+------+------+---------+--------+
| Name | PRB0 | PRB1 | PRB2 | PRB3 | ....... | PRB100 |
+------+------+------+------+------+---------+--------+
| A | 15 | 6 | 47 | 54 | ..... | 8 |
| B | 25 | 22 | 84 | 86 | ..... | 76 |
| C | 57 | 57 | 96 | 38 | ..... | 28 |
+------+------+------+------+------+---------+--------+
And need the output to be something like this
+------+---------------+----------------+----------------+----------------+-----+-----------------+
| Name | Count 0 to 20 | Count 21 to 40 | Count 41 to 60 | Count 61 to 70 | ... | Count 81 to 100 |
+------+---------------+----------------+----------------+----------------+-----+-----------------+
| A | 5 | 46 | 87 | 34 | ... | 98 |
| B | 5 | 2 | 34 | 56 | ... | 36 |
| C | 7 | 17 | 56 | 78 | ... | 88 |
+------+---------------+----------------+----------------+----------------+-----+-----------------+
For Name A we have:
a number between 0 and 20 appeared 5 times
a number between 21 and 40 appeared 46 times
a number between 41 and 60 appeared 87 times
Basically I need something like the COUNTIFS function that we have in Excel. In Excel we just need to specify the range of columns and the condition.
You could unpivot with a lateral join, then aggregate:
select
name,
count(*) filter(where prb between 0 and 20) cnt_00_20,
count(*) filter(where prb between 21 and 40) cnt_21_40,
...,
count(*) filter(where prb between 81 and 100) cnt_81_100
from mytable t
cross join lateral (values(t.prb0), (t.prb1), ..., (t.prb100)) p(prb)
group by name
Note, however, that this still requires you to enumerate all the columns in the values() table constructor. If you want something fully dynamic, you can use json instead. The idea is to turn each record to a json object using to_jsonb(), then to rows with jsonb_each(); you can then do conditional aggregation.
select
name,
count(*) filter(where prb::int between 0 and 20) cnt_00_20,
count(*) filter(where prb::int between 21 and 40) cnt_21_40,
...,
count(*) filter(where prb::int between 81 and 100) cnt_81_100
from mytable t
cross join lateral to_jsonb(t) j(js)
cross join lateral jsonb_each( j.js - 'name') r(col, prb)
group by name
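The unpivot-then-count idea can be sketched in Python as follows. Only the five values visible per row in the question are used, and the bucket boundaries (in particular 61 to 80) are an assumption, since the question elides some of them; `bucket_counts` and `BUCKETS` are names invented for this example.

```python
# Assumed, evenly spaced buckets; the question does not list all of them.
BUCKETS = [(0, 20), (21, 40), (41, 60), (61, 80), (81, 100)]


def bucket_counts(table):
    """For each row, spread every column except 'name' into a list of values
    (the unpivot step), then count how many fall into each bucket."""
    result = {}
    for row in table:
        values = [v for col, v in row.items() if col != "name"]
        result[row["name"]] = [
            sum(1 for v in values if low <= v <= high)
            for low, high in BUCKETS
        ]
    return result


table = [
    {"name": "A", "prb0": 15, "prb1": 6, "prb2": 47, "prb3": 54, "prb100": 8},
    {"name": "B", "prb0": 25, "prb1": 22, "prb2": 84, "prb3": 86, "prb100": 76},
    {"name": "C", "prb0": 57, "prb1": 57, "prb2": 96, "prb3": 38, "prb100": 28},
]

print(bucket_counts(table))
# {'A': [3, 0, 2, 0, 0], 'B': [0, 2, 0, 1, 2], 'C': [0, 2, 2, 0, 1]}
```

Like the jsonb variant, this does not need the column names spelled out: every key other than name is treated as a value to be counted.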

Pair entry of every nth row with entry of every (n+1)th row

I have a result table
id | name | wins
----+-------------------
57 | Paul | 10
64 | Sven | 9
62 | Peter | 9
59 | Marina | 8
58 | Carlos | 4
60 | Pamela | 3
61 | Marcus | 2
63 | Hank | 1
Where I want to pair every nth entry with every (n+1)th entry, such that the resulting table looks like that:
id | name | id | name
----+-------------------
57 | Paul | 64 | Sven
62 | Peter | 59 | Marina
58 | Carlos | 60 | Pamela
61 | Marcus | 63 | Hank
Which SQL statement would achieve that?
;WITH cte AS (
SELECT *,ROW_NUMBER() OVER (ORDER BY Wins DESC) as RowNum
FROM
#Table
)
SELECT *
FROM
cte c1
LEFT JOIN cte c2
ON c1.RowNum + 1 = c2.RowNum
WHERE
c1.RowNum % 2 <> 0
Generate a ROW_NUMBER to pair on. Since you have a third column (wins), order by that column rather than by a placeholder such as (SELECT NULL).
Then select all rows with odd row numbers (remainder of RowNum divided by 2 <> 0) and self join back to the same CTE on RowNum + 1. The LEFT JOIN matters when you have an odd number of rows: it keeps the one row that has no match instead of dropping it.
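The pairing logic is easy to check with a short Python sketch. It uses the result table from the question; `pair_rows` is a hypothetical helper, and ties on wins are broken by input order here, whereas the SQL ROW_NUMBER gives no such guarantee unless another column is added to the ORDER BY.

```python
def pair_rows(rows):
    """rows: (id, name, wins). Rank by wins descending, then pair each
    odd-numbered row with the row that follows it."""
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)  # stable sort
    pairs = []
    for i in range(0, len(ranked), 2):
        left = ranked[i]
        # Like the LEFT JOIN: a last unmatched row is kept with empty partner.
        right = ranked[i + 1] if i + 1 < len(ranked) else (None, None, None)
        pairs.append((left[0], left[1], right[0], right[1]))
    return pairs


rows = [(57, "Paul", 10), (64, "Sven", 9), (62, "Peter", 9), (59, "Marina", 8),
        (58, "Carlos", 4), (60, "Pamela", 3), (61, "Marcus", 2), (63, "Hank", 1)]

for pair in pair_rows(rows):
    print(pair)
# (57, 'Paul', 64, 'Sven')
# (62, 'Peter', 59, 'Marina')
# (58, 'Carlos', 60, 'Pamela')
# (61, 'Marcus', 63, 'Hank')
```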

How to use previous row's column's value for calculating the next row's column's value

I have a table
Id | Aisle | OddEven | Bay | Size | Y-Axis
3 | A1 | Even | 14 | 10 | 100
1 | A1 | Even | 16 | 10 |
6 | A1 | Even | 20 | 10 |
12 | A1 | Even | 26 | 5 | 150
10 | A1 | Even | 28 | 5 |
11 | A1 | Even | 32 | 5 |
2 | A1 | Odd | 13 | 10 | 100
5 | A1 | Odd | 17 | 10 |
4 | A1 | Odd | 19 | 10 |
9 | A1 | Odd | 23 | 5 | 150
7 | A1 | Odd | 25 | 5 |
8 | A1 | Odd | 29 | 5 |
and I want it to look like this
Id | Aisle | OddEven | Bay | Size | Y-Axis
1 | A1 | Even | 14 | 10 | 100
2 | A1 | Even | 16 | 10 | 110
3 | A1 | Even | 20 | 10 | 120
4 | A1 | Even | 26 | 5 | 150
5 | A1 | Even | 28 | 5 | 155
6 | A1 | Even | 32 | 5 | 160
7 | A1 | Odd | 13 | 10 | 100
8 | A1 | Odd | 17 | 10 | 110
9 | A1 | Odd | 19 | 10 | 120
10 | A1 | Odd | 23 | 5 | 150
11 | A1 | Odd | 25 | 5 | 155
12 | A1 | Odd | 29 | 5 | 160
I need a select query and an update query. Some Y-Axis values are already filled in (at the start of each Odd/Even block). For every other row, I need to take the previous row's Y-Axis value and add the current row's Size to get the current Y-Axis. This continues until a row that already has a Y-Axis value is found; that row skips the calculation, and the next row builds on its value.
My thinking process is this:
Id will definitely be used; however, the Ids are not sequential, as shown in my example,
so I need to have
ROW_Number OVER (PARTITION BY Aisle,OddEven,Bay Order BY Aisle,OddEven,Bay)
Then some kind of JOIN of the table to itself, with the ON condition T1.RN = T2.RN - 1.
Where I am stuck is that the first row has no previous value, yet the query will still try to update it.
If anyone has an idea for a SQL Server 2008 select and update query, it will be greatly appreciated! Thanks.
You seem to want a cumulative sum. This would be easier in SQL Server 2012+. You can do this in SQL Server 2008 using outer apply:
select t.*, cume_value
from t outer apply
(select sum(size) + sum(yaxis) as cume_value
from t t2
where t2.aisle = t.aisle and t2.oddeven = t.oddeven and
t2.bay < t.bay
) t2;
A little more difficult on 2008, but I think this is what you are looking for
Declare @Table table (Id int,Aisle varchar(25),OddEven varchar(25),Bay int,Size int,[Y-Axis] int)
Insert Into @Table values
(3,'A1','Even',14,10 ,100),
(1,'A1','Even',16,10 ,0),
(6,'A1','Even',20,10 ,0),
(12,'A1','Even',26,5,150),
(10,'A1','Even',28,5,0),
(11,'A1','Even',32,5,0),
(2,'A1','Odd',13,10 ,100),
(5,'A1','Odd',17,10 ,0),
(4,'A1','Odd',19,10 ,0),
(9,'A1','Odd',23,5,150),
(7,'A1','Odd',25,5,0),
(8,'A1','Odd',29,5,0)
;with cteBase as (
Select *
,IDNew=Row_Number() over (Order By Aisle,Bay)
,RowNr=Row_Number() over (Order By Aisle,OddEven,Bay)
From @Table
)
, cteGroup as (Select TmpRowNr=RowNr,GrpNr=Row_Number() over (Order By RowNr) from cteBase where [Y-Axis]>0)
, cteFinal as (
Select A.*
,GrpNr = (Select max(GrpNr) from cteGroup Where TmpRowNr<=RowNr)
From cteBase A
)
Select ID=Row_Number() over (Order By A.OddEven,A.Bay)
,A.Aisle
,A.OddEven
,A.Bay
,A.Size
,[Y-Axis] = Sum(case when B.[Y-Axis]>0 then B.[Y-Axis] else B.Size end)
From cteFinal A
Join cteFinal B on (B.RowNr<=A.RowNr and A.GrpNr=B.GrpNr)
Group By
A.IDNew
,A.Aisle
,A.OddEven
,A.Bay
,A.Size
Order By A.OddEven,A.Bay
Returns
ID Aisle OddEven Bay Size Y-Axis
1 A1 Even 14 10 100
2 A1 Even 16 10 110
3 A1 Even 20 10 120
4 A1 Even 26 5 150
5 A1 Even 28 5 155
6 A1 Even 32 5 160
7 A1 Odd 13 10 100
8 A1 Odd 17 10 110
9 A1 Odd 19 10 120
10 A1 Odd 23 5 150
11 A1 Odd 25 5 155
12 A1 Odd 29 5 160
I gotta leave my computer so update query should be easy to move on from here.
Below is the select query. Note that an ORDER BY inside an aggregate's OVER clause requires SQL Server 2012 or later:
select row_number() over (order by oddeven,bay) id,
Aisle,
OddEven,
Bay,
Size,
max(ISNULL([Y-Axis],0)) over (partition by Aisle, OddEven,Size order by bay)
+ sum(CASE WHEN [Y-Axis] is null THEN Size ELSE 0 END) over (partition by Aisle,OddEven,size order by Bay) as [Y-Axis]
from oddseven
order by id
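To double-check the expected numbers independently of any SQL Server version, the carry-forward rule from the question can be written as a short Python loop. This is only a sketch: empty Y-Axis cells are represented as None, the function name `fill_y_axis` is invented, and it assumes the first row of every Aisle/OddEven block already has a value.

```python
def fill_y_axis(rows):
    """rows: (aisle, odd_even, bay, size, y_axis or None).
    Returns (new_id, aisle, odd_even, bay, size, y_axis) with gaps filled."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    result = []
    previous_key, previous_y = None, None
    for new_id, (aisle, odd_even, bay, size, y) in enumerate(ordered, start=1):
        key = (aisle, odd_even)
        if y is None:
            if key != previous_key:
                raise ValueError("first row of a block needs a Y-Axis value")
            # Previous row's Y-Axis plus the current row's Size.
            y = previous_y + size
        result.append((new_id, aisle, odd_even, bay, size, y))
        previous_key, previous_y = key, y
    return result


rows = [("A1", "Even", 14, 10, 100), ("A1", "Even", 16, 10, None),
        ("A1", "Even", 20, 10, None), ("A1", "Even", 26, 5, 150),
        ("A1", "Even", 28, 5, None), ("A1", "Even", 32, 5, None),
        ("A1", "Odd", 13, 10, 100), ("A1", "Odd", 17, 10, None),
        ("A1", "Odd", 19, 10, None), ("A1", "Odd", 23, 5, 150),
        ("A1", "Odd", 25, 5, None), ("A1", "Odd", 29, 5, None)]

print([row[5] for row in fill_y_axis(rows)])
# [100, 110, 120, 150, 155, 160, 100, 110, 120, 150, 155, 160]
```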