Select to build groups by (analytic) sum - sql

Please help me build a SQL select that assigns (software development) tasks to software releases. This is actually a fictitious example standing in for my real, business-specific problem.
I have a relation Tasks:
ID Effort_In_Days
3 3
1 2
6 2
2 1
4 1
5 1
I want to distribute the tasks to releases that are at most 2 days long (a task longer than 2 days shall still be put into a single release). In my real problem I have many more "days" available to distribute "tasks" to. Expected output:
Release Task_ID
1 3
2 1
3 6
4 2
4 4
5 5
I think I need to use analytic functions, something with sum(effort_in_days) over and so on, to get the result. But I haven't used analytic functions much and didn't find an example that's close enough to my specific problem. I need to start a new group (release) whenever a sum (>= 2) is reached.

I would do something like:
with data as (
select 3 ID, 3 Effort_In_Days from dual union all
select 1 ID, 2 Effort_In_Days from dual union all
select 6 ID, 2 Effort_In_Days from dual union all
select 2 ID, 1 Effort_In_Days from dual union all
select 4 ID, 1 Effort_In_Days from dual union all
select 5 ID, 1 Effort_In_Days from dual
)
select id, effort_in_days, tmp, ceil(tmp/2) release
from (
select id, effort_in_days, sum(least(effort_in_days, 2)) over (order by effort_in_days desc rows unbounded preceding) tmp
from data
);
Which results in:
ID EFFORT_IN_DAYS TMP RELEASE
---------- -------------- ---------- ----------
3 3 2 1
1 2 4 2
6 2 6 3
2 1 7 4
4 1 8 4
5 1 9 5
Basically, I am using least() to convert everything over 2 down to 2. Then I am putting all rows in descending order by that value and starting to assign releases. Since they are in descending order with a max value of 2, I know I need to assign a new release every time when I get to a multiple of 2.
Note that if you had fractional values, you could end up with releases that do not have a full 2 days assigned (as opposed to having over 2 days assigned), which may or may not meet your needs.
Also note that I am only showing all columns in my output to make it easier to see what the code is actually doing.

This is an example of a bin-packing problem. As far as I am aware, there is no optimal solution in SQL except in some boundary cases. For instance, if all the tasks have the same length or if all the tasks are >= 2, then there is an easy-to-find optimal solution.
A greedy algorithm works pretty well: put each record in the first bin where it fits, probably going through the list in descending size order.
If your problem is really as you state it, that is, if the maximum value is 2 and the efforts are integers, then the greedy algorithm will produce an optimal solution. There might even be a way to calculate the solution in SQL in this case.
Otherwise, you will need PL/SQL code to achieve an approximate solution.
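As a rough illustration of the greedy idea described above, here is a minimal Python sketch of first-fit in descending size order. The function name `first_fit_decreasing`, the `capacity` parameter, and the rule that an oversized task fills a release by itself are assumptions made for this example, not part of the original answer.

```python
# Sample data from the question: (task_id, effort_in_days)
tasks = [(3, 3), (1, 2), (6, 2), (2, 1), (4, 1), (5, 1)]


def first_fit_decreasing(tasks, capacity):
    """Return a sorted list of (release, task_id) pairs.

    Assumption: a task larger than `capacity` gets a release of its own.
    """
    remaining = []    # free days left in each release opened so far
    assignment = []
    # sorted() is stable, so tasks with equal effort keep their input order
    for task_id, effort in sorted(tasks, key=lambda t: -t[1]):
        for i, free in enumerate(remaining):
            if effort <= free:
                # First release with enough room takes the task
                remaining[i] -= effort
                assignment.append((i + 1, task_id))
                break
        else:
            # No open release fits, so open a new one
            remaining.append(max(capacity - effort, 0))
            assignment.append((len(remaining), task_id))
    return sorted(assignment)


for release, task_id in first_fit_decreasing(tasks, capacity=2):
    print(release, task_id)
```

With the sample data and a capacity of 2 this reproduces the release numbers from the expected output in the question. For arbitrary efforts, first-fit is a heuristic and is not guaranteed to be optimal.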

SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE data AS
select 3 ID, 3 Effort_In_Days from dual union all
select 1 ID, 2 Effort_In_Days from dual union all
select 6 ID, 2 Effort_In_Days from dual union all
select 2 ID, 1 Effort_In_Days from dual union all
select 4 ID, 1 Effort_In_Days from dual union all
select 5 ID, 1 Effort_In_Days from dual union all
select 9 ID, 2 Effort_In_Days from dual union all
select 7 ID, 1 Effort_In_Days from dual union all
select 8 ID, 1 Effort_In_Days from dual;
Query 1:
Give the rows an index so that they can be kept in order easily;
Assign groups to the rows where the Effort_In_Days is 1 so that all adjacent rows with Effort_In_Days of 1 are in the same group and rows separated by higher values for Effort_In_Days are in different groups;
Assign a cost of 1 to each row where the Effort_In_Days is higher than 1 or where Effort_In_Days is 1 and the row has an odd row number within the group; then
Finally, the release is the sum of all the costs for the row and all preceding rows.
Like this:
WITH indexes AS (
SELECT ID,
Effort_In_Days,
ROWNUM AS idx
FROM Data
),
groups AS (
SELECT ID,
Effort_In_Days,
idx,
CASE Effort_In_Days
WHEN 1
THEN idx - ROW_NUMBER() OVER ( PARTITION BY Effort_In_Days ORDER BY idx )
END AS grp
FROM indexes
ORDER BY idx
),
costs AS (
SELECT ID,
Effort_In_Days,
idx,
CASE Effort_In_Days
WHEN 1
THEN MOD( ROW_NUMBER() OVER ( PARTITION BY grp ORDER BY idx ), 2 )
ELSE 1
END AS cost
FROM groups
ORDER BY idx
)
SELECT ID,
Effort_In_Days,
SUM( cost ) OVER ( ORDER BY idx ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Release
FROM costs
ORDER BY idx
Results:
| ID | EFFORT_IN_DAYS | RELEASE |
|----|----------------|---------|
| 3 | 3 | 1 |
| 1 | 2 | 2 |
| 6 | 2 | 3 |
| 2 | 1 | 4 |
| 4 | 1 | 4 |
| 5 | 1 | 5 |
| 9 | 2 | 6 |
| 7 | 1 | 7 |
| 8 | 1 | 7 |
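The same steps can be traced outside the database. The following Python sketch is only an illustration (the function name `releases_in_order` is made up here): it walks the rows in their stored order, charges a cost of 1 for every task longer than one day and for every odd-positioned task within a run of consecutive 1-day tasks, and keeps a running sum of those costs.

```python
# Effort_In_Days in the order the rows appear in the sample table
efforts = [3, 2, 2, 1, 1, 1, 2, 1, 1]


def releases_in_order(efforts):
    """Return the release number for each row, keeping the row order."""
    release = 0
    run = 0          # position inside the current run of 1-day tasks
    result = []
    for effort in efforts:
        if effort == 1:
            run += 1
            cost = run % 2   # odd position in the run starts a new release
        else:
            run = 0          # a longer task ends the run of 1-day tasks
            cost = 1
        release += cost      # running sum of the costs
        result.append(release)
    return result


print(releases_in_order(efforts))
```

Like the query, this only handles integer efforts where two 1-day tasks share a release; it does not reorder tasks to pack them more tightly.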

Related

SUM a column in SQL, based on DISTINCT values in another column, GROUP BY a third column

I'd appreciate some help on the following SQL problem:
I have a table of 3 columns:
ID Group Value
1 1 5
1 1 5
1 2 10
1 2 10
1 3 20
2 1 5
2 1 5
2 1 5
2 2 10
2 2 10
3 1 5
3 2 10
3 2 10
3 2 10
3 4 50
I need to group by ID, and I would like to SUM the values based on DISTINCT values in Group. So the value for a group is only counted once even though it may appear multiple times for a particular ID.
So for IDs 1, 2 and 3, it should return 35, 15 and 65, respectively.
ID SUM
1 35
2 15
3 65
Note that each Group doesn't necessarily have a unique value
Thanks
The CTE removes all duplicate rows, so each distinct combination of ID, Group and Value is counted only once.
The next SELECT then groups by ID.
For Postgres you would get:
WITH CTE as
(SELECT DISTINCT "ID", "Group", "Value" FROM tablA
)
SELECT "ID", SUM("Value") FROM CTE GROUP BY "ID"
ORDER BY "ID"
ID | sum
-: | --:
1 | 35
2 | 15
3 | 65
Given what we know at the moment this is what I'm thinking...
The CTE/inline view eliminates duplicates before the sum occurs.
-- Group is a reserved word, so it has to be quoted; the quoting style depends on your database
WITH CTE AS (SELECT DISTINCT ID, "Group", Value FROM TableName)
SELECT ID, Sum(Value)
FROM CTE
GROUP BY ID
or
SELECT ID, Sum(Value)
FROM (SELECT DISTINCT * FROM TableName) CTE
GROUP BY ID
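To make the "deduplicate first, then sum" idea concrete, here is a small Python sketch over the sample rows from the question. The names `rows` and `sum_distinct` are invented for this illustration.

```python
from collections import defaultdict

# (ID, Group, Value) rows from the question
rows = [
    (1, 1, 5), (1, 1, 5), (1, 2, 10), (1, 2, 10), (1, 3, 20),
    (2, 1, 5), (2, 1, 5), (2, 1, 5), (2, 2, 10), (2, 2, 10),
    (3, 1, 5), (3, 2, 10), (3, 2, 10), (3, 2, 10), (3, 4, 50),
]


def sum_distinct(rows):
    """Sum Value per ID after removing duplicate (ID, Group, Value) rows."""
    totals = defaultdict(int)
    # set() plays the role of SELECT DISTINCT
    for id_, _group, value in set(rows):
        totals[id_] += value      # the GROUP BY ID / SUM(Value) step
    return dict(totals)


print(sorted(sum_distinct(rows).items()))
```

As in the SQL versions, if the same ID and Group ever appeared with two different values, both values would be added.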

BigQuery: Flattening all repeated fields in nested schema

I am having so much trouble with querying from Big Query's nested schema.
I have the following fields.
I want to flatten the table and get something like this.
user | question_id | user_choices
123 | 1 | 1
123 | 1 | 2
123 | 1 | 3
123 | 1 | 4
From other resources, I got to a point where I can query from one of the records in the repeated columns. Such as the following:
SELECT user, dat.question_id FROM tablename, UNNEST(data) dat
It gives me this result.
But when I do this, I get another repeated column again.
SELECT user, dat.question_id, dat.user_choices FROM tablename, UNNEST(data) dat
Can anyone help me how to UNNEST this table properly so I can have flattened schema for all data items?
Thanks!
Below is for BigQuery Standard SQL
#standardSQL
SELECT user, question_id, choice
FROM `project.dataset.table`,
UNNEST(data) question,
UNNEST(user_choices) choice
You can test, play with above using dummy data from your question like below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 user,
[STRUCT<question_id INT64, user_choices ARRAY<INT64>>
(1,[1,2,3]),
(2,[2,5]),
(3,[1,3])
] data UNION ALL
SELECT 2 user,
[STRUCT<question_id INT64, user_choices ARRAY<INT64>>
(1,[2,3]),
(2,[4,5]),
(3,[2,6])
] data
)
SELECT user, question_id, choice
FROM `project.dataset.table`,
UNNEST(data) question,
UNNEST(user_choices) choice
ORDER BY user, question_id, choice
with result
Row user question_id choice
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 2
5 1 2 5
6 1 3 1
7 1 3 3
8 2 1 2
9 2 1 3
10 2 2 4
11 2 2 5
12 2 3 2
13 2 3 6
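Conceptually, the two chained UNNEST calls behave like two nested loops. The Python sketch below mirrors that on the dummy data from the query above; the list-of-dicts layout and the name `flatten` are assumptions chosen for this illustration, not how BigQuery stores the data.

```python
# Dummy data shaped like the nested table: one record per user,
# each with a repeated `data` field holding a repeated `user_choices`
table = [
    {"user": 1, "data": [
        {"question_id": 1, "user_choices": [1, 2, 3]},
        {"question_id": 2, "user_choices": [2, 5]},
        {"question_id": 3, "user_choices": [1, 3]},
    ]},
    {"user": 2, "data": [
        {"question_id": 1, "user_choices": [2, 3]},
        {"question_id": 2, "user_choices": [4, 5]},
        {"question_id": 3, "user_choices": [2, 6]},
    ]},
]


def flatten(table):
    """Return one (user, question_id, choice) tuple per choice."""
    return [
        (row["user"], question["question_id"], choice)
        for row in table                        # FROM table
        for question in row["data"]             # UNNEST(data) question
        for choice in question["user_choices"]  # UNNEST(user_choices) choice
    ]


for flat_row in flatten(table):
    print(flat_row)
```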

SQL: How to add values according to index columns

I have an sql table which looks like the following:
|value| |position| |relates_to_position| |type|
100 | 2 | NULL | 1
50 | 6 | NULL | 2
20 | 7 | 6 | 3
From this I need to create the resulting table, which adds all the lines with a |relates_to_position| field to the line which has |position| = |relates_to_position|.
For the above table, this would be
|value| |position| |relates_to_position| |type|
100 2 NULL 1
70 6 NULL 2
I am quite a newbie in SQL, so I would be glad for help. The database I use is Oracle XE 11. There will only be a single level of relates_to_position, meaning, that if relates_to_position is set, no other line will reference to this line.
This assumes there is only one level of hierarchy. With multiple levels of hierarchy it gets more interesting.
-- your_table is a placeholder for the real table name
SELECT A.Value + COALESCE(B.Value, 0) AS Value
, A.Position
, A.Relates_to_Position
, A.Type
FROM your_table A
LEFT JOIN your_table B
ON B.Relates_to_Position = A.Position
WHERE A.Relates_to_Position IS NULL
This is a self join, so it puts related records on the same row. It then eliminates all records that have a value in relates_to_position, as they will be added to a parent row.
We use a LEFT JOIN because not all records will have a related value, and we use COALESCE so that no attempt is made to add NULLs (COALESCE returns the first non-null value).
Not sure why you need relates_to_position returned, as it will ALWAYS be null.
If you can have more than one level of hierarchy and they all need to sum up to the root position, then the following ought to do the trick:
WITH sample_data AS (SELECT 100 VALUE, 2 position, NULL relates_to_position, 1 TYPE FROM dual UNION ALL
SELECT 50 VALUE, 6 position, NULL relates_to_position, 2 TYPE FROM dual UNION ALL
SELECT 20 VALUE, 7 position, 6 relates_to_position, 3 TYPE FROM dual UNION ALL
SELECT 10 VALUE, 8 position, 7 relates_to_position, 3 TYPE FROM dual)
SELECT SUM(VALUE) VALUE,
root_position position,
root_type TYPE
FROM (SELECT value,
position,
TYPE,
connect_by_root(position) root_position,
connect_by_root(TYPE) root_type
FROM sample_data
CONNECT BY PRIOR position = relates_to_position
START WITH relates_to_position IS NULL)
GROUP BY root_position,
root_type;
VALUE POSITION TYPE
---------- ---------- ----------
100 2 1
80 6 2
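For readers less familiar with CONNECT BY, the following Python sketch shows the same roll-up by following each row's relates_to_position until a root is reached. The function name `sum_to_root` is hypothetical, and the sketch assumes there are no cycles and that every relates_to_position refers to an existing position.

```python
# (value, position, relates_to_position, type) rows from sample_data
rows = [
    (100, 2, None, 1),
    (50, 6, None, 2),
    (20, 7, 6, 3),
    (10, 8, 7, 3),
]


def sum_to_root(rows):
    """Return (summed value, root position, root type) per root row."""
    parent = {pos: rel for _value, pos, rel, _type in rows}
    type_of = {pos: type_ for _value, pos, _rel, type_ in rows}
    totals = {}
    for value, pos, _rel, _type in rows:
        root = pos
        # Walk up the hierarchy until a row without a parent is found
        while parent[root] is not None:
            root = parent[root]
        key = (root, type_of[root])
        totals[key] = totals.get(key, 0) + value
    return [(total, root, type_) for (root, type_), total in sorted(totals.items())]


for line in sum_to_root(rows):
    print(line)
```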

Hive: window function - how to exclude the CURRENT ROW

I wish to calculate the minimum of a value over a partition, but the current row should not be taken into account.
SELECT *,
MIN(val) OVER(PARTITION BY col1)
FROM table
outputs the minimum over all rows in the partition.
The documentation shows ways to use CURRENT ROW, but not how to exclude it while performing the windowing operation.
I am looking for something like this:
SELECT *,
MIN(val) OVER(PARTITION BY col1 ROWS NOT CURRENT ROW)
FROM table
but this does not work.
I can think of a way to do this. The min over a window excluding the current row will always be the min over the window, except when the row you are at is the min; then the min will be the 2nd min over the window. Example:
Data:
-----------
key | val
-----------
1 8
1 2
1 4
1 6
1 11
2 3
2 5
2 7
2 9
Query:
select key, val, act_min, val_arr
, case when act_min=val then val_arr[1] else act_min
end as min_except_for_c_row
from (
select key, val, act_min, sort_array(val_arr) val_arr
from (
select key, val
, min(val) over (partition by key) act_min
, collect_set(val) over (partition by key) val_arr
from db.table ) A
) B
I left all the columns in for illustration. You can modify the query as needed.
Output:
key val act_min val_arr min_except_for_c_row
1 8 2 [2,4,6,8,11] 2
1 2 2 [2,4,6,8,11] 4
1 4 2 [2,4,6,8,11] 2
1 6 2 [2,4,6,8,11] 2
1 11 2 [2,4,6,8,11] 2
2 3 3 [3,5,7,9] 5
2 5 3 [3,5,7,9] 3
2 7 3 [3,5,7,9] 3
2 9 3 [3,5,7,9] 3
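The logic of the query can be checked with a short Python sketch on the same data. The name `min_excluding_current` is made up for this example. Like `collect_set`, the sketch works on distinct values, so if the minimum occurs more than once in a partition it still returns the second distinct value; a partition with a single distinct value yields `None` here.

```python
from collections import defaultdict

# (key, val) rows from the example data
rows = [(1, 8), (1, 2), (1, 4), (1, 6), (1, 11),
        (2, 3), (2, 5), (2, 7), (2, 9)]


def min_excluding_current(rows):
    """Return (key, val, min of the partition ignoring the current row)."""
    distinct = defaultdict(set)
    for key, val in rows:
        distinct[key].add(val)                 # collect_set per partition
    ordered = {key: sorted(vals) for key, vals in distinct.items()}  # sort_array
    result = []
    for key, val in rows:
        vals = ordered[key]
        if val != vals[0]:
            other_min = vals[0]                # current row is not the minimum
        else:
            # current row is the minimum, so fall back to the 2nd smallest
            other_min = vals[1] if len(vals) > 1 else None
        result.append((key, val, other_min))
    return result


for line in min_excluding_current(rows):
    print(line)
```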

Updating column based on another column's value

How do I update a table structured like this:
id[pkey] | parent_id | position
1 1
2 1
3 1
4 1
5 1
6 2
7 2
8 2
9 2
10 3
11 3
12 3
...and so on
to achieve this result:
id[pkey] | parent_id | position
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 2 1
7 2 2
8 2 3
9 2 4
10 3 1
11 3 2
12 3 3
...and so on
I was thinking about somehow mixing
SELECT DISTINCT parent_id FROM cats AS t;
with
CREATE SEQUENCE dpos;
UPDATE cats t1 SET position = nextval('dpos') WHERE t.parent_id = t1.parent_id;
DROP SEQUENCE dpos;
although I'm not really experienced with Postgres and am not sure how to use some kind of FOREACH. I appreciate any help.
You can get the incremental number using row_number(). The question is how to assign it to a particular row. Here is one method using a join:
-- seqnum numbers the rows within each parent_id; ctid identifies the physical row
update cats
set position = c2.seqnum
from (select c2.*, c2.ctid as c_ctid,
row_number() over (partition by c2.parent_id order by NULL) as seqnum
from cats c2
) c2
where cats.parent_id = c2.parent_id and cats.ctid = c2.c_ctid;
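What row_number() with a partition does can be sketched in a few lines of Python: keep one counter per parent_id and hand out the next number to each row. The name `number_within_parent` and the choice to number rows in id order are assumptions for this illustration (the query above uses `order by NULL`, which leaves the order within a parent unspecified).

```python
from collections import defaultdict

# (id, parent_id) pairs from the question
cats = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1),
        (6, 2), (7, 2), (8, 2), (9, 2),
        (10, 3), (11, 3), (12, 3)]


def number_within_parent(cats):
    """Return (id, parent_id, position), numbering rows per parent_id."""
    counters = defaultdict(int)       # one running counter per parent_id
    result = []
    for id_, parent_id in sorted(cats):   # assumption: number in id order
        counters[parent_id] += 1
        result.append((id_, parent_id, counters[parent_id]))
    return result


for line in number_within_parent(cats):
    print(line)
```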
Use row_number function
select parent_id,
row_number() over (partition by parent_id order by parent_id) as position_id from table
Try this (partition by the grouping column, here parent_id, and join the numbered rows back on the primary key):
UPDATE table_name
SET dataID = v_table_name.rn
FROM
(
SELECT row_number() over (partition by parent_id order by your_primaryKey) AS rn, your_primaryKey
FROM table_name
) AS v_table_name
WHERE table_name.your_primaryKey = v_table_name.your_primaryKey;