Grouping the data by aggregational condition - sql

I need to write a query (in one transaction) that assigns elements to groups so that the sum of the values in each group reaches a minimum, 10 for example. If an element's value is greater than or equal to 10, it should form a group of its own.
ElementId | GroupId | Value
1 | NULL | 12
2 | NULL | 10
3 | NULL | 9
4 | NULL | 13
5 | NULL | 25
GroupId | Sum
empty
The element with id 3 can be grouped with any other element, but ideally with another element whose value is less than 10. I also need to update the relationship between elements and groups.
After the query execution:
ElementId | GroupId | Value
1 | 1 | 12
2 | 2 | 10
3 | 3 | 9
4 | 3 | 13
5 | 4 | 25
GroupId | Sum
1 | 12
2 | 10
3 | 22 (13 + 9)
4 | 25
Any ideas?

You can use an analytic function (for the given sample data) as follows:
Select groupid_new as groupid,
Sum(value) as value
From
(Select t.*,
Sum(case when value >= 10 then value end) over (order by elementid) as groupid_new
From your_table t) t
Group by groupid_new
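To see what this query produces on the sample data, here is a minimal sketch that runs it through Python's built-in sqlite3 module. Using SQLite is an assumption made purely for illustration (version 3.25 or later is needed for window functions); the table name your_table comes from the answer. The group keys come out as running totals (12, 22, 35, 60) rather than 1, 2, 3, 4, and element 3 ends up with element 2 instead of element 4, which the question allows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (elementid INTEGER, groupid INTEGER, value INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table (elementid, value) VALUES (?, ?)",
    [(1, 12), (2, 10), (3, 9), (4, 13), (5, 25)],
)

# The running sum only grows on rows with value >= 10, so each such row
# starts a new group key and smaller values inherit the key before them.
query = """
SELECT groupid_new AS groupid, SUM(value) AS value
FROM (SELECT t.*,
             SUM(CASE WHEN value >= 10 THEN value END)
                 OVER (ORDER BY elementid) AS groupid_new
      FROM your_table t) t
GROUP BY groupid_new
ORDER BY groupid_new
"""
groups = conn.execute(query).fetchall()
print(groups)
```

One thing to watch: if the first element had a value below 10, its running sum would be NULL, so such rows would need separate handling.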

How to get columns when using buckets (width_bucket)

I would like to know which rows were placed in each bucket.
SELECT
width_bucket(s.score, sl.mins, sl.maxs, 9) as buckets,
COUNT(*)
FROM scores s
CROSS JOIN scores_limits sl
GROUP BY 1
ORDER BY 1;
My actual return:
buckets | count
---------+-------
1 | 182
2 | 37
3 | 46
4 | 15
5 | 29
7 | 18
8 | 22
10 | 11
| 20
What I expect to return:
SELECT buckets FROM buckets_table [...] WHERE scores.id = 1;
How can I get, for example, the column 'id' of table scores?
I believe you can include the id in an array with array_agg. If I recreate your case with
create table test (id serial, score int);
insert into test(score) values (10),(9),(5),(4),(10),(2),(5),(7),(8),(10);
The data is
id | score
----+-------
1 | 10
2 | 9
3 | 5
4 | 4
5 | 10
6 | 2
7 | 5
8 | 7
9 | 8
10 | 10
(10 rows)
Using the following and aggregating the id with array_agg
SELECT
width_bucket(score, 0, 10, 11) as buckets,
COUNT(*) nr_ids,
array_agg(id) agg_ids
FROM test s
GROUP BY 1
ORDER BY 1;
You get
buckets | nr_ids | agg_ids
---------+--------+----------
3 | 1 | {6}
5 | 1 | {4}
6 | 2 | {3,7}
8 | 1 | {8}
9 | 1 | {9}
10 | 1 | {2}
12 | 3 | {1,5,10}
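If the goal is to look up the bucket of a single row, as in the WHERE scores.id = 1 example from the question, it may be enough to select id next to width_bucket(...) without grouping. The Python sketch below only emulates the equal-width rule for integer scores so the id-to-bucket mapping is visible. The helper name width_bucket_int is hypothetical, and this is not PostgreSQL's implementation.

```python
def width_bucket_int(operand, low, high, count):
    """Emulate equal-width bucketing for integers with low < high.

    Values below `low` fall in bucket 0, values at or above `high` fall in
    bucket count + 1, everything else lands in 1..count.
    """
    if operand < low:
        return 0
    if operand >= high:
        return count + 1
    return (operand - low) * count // (high - low) + 1


# Same data as the test table above: ids 1..10 with their scores.
scores = dict(enumerate([10, 9, 5, 4, 10, 2, 5, 7, 8, 10], start=1))

# One bucket per id, i.e. the un-aggregated view of the query.
bucket_of = {id_: width_bucket_int(s, 0, 10, 11) for id_, s in scores.items()}

# Collect ids per bucket, mirroring array_agg(id) ... GROUP BY 1.
ids_per_bucket = {}
for id_, bucket in sorted(bucket_of.items()):
    ids_per_bucket.setdefault(bucket, []).append(id_)

print(bucket_of[1])
print(sorted(ids_per_bucket.items()))
```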

Select max value from column for every value in other two columns

I'm working on a web app that tracks TV shows, and I need to get the ids of all episodes that are season finales, that is, the episode with the highest episode number in each season of each TV show.
This is a simplified version of my "episodes" table.
id tvshow_id season epnum
---|-----------|--------|-------
1 | 1 | 1 | 1
2 | 1 | 1 | 2
3 | 1 | 1 | 3
4 | 1 | 2 | 1
5 | 1 | 2 | 2
6 | 2 | 1 | 1
7 | 2 | 1 | 2
8 | 2 | 1 | 3
9 | 2 | 1 | 4
10 | 2 | 2 | 1
11 | 2 | 2 | 2
The expected output:
id
---|
3 |
5 |
9 |
11 |
I've managed to get this working for the latest season but I can't make it work for all seasons.
I've also tried to take some ideas from this but I can't seem to find a way to add the tvshow_id in there.
I'm using Postgres v10
SELECT Id from
(Select *, Row_number() over (partition by tvshow_id,season order by epnum desc) as ranking from tbl)c
Where ranking=1
You can use the SQL below to get your result, using GROUP BY in a subquery:
select id from tab_x
where (tvshow_id,season,epnum) in (
select tvshow_id,season,max(epnum)
from tab_x
group by tvshow_id,season)
Here is a simple query that gets the desired result. It should also perform well, thanks to the DISTINCT ON clause:
select
distinct on (tvshow_id,season)
id
from your_table
order by tvshow_id,season ,epnum desc
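As a quick check that the ROW_NUMBER and GROUP BY approaches agree on the sample data, here is a small sketch that uses Python's sqlite3 module as a stand-in for Postgres. That substitution is an assumption for illustration only: DISTINCT ON is Postgres-specific and is not exercised here, and the GROUP BY variant is written as a join rather than a row-value IN. The table name episodes follows the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE episodes (id INTEGER, tvshow_id INTEGER, season INTEGER, epnum INTEGER)"
)
rows = [
    (1, 1, 1, 1), (2, 1, 1, 2), (3, 1, 1, 3), (4, 1, 2, 1), (5, 1, 2, 2),
    (6, 2, 1, 1), (7, 2, 1, 2), (8, 2, 1, 3), (9, 2, 1, 4),
    (10, 2, 2, 1), (11, 2, 2, 2),
]
conn.executemany("INSERT INTO episodes VALUES (?, ?, ?, ?)", rows)

# Rank episodes inside each (show, season) and keep the top-ranked one.
ranked = """
SELECT id FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY tvshow_id, season
                              ORDER BY epnum DESC) AS ranking
    FROM episodes) c
WHERE ranking = 1
ORDER BY id
"""

# Compute MAX(epnum) per (show, season) and join back to find the ids.
grouped = """
SELECT e.id
FROM episodes e
JOIN (SELECT tvshow_id, season, MAX(epnum) AS epnum
      FROM episodes
      GROUP BY tvshow_id, season) m
  ON m.tvshow_id = e.tvshow_id AND m.season = e.season AND m.epnum = e.epnum
ORDER BY e.id
"""
finales_ranked = [r[0] for r in conn.execute(ranked)]
finales_grouped = [r[0] for r in conn.execute(grouped)]
print(finales_ranked, finales_grouped)
```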

Window running function except current row

I have a theoretical question, so I'm not interested in alternative solutions. Sorry.
Q: Is it possible to compute a running window function over all previous rows, excluding the current one?
For example:
with
t(i,x,y) as (
values
(1,1,1),(2,1,3),(3,1,2),
(4,2,4),(5,2,2),(6,2,8)
)
select
t.*,
sum(y) over (partition by x order by i) - y as sum,
max(y) over (partition by x order by i) as max,
count(*) filter (where y > 2) over (partition by x order by i) as cnt
from
t;
Actual result is
i | x | y | sum | max | cnt
---+---+---+-----+-----+-----
1 | 1 | 1 | 0 | 1 | 0
2 | 1 | 3 | 1 | 3 | 1
3 | 1 | 2 | 4 | 3 | 1
4 | 2 | 4 | 0 | 4 | 1
5 | 2 | 2 | 4 | 4 | 1
6 | 2 | 8 | 6 | 8 | 2
(6 rows)
I want the max and cnt columns to behave like the sum column, so the result should be:
i | x | y | sum | max | cnt
---+---+---+-----+-----+-----
1 | 1 | 1 | 0 | | 0
2 | 1 | 3 | 1 | 1 | 0
3 | 1 | 2 | 4 | 3 | 1
4 | 2 | 4 | 0 | | 0
5 | 2 | 2 | 4 | 4 | 1
6 | 2 | 8 | 6 | 4 | 1
(6 rows)
It can be achieved using a simple subquery like
select t.*, lag(y,1) over (partition by x order by i) as yy from t
but is it possible using only window function syntax, without subqueries?
Yes, you can. This does the trick:
with
t(i,x,y) as (
values
(1,1,1),(2,1,3),(3,1,2),
(4,2,4),(5,2,2),(6,2,8)
)
select
t.*,
sum(y) over w as sum,
max(y) over w as max,
count(*) filter (where y > 2) over w as cnt
from t
window w as (partition by x order by i
rows between unbounded preceding and 1 preceding);
The frame_clause selects just those rows from the partition that you are interested in.
Note that in the sum column you'll get null rather than 0 because of the frame clause: the first row in each partition has no row before it, so its frame is empty. You can coalesce() this away if needed.
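A minimal sketch to check the frame behaviour, run here through Python's sqlite3 module rather than Postgres. That is an assumption for illustration; SQLite 3.25 or later accepts the same WINDOW and frame syntax. The aliases prev_sum, prev_max and prev_cnt are made up for the example, and coalesce() turns the empty-frame NULL of the sum back into 0.

```python
import sqlite3

query = """
WITH t(i, x, y) AS (
    VALUES (1,1,1),(2,1,3),(3,1,2),
           (4,2,4),(5,2,2),(6,2,8)
)
SELECT i,
       COALESCE(SUM(y) OVER w, 0)           AS prev_sum,
       MAX(y) OVER w                        AS prev_max,
       COUNT(*) FILTER (WHERE y > 2) OVER w AS prev_cnt
FROM t
WINDOW w AS (PARTITION BY x ORDER BY i
             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
ORDER BY i
"""
# Each aggregate sees only the rows strictly before the current one
# within the same partition; the first row of a partition sees none.
result = sqlite3.connect(":memory:").execute(query).fetchall()
for row in result:
    print(row)
```

The max stays NULL (None in Python) on the first row of each partition, while count already returns 0 for an empty frame.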

Select dynamic couples of lines in SQL (PostgreSQL)

My goal is to build dynamic groups of rows (in fact, groups of products by TYPE and COLOR).
I don't know if this is possible with a single SELECT query.
A product is defined by a TYPE and a COLOR. I want to group the rows of each product into groups whose size is given by the NB_PER_GROUP column, taking the rows in date order (ORDER BY DATE).
A product left on its own when NB_PER_GROUP is 2 (that is, an incomplete group) is excluded from the final result.
Table :
-----------------------------------------------
NUM | TYPE | COLOR | NB_PER_GROUP | DATE
-----------------------------------------------
0 | 1 | 1 | 2 | ...
1 | 1 | 1 | 2 |
2 | 1 | 2 | 2 |
3 | 1 | 2 | 2 |
4 | 1 | 1 | 2 |
5 | 1 | 1 | 2 |
6 | 4 | 1 | 3 |
7 | 1 | 1 | 2 |
8 | 4 | 1 | 3 |
9 | 4 | 1 | 3 |
10 | 5 | 1 | 2 |
Results :
------------------------
GROUP_NUMBER | NUM |
------------------------
0 | 0 |
0 | 1 |
~~~~~~~~~~~~~~~~~~~~~~~~
1 | 2 |
1 | 3 |
~~~~~~~~~~~~~~~~~~~~~~~~
2 | 4 |
2 | 5 |
~~~~~~~~~~~~~~~~~~~~~~~~
3 | 6 |
3 | 8 |
3 | 9 |
If you have another way to solve this problem, I will accept it.
What about something like this?
select max(gn.group_number) group_number, ip.num
from products ip
join (
    select date, type, color, row_number() over (order by date) - 1 group_number
    from (
        select op.num, op.type, op.color, op.nb_per_group, op.date,
               (row_number() over (partition by op.type, op.color order by op.date) - 1) % nb_per_group group_order
        from products op
    ) sq
    where sq.group_order = 0
) gn
on ip.type = gn.type
and ip.color = gn.color
and ip.date >= gn.date
group by ip.num
order by group_number, ip.num
This may only work if your nb_per_group values are the same for each combination of type and color. It may also require unique dates, but that could probably be worked around if required.
The innermost subquery partitions the rows by type and color, orders them by date, then calculates the row numbers modulo nb_per_group; this forms a 0-based position within the group that resets to 0 each time nb_per_group rows have been counted.
The next-level subquery finds all of the 0 values we mapped in the lower subquery and assigns group numbers to them.
Finally, the outermost query ties each row in the products table to a group number, calculated as the highest group number that split off before this product's date.
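The grouping rule itself can be sketched procedurally. The Python below is not a translation of the query above but a hypothetical helper (group_products is a made-up name) that fills one bucket per (type, color) in date order and keeps only complete groups. Traced by hand on the sample, the query above appears to also assign group numbers to rows 7 and 10, since it never filters out incomplete groups, so a sketch like this may help to compare against the expected result. It assumes the rows are already sorted by date, with NUM standing in for the elided DATE values.

```python
from collections import defaultdict


def group_products(rows):
    """rows: iterable of (num, type, color, nb_per_group) in date order.

    Returns (group_number, num) pairs for complete groups only, with
    groups numbered by the position of their first member.
    """
    pending = defaultdict(list)  # (type, color) -> [(position, num), ...]
    complete = []                # [(position of first member, [nums])]
    for pos, (num, type_, color, nb) in enumerate(rows):
        bucket = pending[(type_, color)]
        bucket.append((pos, num))
        if len(bucket) == nb:
            complete.append((bucket[0][0], [n for _, n in bucket]))
            bucket.clear()       # start collecting the next group
    complete.sort()
    return [(g, num) for g, (_, nums) in enumerate(complete) for num in nums]


sample = [
    (0, 1, 1, 2), (1, 1, 1, 2), (2, 1, 2, 2), (3, 1, 2, 2),
    (4, 1, 1, 2), (5, 1, 1, 2), (6, 4, 1, 3), (7, 1, 1, 2),
    (8, 4, 1, 3), (9, 4, 1, 3), (10, 5, 1, 2),
]
grouped = group_products(sample)
print(grouped)
```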

Can I order by multiple columns and somehow keep the ordering related between columns in MySQL?

I know the title doesn't explain my question very well (if someone can come up with a better title then please edit it). Here's what I want to do, say I have the following table:
id | a | b | c
------------------
1 | 3 | 3 | 3
2 | 20 | 40 | 30
3 | 40 | 30 | 10
4 | 30 | 10 | 15
5 | 10 | 15 | 6
6 | 15 | 6 | 20
This is a slightly truncated version; I have a few more columns to sort by, but the principle behind the data and my question is the same.
What I would like is to get the data ordered in the following way:
The row with the highest value in col a
The row with the highest value in col b
The row with the highest value in col c
Followed by all remaining rows ordered by their value in col c
So, the result set would look like:
id | a | b | c
------------------
3 | 40 | 30 | 10
2 | 20 | 40 | 30
6 | 15 | 6 | 20
4 | 30 | 10 | 15
5 | 10 | 15 | 6
1 | 3 | 3 | 3
Doing a
SELECT id, a, b, c
FROM table
ORDER BY a DESC, b DESC, c DESC
obviously orders by a first, then b, and finally c, which gives the following (not what I need):
id | a | b | c
------------------
3 | 40 | 30 | 10
4 | 30 | 10 | 15
2 | 20 | 40 | 30
6 | 15 | 6 | 20
5 | 10 | 15 | 6
1 | 3 | 3 | 3
I'm not familiar with MySQL's SQL dialect, but you would first SELECT the row with the highest 'A' value, UNION ALL it (i.e. no duplicate removal via sorting) with the row with the highest 'B' value, UNION ALL that with the row with the highest 'C' value, and finally UNION ALL the remaining rows ordered by 'C', excluding the 3 rows (by id) already selected.
I've just tested the following which appears to work (does involve 3 subqueries however):
SELECT id, a, b, c
FROM test
ORDER BY FIELD(a,(SELECT MAX(a) FROM test)) DESC,
FIELD(b,(SELECT MAX(b) FROM test)) DESC,
FIELD(c,(SELECT MAX(c) FROM test)) DESC,
c DESC
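The ordering logic of that query can be mirrored outside the database. In MySQL, FIELD(a, x) returns 1 when a equals x and 0 otherwise, so each FIELD(...) DESC term simply floats the row holding that column's maximum to the front. The Python sketch below applies the same idea with a tuple sort key on the sample rows; the names rows and sort_key are made up for the illustration.

```python
# (id, a, b, c) rows from the question's sample table.
rows = [
    (1, 3, 3, 3),
    (2, 20, 40, 30),
    (3, 40, 30, 10),
    (4, 30, 10, 15),
    (5, 10, 15, 6),
    (6, 15, 6, 20),
]
max_a = max(r[1] for r in rows)
max_b = max(r[2] for r in rows)
max_c = max(r[3] for r in rows)


def sort_key(row):
    _id, a, b, c = row
    # Booleans play the role of FIELD(col, MAX(col)): True sorts after
    # False, so reverse=True puts the "is the maximum" rows first.
    return (a == max_a, b == max_b, c == max_c, c)


ordered = sorted(rows, key=sort_key, reverse=True)
print([r[0] for r in ordered])
```

On this sample the row with the highest c is also the row with the highest b, so only two rows are pulled to the front before the remaining rows follow in descending c order.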