Average of successive pairs of rows - sql

I have a table like so:
id | value
---+------
1 | 10
2 | 5
3 | 11
4 | 8
5 | 9
6 | 7
The data in this table is really pairs of values, which I need to take the average of, which should result in:
pair_id | pair_avg
--------+---------
1 | 7.5
2 | 9.5
3 | 8
I have got some other information (a pair of flags) which could also help to pair them, though they still have to be in id order. I cannot really change how the data comes to me.
As I'm more used to arrays than SQL, all I can think is that I need to loop through the table and sum the pairs. But this doesn't strike me as very SQL-ish.
Update
In making this minimal example, I have apparently over simplified.
As the table I am working with is the result of several selects, the IDs will not be quite so clean, apologies for not specifying this.
The table looks a lot more like:
id | value
----------
1 | 10
4 | 5
6 | 11
7 | 8
10 | 9
15 | 7
The results will be used to create a second table, I don't care about the index on this new table, it can provide its own, therefore giving the result already indicated above.

If your data is as clean as the question makes it seem: no NULL values, no gaps, pairs have consecutive positive numbers, starting with 1, and assuming id is type integer, it can be as simple as:
SELECT (id+1)/2 AS pair_id, avg(value) AS pair_avg
FROM tbl
GROUP BY 1
ORDER BY 1;
Integer division truncates the result and thus takes care of grouping pairs automatically this way.
If your id numbers are not as regular but at least strictly monotonically increasing like your update suggests (still no NULL or missing values), you can use a surrogate ID generated with row_number() instead:
SELECT id/2 AS pair_id, avg(value) AS pair_avg
FROM (SELECT row_number() OVER (ORDER BY id) + 1 AS id, value FROM tbl) t
GROUP BY 1
ORDER BY 1;
db<>fiddle here

I think you can just use group by with arithmetic:
select row_number() over (order by min(id)), min(id), max(id), avg(id)
from t
group by floor( (id - 1) / 2 );
I'm not sure why you would want to renumber the ids after aggregation. The original ids seem more useful.

You may use ceil function by appliying division by 2 to id column as in the following select statement :
with t(id,value) as
(
select 1 , 10 union all
select 2 , 5 union all
select 3 , 11 union all
select 4 , 8 union all
select 5 , 9 union all
select 6 , 7
)
select ceil(id/2::numeric) as "ID", avg(t.value) as "pair_avg"
from t
group by "ID"
order by "ID";
id | pair_avg
-------------
1 | 7.5
2 | 9.5
3 | 8

Related

Split a quantity into multiple rows with limit on quantity per row

I have a table of ids and quantities that looks like this:
dbo.Quantity
id | qty
-------
1 | 3
2 | 6
I would like to split the quantity column into multiple lines and number them, but with a set limit (which can be arbitrary) on the maximum quantity allowed for each row.
So for the value of 2, expected output should be:
dbo.DesiredResult
id | qty | bucket
---------------
1 | 2 | 1
1 | 1 | 2
2 | 1 | 2
2 | 2 | 3
2 | 2 | 4
2 | 1 | 5
In other words,
Running SELECT id, SUM(qty) as qty FROM dbo.DesiredResult should return the original table (dbo.Quantity).
Running
SELECT id, SUM(qty) as qty FROM dbo.DesiredResult GROUP BY bucket
should give you this table.
id | qty | bucket
------------------
1 | 2 | 1
1 | 2 | 2
2 | 2 | 3
2 | 2 | 4
2 | 1 | 5
I feel I can do this with cursors imperitavely, looping over each row, keeping a counter that increments and resets as the "max" for each is filled. But this is very "anti-SQL" I feel there is a better way around this.
One approach is recursive CTE which emulates cursor sequentially going through rows.
Another approach that comes to mind is to represent your data as intervals and intersections of intervals.
Represent this:
id | qty
-------
1 | 3
2 | 6
as intervals [0;3), [3;9) with ids being their labels
0123456789
|--|-----|
1 2 - id
It is easy to generate this set of intervals using running total SUM() OVER().
Represent your buckets also as intervals [0;2), [2;4), [4;6), etc. with their own labels
0123456789
|-|-|-|-|-|
1 2 3 4 5 - bucket
It is easy to generate this set of intervals using a table of numbers.
Intersect these two sets of intervals preserving information about their labels.
Working with sets should be possible in a set-based SQL query, rather than a sequential cursor or recursion.
It is bit too much for me to write down the actual query right now. But, it is quite possible that ideas similar to those discussed in Packing Intervals by Itzik Ben-Gan may be useful here.
Actually, once you have your quantities represented as intervals you can generate required number of rows/buckets on the fly from the table of numbers using CROSS APPLY.
Imagine we transformed your Quantity table into Intervals:
Start | End | ID
0 | 3 | 1
3 | 9 | 2
And we also have a table of numbers - a table Numbers with column Number with values from 0 to, say, 100K.
For each Start and End of the interval we can calculate the corresponding bucket number by dividing the value by the bucket size and rounding down or up.
Something along these lines:
SELECT
Intervals.ID
,A.qty
,A.Bucket
FROM
Intervals
CROSS APPLY
(
SELECT
Numbers.Number + 1 AS Bucket
,#BucketSize AS qty
-- it is equal to #BucketSize if the bucket is completely within the Start and End boundaries
-- it should be adjusted for the first and last buckets of the interval
FROM Numbers
WHERE
Numbers.Number >= Start / #BucketSize
AND Numbers.Number < End / #BucketSize + 1
) AS A
;
You'll need to check and adjust formulas for errors +-1.
And write some CASE WHEN logic for calculating the correct qty for the buckets that happen to be on the lower and upper boundary of the interval.
Use a recursive CTE:
with cte as (
select id, 1 as n, qty
from t
union all
select id, n + 1, qty
from cte
where n + 1 < qty
)
select id, n
from cte;
Here is a db<>fiddle.

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

I have performing some queries using PostgreSQL SELECT DISTINCT ON syntax. I would like to have the query return the total number of rows alongside with every result row.
Assume I have a table my_table like the following:
CREATE TABLE my_table(
id int,
my_field text,
id_reference bigint
);
I then have a couple of values:
id | my_field | id_reference
----+----------+--------------
1 | a | 1
1 | b | 2
2 | a | 3
2 | c | 4
3 | x | 5
Basically my_table contains some versioned data. The id_reference is a reference to a global version of the database. Every change to the database will increase the global version number and changes will always add new rows to the tables (instead of updating/deleting values) and they will insert the new version number.
My goal is to perform a query that will only retrieve the latest values in the table, alongside with the total number of rows.
For example, in the above case I would like to retrieve the following output:
| total | id | my_field | id_reference |
+-------+----+----------+--------------+
| 3 | 1 | b | 2 |
+-------+----+----------+--------------+
| 3 | 2 | c | 4 |
+-------+----+----------+--------------+
| 3 | 3 | x | 5 |
+-------+----+----------+--------------+
My attemp is the following:
select distinct on (id)
count(*) over () as total,
*
from my_table
order by id, id_reference desc
This returns almost the correct output, except that total is the number of rows in my_table instead of being the number of rows of the resulting query:
total | id | my_field | id_reference
-------+----+----------+--------------
5 | 1 | b | 2
5 | 2 | c | 4
5 | 3 | x | 5
(3 rows)
As you can see it has 5 instead of the expected 3.
I can fix this by using a subquery and count as an aggregate function:
with my_values as (
select distinct on (id)
*
from my_table
order by id, id_reference desc
)
select count(*) over (), * from my_values
Which produces my expected output.
My question: is there a way to avoid using this subquery and have something similar to count(*) over () return the result I want?
You are looking at my_table 3 ways:
to find the latest id_reference for each id
to find my_field for the latest id_reference for each id
to count the distinct number of ids in the table
I therefore prefer this solution:
select
c.id_count as total,
a.id,
a.my_field,
b.max_id_reference
from
my_table a
join
(
select
id,
max(id_reference) as max_id_reference
from
my_table
group by
id
) b
on
a.id = b.id and
a.id_reference = b.max_id_reference
join
(
select
count(distinct id) as id_count
from
my_table
) c
on true;
This is a bit longer (especially the long thin way I write SQL) but it makes it clear what is happening. If you come back to it in a few months time (somebody usually does) then it will take less time to understand what is going on.
The "on true" at the end is a deliberate cartesian product because there can only ever be exactly one result from the subquery "c" and you do want a cartesian product with that.
There is nothing necessarily wrong with subqueries.

Return only first row with particular value in a column

I realize that this has probably been asked a billion times, and I could swear I've done this in the past, but tonight I've got brain block or something and can't figure it out...
I have a database table ("t1") where I need to be able to retrieve only the first row where a particular value appears in a particular column.
Here's a sample of the data:
id | qID | Name
---------------------
1 | 1 | Bob
2 | 3 | Fred
3 | 1 | George
4 | 1 | Jack
What I want as a result is:
id | qID | Name
---------------------
1 | 1 | Bob
2 | 3 | Fred
The only column I actually need to get out of the query is the first one, but that's not where the duplicates need to be eliminated, and I thought it might be confusing not to show the entire row.
I've tried using this:
select id, qID, ROW_NUMBER() over(partition by qID order by qID) as zxy
from t1 where zxy = 1
But it gives me this error:
Msg 207, Level 16, State 1, Line 14
Invalid column name 'zxy'.
If I remove the where part of the query, the rest of it works fine. I've tried different variable names, using single or double quotes around 'zxy' but it seems to make no difference. And try as I might, I can't find the part of the SQL Server documentation where it discusses assigning a variable name to an expression, as in the "as zxy" part of the above query... if anybody has a link for that, that's quite useful.
Needless to say, I've tried other variable names besides "zxy" but that makes no difference.
Help!
WHERE clause is applied earlier in the process than SELECT. Therefore the calculated column zxy is not available in WHERE. In order to achieve your goal you need to put your original query in a subquery or CTE.
select id, qid
from
(
select id, qID, ROW_NUMBER() over(partition by qID order by qID) as zxy
from t1
) q
where zxy = 1
Output:
| id | qid |
|----|-----|
| 1 | 1 |
| 2 | 3 |
Here is SQLFiddle demo
Logical Processing Order of the SELECT statement
1 FROM
2 ON
3 JOIN
4 WHERE
5 GROUP BY
6 WITH CUBE or WITH ROLLUP
7 HAVING
8 SELECT
9 DISTINCT
10 ORDER BY
11 TOP
Where Clause Execute Before Select Clause so You can not find ZXY in Where cluase
with cte as
(
select id, qID, ROW_NUMBER() over(partition by qID order by qID) as zxy
from t1
)
select * from cte where zxy = 1
Here is my blog it might help you http://sqlearth.blogspot.in/2015/05/how-sql-select-statement-logically-works.html

Get last element of an ordered set in postgresql

I am trying to get the last element of an ordered set, stored in a database table. The ordering is defined by one of the columns in the table. Also the table contains multiple sets, so I want the last one for each of the sets.
As an example consider the following table:
benchmarks=# select id,sorter from aggtest ;
id | sorter
----+--------
1 | 1
3 | 1
5 | 1
2 | 2
7 | 2
4 | 1
6 | 2
(7 rows)
Sorter 1 and 2 define each of the sets, sets are ordered by the id column. To get the last element of each set, I defined an aggregate function:
CREATE FUNCTION public.last_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
SELECT $2;
$$;
CREATE AGGREGATE public.last (
sfunc = public.last_agg,
basetype = anyelement,
stype = anyelement
);
As explained here.
However when I use this I get:
benchmarks=# select last(id),sorter from aggtest group by sorter order by sorter;
last | sorter
------+--------
4 | 1
6 | 2
(2 rows)
However, I want to get (5,1) and (7,2) as these are the last ids (numerically) in the set. Looking at how the aggregate mechanism works, I can see quite well, why the result is not what I want. The items are returned in the order I added them, and then aggregated so that the last one I added is returned.
I tried sorting by ids, so that each group is sorted independently, however that does not work:
benchmarks=# select last(id),sorter from aggtest group by sorter order by sorter,id;
ERROR: column "aggtest.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...(id),sorter from aggtest group by sorter order by sorter,id;
If I wrap the sorting criteria in another aggregate, I get wrong data again:
benchmarks=# select last(id),sorter from aggtest group by sorter order by sorter,last(id);
last | sorter
------+--------
4 | 1
6 | 2
(2 rows)
Also grouping by id in addition to sorter does not work obviously.
Of course there is an easier way, to get the last (highest) id for each group by using the max aggregate. However, I am not so much interested in the id but as in data associated with it (i.e. in the same row). Hence I do not to sort by id and then aggregate so that the row with the highest id is returned for each group.
What is the best way to accomplish this?
EDIT: Why does max(id) grouped by sorter not work
Assume the following complete table (unsorter represents the additional data I have in the table):
benchmarks=# select * from aggtest ;
id | sorter | unsorter
----+--------+----------
1 | 1 | 1
3 | 1 | 2
5 | 1 | 3
2 | 2 | 4
7 | 2 | 5
4 | 1 | 6
6 | 2 | 7
(7 rows)
I would like to retrieve the lines:
id | sorter | unsorter
----+--------+----------
5 | 1 | 3
7 | 2 | 5
However with max(id) and grouping by sorter I get:
benchmarks=# select max(id),sorter,unsorter from aggtest group by sorter;
ERROR: column "aggtest.unsorter" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select max(id),sorter,unsorter from aggtest group by sorter;
Using a max(unsorter) obviously does not work either:
benchmarks=# select max(id),sorter,max(unsorter) from aggtest group by sorter;
max | sorter | max
-----+--------+-----
5 | 1 | 6
7 | 2 | 7
(2 rows)
However using distinct (the accepted answer) I get:
benchmarks=# select distinct on (sorter) id,sorter,unsorter from aggtest order by sorter, id desc;
id | sorter | unsorter
----+--------+----------
5 | 1 | 3
7 | 2 | 5
(2 rows)
Which has the correct additional data. The join approach also seems to work, by is slightly slower on the test data.
Why not use a window function:
select id, sorter
from (
select id, sorter,
row_number() over (partition by sorter order by id desc) as rn
from aggtest
) t
where rn = 1;
Or using Postgres distinct on operator which is usually faster:
select distinct on (sorter) id, sorter
from aggtest
order by sorter, id desc
You write:
Of course there is an easier way, to get the last (highest) id for
each group by using the max aggregate. However, I am not so much
interested in the id but as in data associated with it (i.e. in the
same row).
This query will give you the data associated with the highest id of each sorter group.
select a.* from aggtest a
join (
select max(id) max_id, sorter
from aggtest
group by sorter
) b on a.id = b.max_id and a.sorter = b.sorter
select distinct max(id) over (partition by sorter) id,sorter
from aggtest order by 2 asc
returns:
5;1
7;2

Create view with multiple rows for one source row

I have a table which has an offset and a qty column.
I now want to create a view from that which has an entry for each precise position.
Table:
offset | qty | more_data
-------+---------+-------------
1 | 3 | 'qwer'
2 | 2 | 'asdf'
View:
position | more_data
---------+------------
1 | 'quer'
2 | 'quer'
3 | 'quer'
2 | 'asdf'
3 | 'asdf'
Is that even possible?
I would need to do that for Oracle (8! - 11), MS SQL (2005-) and PostgreSQL (8-)
Based on you input/output:
with t(offset, qty) as (
select 1, 3 from dual
)
select offset + level - 1 position
from t
connect by rownum <= qty
POSITION
--------
1
2
3
For Postgres:
select offst, generate_series(offst, qty) as position
from the_table
order by offst, num;
SQLFiddle: http://sqlfiddle.com/#!10/e70d9/4
I don't have anything as ancient as 8.0, 8.1 or 8.2 around but it should work on those pre-historic versions as well.
Note that offset is a reserved word in Postgres. You should find a different name for that column
In Oracle, to answer the specific question (i.e. a table with just the one row):
select rn posn from (
select offset-1+rownum rn from the_table
connect by level between offset and qty
);
In reality, your table will have multiple rows, so you will need to restrict the inner query to 1 object row, otherwise I think you will get huge, incorrect output. If you can provide more details about the table/data a more complete answer could be given.