Problem statement is to calculate median from a table that has two columns. One specifying a number and the other column specifying the frequency of the number.
For e.g.
Table "Numbers":
Num
Freq
1
3
2
3
This median needs to be found for the flattened array with values:
1,1,1,2,2,2
Query:
with ct1 as
(select num,frequency, sum(frequency) over(order by num) as sf from numbers o)
select case when count(num) over(order by num) = 1 then num
when count(num) over (order by num) > 1 then sum(num)/2 end median
from ct1 b where sf <= (select max(sf)/2 from ct1) or (sf-frequency) <= (select max(sf)/2 from ct1)
Is it not possible to use count(num) over(order by num) as the condition in the case statement?
Find the relevant row / 2 rows based of the accumulated frequencies, and take the average of num.
The example and Fiddle will also show you the
computations leading to the result.
If you already know that num is unique, rowid can be removed from the ORDER BY clauses
with
t1 as
(
select t.*
,nvl(sum(freq) over (order by num,rowid rows between unbounded preceding and 1 preceding),0) as freq_acc_sum_1
,sum(freq) over (order by num, rowid) as freq_acc_sum_2
,sum(freq) over () as freq_sum
from t
)
select t1.*
,case
when freq_sum/2 between freq_acc_sum_1 and freq_acc_sum_2
then 'V'
end as relevant_record
from t1
order by num, rowid
Fiddle
Example:
ID
NUM
FREQ
FREQ_ACC_SUM_1
FREQ_ACC_SUM_2
FREQ_SUM
RELEVANT_RECORD
7
8
1
0
1
18
5
10
1
1
2
18
1
29
3
2
5
18
6
31
1
5
6
18
3
33
2
6
8
18
4
41
1
8
9
18
V
9
49
2
9
11
18
V
2
52
1
11
12
18
8
56
3
12
15
18
10
92
3
15
18
18
MEDIAN
45
Fiddle for 1M records
You can find the one (or two) middle value(s) and then average:
SELECT AVG(num) AS median
FROM (
SELECT num,
freq,
SUM(freq) OVER (ORDER BY num) AS cum_freq,
(SUM(freq) OVER () + 1)/2 AS median_freq
FROM table_name
)
WHERE cum_freq - freq < median_freq
AND median_freq < cum_freq + 1
Or, expand the values using a LATERAL join to a hierarchical query and then use the MEDIAN function:
SELECT MEDIAN(num) AS median
FROM table_name t
CROSS JOIN LATERAL (
SELECT LEVEL
FROM DUAL
WHERE freq > 0
CONNECT BY LEVEL <= freq
)
Which, for the sample data:
CREATE TABLE table_name (Num, Freq) AS
SELECT 1, 3 FROM DUAL UNION ALL
SELECT 2, 3 FROM DUAL;
Outputs:
MEDIAN
1.5
(Note: For your sample data, there are 6 items, an even number, so the MEDIAN will be half way between the value of 3rd and 4rd items; so half way between 1 and 2 = 1.5.)
db<>fiddle here
Suppose I have a table like
id
1
3
4
10
12
19
and I'd like to group the ids (in sorted order) into the same group if they differ by 5 or less, and a new group if they differ by 6 or more. So the output would be:
id
group
1
1
3
1
4
1
10
2
12
2
19
3
Is this possible in SQL? It will be a query in Trino, and I see they have commands like lag and partition. Has anyone made a query like this that can help out?
You can use a cte with lead:
with cte(id, l1) as (
select t.id, abs(coalesce(lead(t.id) over (order by t.id), 0) - t.id) < 6 from tbl t
)
select c.id, (select sum(c1.id < c.id and c1.l1 = 0) from cte c1) + 1 from cte c
Input table
Col
1
2
2
3
3
4
5
Output
1
2
2
2
2
3
3
3
3
3
3
4
4
4
4
5
5
5
5
5
Is there any way this could be achieved in sql with and without writing any function ? Please take a note that, column can have duplicate values.
you didn't state your DBMS product, but in Postgres this can be done using `generate_series()
select t.col
from the_table t
cross join generate_series(1, t.col)
order by t.col
This is just a cross join to a numbers/tally table. You already have one in your source data, so you can simply use it as a distinct list:
select i.col
from input i
cross join (select distinct col as n from input) n
where n.n <= i.col
order by i.col
Here is a standard SQL recursive query for this:
with cte (num, cnt) as
(
select num, num from mytable
union all
select num, cnt - 1 from cte where cnt > 1
)
select num from cte order by num;
have this values in a table column select a from tab:
a
1
2
3
4
5
6
7
15
16
18
Using a variable=3, how can create column b starting with min(a) and with the following values:
a
b
1
1
2
1
3
1
4
4
5
4
6
4
7
7
15
15
17
15
18
18
something like: for each a (ordered) maintain the value at most for 3, otherwise reset.
Thanks,
AAWNSD
I think you want window functions and groups of three based on arithmetic on a:
select a,
min(a) over (partition by ceiling(a / 3.0)) as b
from tab;
Here is a db<>fiddle.
Hmmm . . . I realize that the above returns "16" for the last row rather than 18. My above interpretation may not be correct. You may be saying that you want groups -- once they start -- to never exceed the group starting value plus 2.
If so, one approach is a recursive CTE:
with recursive tt as (
select a, row_number() over (order by a) as seqnum
from tab
),
cte as (
select a, seqnum, a as grp
from tt
where seqnum = 1
union all
select tt.a, tt.seqnum,
(case when tt.a <= grp + 2 then grp else tt.a end)
from cte join
tt
on tt.seqnum = cte.seqnum + 1
)
select *
from cte;
I have a table of items, each of which has a date associated with it. If I have the date associated with one item, how do I query the database with SQL to get the 'previous' and 'subsequent' items in the table?
It is not possible to simply add (or subtract) a value, as the dates do not have a regular gap between them.
One possible application would be 'previous/next' links in a photo album or blog web application, where the underlying data is in a SQL table.
I think there are two possible cases:
Firstly where each date is unique:
Sample data:
1,3,8,19,67,45
What query (or queries) would give 3 and 19 when supplied 8 as the parameter? (or the rows 3,8,19). Note that there are not always three rows to be returned - at the ends of the sequence one would be missing.
Secondly, if there is a separate unique key to order the elements by, what is the query to return the set 'surrounding' a date? The order expected is by date then key.
Sample data:
(key:date) 1:1,2:3,3:8,4:8,5:19,10:19,11:67,15:45,16:8
What query for '8' returns the set:
2:3,3:8,4:8,16:8,5:19
or what query generates the table:
key date prev-key next-key
1 1 null 2
2 3 1 3
3 8 2 4
4 8 3 16
5 19 16 10
10 19 5 11
11 67 10 15
15 45 11 null
16 8 4 5
The table order is not important - just the next-key and prev-key fields.
Both TheSoftwareJedi and Cade Roux have solutions that work for the data sets I posted last night. For the second question, both seem to fail for this dataset:
(key:date) 1:1,2:3,3:8,4:8,5:19,10:19,11:67,15:45,16:8
The order expected is by date then key, so one expected result might be:
2:3,3:8,4:8,16:8,5:19
and another:
key date prev-key next-key
1 1 null 2
2 3 1 3
3 8 2 4
4 8 3 16
5 19 16 10
10 19 5 11
11 67 10 15
15 45 11 null
16 8 4 5
The table order is not important - just the next-key and prev-key fields.
Select max(element) From Data Where Element < 8
Union
Select min(element) From Data Where Element > 8
But generally it is more usefull to think of sql for set oriented operations rather than iterative operation.
Self-joins.
For the table:
/*
CREATE TABLE [dbo].[stackoverflow_203302](
[val] [int] NOT NULL
) ON [PRIMARY]
*/
With parameter #val
SELECT cur.val, MAX(prv.val) AS prv_val, MIN(nxt.val) AS nxt_val
FROM stackoverflow_203302 AS cur
LEFT JOIN stackoverflow_203302 AS prv
ON cur.val > prv.val
LEFT JOIN stackoverflow_203302 AS nxt
ON cur.val < nxt.val
WHERE cur.val = #val
GROUP BY cur.val
You could make this a stored procedure with output parameters or just join this as a correlated subquery to the data you are pulling.
Without the parameter, for your data the result would be:
val prv_val nxt_val
----------- ----------- -----------
1 NULL 3
3 1 8
8 3 19
19 8 45
45 19 67
67 45 NULL
For the modified example, you use this as a correlated subquery:
/*
CREATE TABLE [dbo].[stackoverflow_203302](
[ky] [int] NOT NULL,
[val] [int] NOT NULL,
CONSTRAINT [PK_stackoverflow_203302] PRIMARY KEY CLUSTERED (
[ky] ASC
)
)
*/
SELECT cur.ky AS cur_ky
,cur.val AS cur_val
,prv.ky AS prv_ky
,prv.val AS prv_val
,nxt.ky AS nxt_ky
,nxt.val as nxt_val
FROM (
SELECT cur.ky, MAX(prv.ky) AS prv_ky, MIN(nxt.ky) AS nxt_ky
FROM stackoverflow_203302 AS cur
LEFT JOIN stackoverflow_203302 AS prv
ON cur.ky > prv.ky
LEFT JOIN stackoverflow_203302 AS nxt
ON cur.ky < nxt.ky
GROUP BY cur.ky
) AS ordering
INNER JOIN stackoverflow_203302 as cur
ON cur.ky = ordering.ky
LEFT JOIN stackoverflow_203302 as prv
ON prv.ky = ordering.prv_ky
LEFT JOIN stackoverflow_203302 as nxt
ON nxt.ky = ordering.nxt_ky
With the output as expected:
cur_ky cur_val prv_ky prv_val nxt_ky nxt_val
----------- ----------- ----------- ----------- ----------- -----------
1 1 NULL NULL 2 3
2 3 1 1 3 8
3 8 2 3 4 19
4 19 3 8 5 67
5 67 4 19 6 45
6 45 5 67 NULL NULL
In SQL Server, I prefer to make the subquery a Common table Expression. This makes the code seem more linear, less nested and easier to follow if there are a lot of nestings (also, less repetition is required on some re-joins).
Firstly, this should work (the ORDER BY is important):
select min(a)
from theTable
where a > 8
select max(a)
from theTable
where a < 8
For the second question that I begged you to ask...:
select *
from theTable
where date = 8
union all
select *
from theTable
where key = (select min(key)
from theTable
where key > (select max(key)
from theTable
where date = 8)
)
union all
select *
from theTable
where key = (select max(key)
from theTable
where key < (select min(key)
from theTable
where date = 8)
)
order by key
SELECT 'next' AS direction, MIN(date_field) AS date_key
FROM table_name
WHERE date_field > current_date
GROUP BY 1 -- necessity for group by varies from DBMS to DBMS in this context
UNION
SELECT 'prev' AS direction, MAX(date_field) AS date_key
FROM table_name
WHERE date_field < current_date
GROUP BY 1
ORDER BY 1 DESC;
Produces:
direction date_key
--------- --------
prev 3
next 19
My own attempt at the set solution, based on TheSoftwareJedi.
First question:
select date from test where date = 8
union all
select max(date) from test where date < 8
union all
select min(date) from test where date > 8
order by date;
Second question:
While debugging this, I used the data set:
(key:date) 1:1,2:3,3:8,4:8,5:19,10:19,11:67,15:45,16:8,17:3,18:1
to give this result:
select * from test2 where date = 8
union all
select * from (select * from test2
where date = (select max(date) from test2
where date < 8))
where key = (select max(key) from test2
where date = (select max(date) from test2
where date < 8))
union all
select * from (select * from test2
where date = (select min(date) from test2
where date > 8))
where key = (select min(key) from test2
where date = (select min(date) from test2
where date > 8))
order by date,key;
In both cases the final order by clause is strictly speaking optional.
If your RDBMS supports LAG and LEAD, this is straightforward (Oracle, PostgreSQL, SQL Server 2012)
These allow to choose the row either side of any given row in a single query
Try this...
SELECT TOP 3 * FROM YourTable
WHERE Col >= (SELECT MAX(Col) FROM YourTable b WHERE Col < #Parameter)
ORDER BY Col