Say I have the following table, called revenues.
id | revenue
------------
1 | 345
2 | 5673
3 | 0
4 | 45
5 | 4134
6 | 35
7 | 533
8 | 856
9 | 636
10 | 35
I want to find the largest sum of any 3 sequential values. Here's what I mean:
ids 1 + 2 + 3 => 345 + 5673 + 0 = 6018
ids 2 + 3 + 4 => 5673 + 0 + 45 = 5718
ids 3 + 4 + 5 => 0 + 45 + 4134 = 4179
ids 4 + 5 + 6 => 45 + 4134 + 35 = 4214
ids 5 + 6 + 7 => 4134 + 35 + 533 = 4702
ids 6 + 7 + 8 => 35 + 533 + 856 = 1424
ids 7 + 8 + 9 => 533 + 856 + 636 = 2025
ids 8 + 9 + 10 => 856 + 636 + 35 = 1527
In this case, I would want the result to be 6018, since it's the largest sum of 3 sequential values. I'm just starting to learn SQL, with my only other previous language being Java, and all I can think is how easy this would be to do with a for loop. Does anyone have any idea on how I could get started writing a query like this? Does a similar thing exist in SQL?
Edit: Furthermore, is it possible to scale something like this? What if I had a really big table and I wanted to find the largest sum of a hundred sequential values?
One approach would be to use two joins to get to id+1 and id+2:
SELECT max(t1.revenue+t2.revenue+t3.revenue)
FROM revenues t1
JOIN revenues t2 ON t1.id+1 = t2.id
JOIN revenues t3 ON t1.id+2 = t3.id
Demo.
If your database supports the lag() window function, you can retrieve the result in a single table scan:
select max(rev3)
from (
select revenue +
lag(revenue) over (order by id) +
lag(revenue, 2) over (order by id) as rev3
from revenues
) as SubQueryAlias
See it working at SQL Fiddle.
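For comparison with the for loop mentioned in the question, the same sliding-window idea can be sketched in Python. This is only an illustration and not part of the answers above: the function name `max_window_sum` and the parameter `n` are made up here, and the sketch assumes the values are already ordered by id with no gaps. It keeps a running window total instead of re-adding n values at every step, which is why a window of a hundred values costs no more per row than a window of three.

```python
def max_window_sum(values, n):
    """Return the largest sum of n consecutive values, or None if there are fewer than n values."""
    if n <= 0 or len(values) < n:
        return None
    window = sum(values[:n])  # sum of the first full window
    best = window
    for i in range(n, len(values)):
        # Slide the window one step: add the new value, drop the oldest one.
        window += values[i] - values[i - n]
        best = max(best, window)
    return best


revenues = [345, 5673, 0, 45, 4134, 35, 533, 856, 636, 35]
print(max_window_sum(revenues, 3))  # 6018
```

Like the lag() query, this only considers full windows, so the first two rows never produce a candidate on their own.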
with t as (
SELECT 1 as id, 345 as rev
UNION SELECT 2, 5673
UNION SELECT 3, 0
UNION SELECT 4, 45
UNION SELECT 5, 4134
UNION SELECT 6, 35
UNION SELECT 7, 533
UNION SELECT 8, 856
UNION SELECT 9, 636
UNION SELECT 10, 35)
SELECT TOP 1 id, SUM (rev) OVER (ORDER BY id ROWS 2 PRECEDING) r
FROM t
ORDER BY r desc;
Provides answer 3, 6018* on SQL Server 2012.
EDIT
Query that makes sure that we only get rows that are made up from 3 revenues:
with t as (
SELECT 1 as id, 345 as rev
UNION SELECT 2, 5673
UNION SELECT 3, 0
UNION SELECT 4, 45
UNION SELECT 5, 4134
UNION SELECT 6, 35
UNION SELECT 7, 533
UNION SELECT 8, 856
UNION SELECT 9, 636
UNION SELECT 10, 35)
SELECT TOP 1 id, r FROM
(SELECT id
, SUM (rev) OVER (ORDER BY id ROWS 2 PRECEDING) r
, SUM (1) OVER (ORDER BY id ROWS 2 PRECEDING) cnt
FROM t) as subslt
WHERE cnt = 3
ORDER BY r desc;
*Actually non-deterministic between 3, 6018 and 2, 6018. The second/edited query is deterministic.
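To see where the tie comes from, the frames produced by ROWS 2 PRECEDING can be listed by hand. The following Python sketch uses a hypothetical helper, `trailing_windows`, and the sample data above; it imitates the window frame and is not a reproduction of SQL Server's behavior.

```python
def trailing_windows(rows, preceding):
    """For each (id, rev) row, return (id, window_sum, window_count) over the
    current row and up to `preceding` rows before it, like ROWS n PRECEDING."""
    result = []
    for i, (row_id, _) in enumerate(rows):
        frame = rows[max(0, i - preceding): i + 1]
        result.append((row_id, sum(rev for _, rev in frame), len(frame)))
    return result


rows = [(1, 345), (2, 5673), (3, 0), (4, 45), (5, 4134),
        (6, 35), (7, 533), (8, 856), (9, 636), (10, 35)]

windows = trailing_windows(rows, 2)
best = max(total for _, total, _ in windows)

# Without the count filter, every row that reaches the maximum is a candidate.
tied = [(row_id, total, count) for row_id, total, count in windows if total == best]
# Keeping only full frames (count == 3) mirrors the WHERE cnt = 3 filter.
full = [(row_id, total) for row_id, total, count in windows if count == 3 and total == best]
print(tied)
print(full)
```

The frame ending at id 2 holds only two rows (345 + 5673), yet it reaches the same total as the full frame ending at id 3 because the revenue for id 3 is 0.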
Something like this:
select rev1.id as id1, rev2.id as id2, rev3.id as id3,
rev1.revenue+rev2.revenue+rev3.revenue as total_rev
from revenues rev1, revenues rev2, revenues rev3
where rev1.id+1=rev2.id and rev2.id+1=rev3.id
-- a column alias cannot be referenced in WHERE, so the sum is repeated here
and rev1.revenue+rev2.revenue+rev3.revenue =
(select max(a.revenue+b.revenue+c.revenue)
from revenues a, revenues b, revenues c
where a.id+1=b.id and b.id+1=c.id)
The problem is to calculate the median from a table that has two columns: one holding a number and the other holding the frequency of that number.
For example:
Table "Numbers":
Num | Freq
----------
1 | 3
2 | 3
This median needs to be found for the flattened array with values:
1,1,1,2,2,2
Query:
with ct1 as
(select num,frequency, sum(frequency) over(order by num) as sf from numbers o)
select case when count(num) over(order by num) = 1 then num
when count(num) over (order by num) > 1 then sum(num)/2 end median
from ct1 b where sf <= (select max(sf)/2 from ct1) or (sf-frequency) <= (select max(sf)/2 from ct1)
Is it not possible to use count(num) over(order by num) as the condition in the case statement?
Find the relevant row (or two rows) based on the accumulated frequencies, and take the average of num.
The example and the Fiddle also show the computations leading to the result.
If you already know that num is unique, rowid can be removed from the ORDER BY clauses.
with
t1 as
(
select t.*
,nvl(sum(freq) over (order by num,rowid rows between unbounded preceding and 1 preceding),0) as freq_acc_sum_1
,sum(freq) over (order by num, rowid) as freq_acc_sum_2
,sum(freq) over () as freq_sum
from t
)
select t1.*
,case
when freq_sum/2 between freq_acc_sum_1 and freq_acc_sum_2
then 'V'
end as relevant_record
from t1
order by num, rowid
Fiddle
Example:
ID  NUM  FREQ  FREQ_ACC_SUM_1  FREQ_ACC_SUM_2  FREQ_SUM  RELEVANT_RECORD
--  ---  ----  --------------  --------------  --------  ---------------
 7    8     1               0               1        18
 5   10     1               1               2        18
 1   29     3               2               5        18
 6   31     1               5               6        18
 3   33     2               6               8        18
 4   41     1               8               9        18  V
 9   49     2               9              11        18  V
 2   52     1              11              12        18
 8   56     3              12              15        18
10   92     3              15              18        18
MEDIAN
------
45
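The rule used above (keep the rows where half the total frequency falls between the two accumulated sums, then average num) can be checked outside the database. Below is a minimal Python sketch under a few assumptions: the function name `median_from_frequencies` is invented for this illustration, the input is non-empty, and every frequency is positive.

```python
from itertools import accumulate


def median_from_frequencies(pairs):
    """pairs: list of (num, freq). Returns the median of the expanded values."""
    ordered = sorted(pairs)
    freqs = [freq for _, freq in ordered]
    upper = list(accumulate(freqs))                         # like freq_acc_sum_2
    lower = [u - freq for u, freq in zip(upper, freqs)]     # like freq_acc_sum_1
    half = upper[-1] / 2                                    # like freq_sum / 2
    # A row is relevant when half lies between its two accumulated sums.
    relevant = [num for (num, _), low, high in zip(ordered, lower, upper)
                if low <= half <= high]
    return sum(relevant) / len(relevant)


example = [(8, 1), (10, 1), (29, 3), (31, 1), (33, 2),
           (41, 1), (49, 2), (52, 1), (56, 3), (92, 3)]
print(median_from_frequencies(example))  # 45.0
```

With the example data, the only rows that qualify are num 41 and num 49, and their average is 45.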
Fiddle for 1M records
You can find the one (or two) middle value(s) and then average:
SELECT AVG(num) AS median
FROM (
SELECT num,
freq,
SUM(freq) OVER (ORDER BY num) AS cum_freq,
(SUM(freq) OVER () + 1)/2 AS median_freq
FROM table_name
)
WHERE cum_freq - freq < median_freq
AND median_freq < cum_freq + 1
Or, expand the values using a LATERAL join to a hierarchical query and then use the MEDIAN function:
SELECT MEDIAN(num) AS median
FROM table_name t
CROSS JOIN LATERAL (
SELECT LEVEL
FROM DUAL
WHERE freq > 0
CONNECT BY LEVEL <= freq
)
Which, for the sample data:
CREATE TABLE table_name (Num, Freq) AS
SELECT 1, 3 FROM DUAL UNION ALL
SELECT 2, 3 FROM DUAL;
Outputs:
MEDIAN
1.5
(Note: For your sample data, there are 6 items, an even number, so the MEDIAN will be halfway between the values of the 3rd and 4th items; so halfway between 1 and 2 = 1.5.)
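Both approaches in this answer can be compared side by side in a few lines of Python. This is a sketch rather than a translation of the Oracle queries: `expand` and `median_by_cumulative_freq` are names made up here, and the sketch assumes all frequencies are positive. The first function flattens the table as the LATERAL join does; the second applies the same cum_freq and median_freq comparison as the first query.

```python
from statistics import median


def expand(pairs):
    """Flatten (num, freq) pairs into the list of repeated values."""
    return [num for num, freq in pairs for _ in range(freq)]


def median_by_cumulative_freq(pairs):
    """Average the row(s) whose cumulative range contains the middle position(s)."""
    total = sum(freq for _, freq in pairs)
    median_freq = (total + 1) / 2
    cum_freq = 0
    middle = []
    for num, freq in sorted(pairs):
        cum_freq += freq
        if cum_freq - freq < median_freq < cum_freq + 1:
            middle.append(num)
    return sum(middle) / len(middle)


numbers = [(1, 3), (2, 3)]
print(expand(numbers))                     # [1, 1, 1, 2, 2, 2]
print(median(expand(numbers)))             # 1.5
print(median_by_cumulative_freq(numbers))  # 1.5
```

Expanding is the simpler of the two, but it materializes one value per unit of frequency, so the cumulative version is likely the better fit when frequencies are large.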
db<>fiddle here
I am trying to break up a running (ordered) sum into groups of a max value. When I implement the following example logic...
IF OBJECT_ID(N'tempdb..#t') IS NOT NULL DROP TABLE #t
SELECT TOP (ABS(CHECKSUM(NewId())) % 1000) ROW_NUMBER() OVER (ORDER BY name) AS ID,
LEFT(CAST(NEWID() AS NVARCHAR(100)),ABS(CHECKSUM(NewId())) % 30) AS Description
INTO #t
FROM sys.objects
DECLARE @maxGroupSize INT
SET @maxGroupSize = 100
;WITH t AS (
SELECT
*,
LEN(Description) AS DescriptionLength,
SUM(LEN(Description)) OVER (/*PARTITION BY N/A */ ORDER BY ID) AS [RunningLength],
SUM(LEN(Description)) OVER (/*PARTITION BY N/A */ ORDER BY ID)/@maxGroupSize AS GroupID
FROM #t
)
SELECT *, SUM(DescriptionLength) OVER (PARTITION BY GroupID) AS SumOfGroup
FROM t
ORDER BY GroupID, ID
I am getting groups that are larger than the maximum group size (length) of 100.
A recursive common table expression (rcte) would be one way to resolve this.
Sample data
Limited set of fixed sample data.
create table data
(
id int,
description nvarchar(20)
);
insert into data (id, description) values
( 1, 'qmlsdkjfqmsldk'),
( 2, 'mldskjf'),
( 3, 'qmsdlfkqjsdm'),
( 4, 'fmqlsdkfq'),
( 5, 'qdsfqsdfqq'),
( 6, 'mds'),
( 7, 'qmsldfkqsjdmfqlkj'),
( 8, 'qdmsl'),
( 9, 'mqlskfjqmlkd'),
(10, 'qsdqfdddffd');
Solution
For every recursion step, a case expression evaluates (r.group_running_length + len(d.description) <= @group_max_length) to decide whether the current group is extended or a new group is started.
The group target size is set to 40 to better fit the sample data.
declare @group_max_length int = 40;
with rcte as
(
select d.id,
d.description,
len(d.description) as description_length,
len(d.description) as running_length,
1 as group_id,
len(d.description) as group_running_length
from data d
where d.id = 1
union all
select d.id,
d.description,
len(d.description),
r.running_length + len(d.description),
case
when r.group_running_length + len(d.description) <= @group_max_length
then r.group_id
else r.group_id + 1
end,
case
when r.group_running_length + len(d.description) <= @group_max_length
then r.group_running_length + len(d.description)
else len(d.description)
end
from rcte r
join data d
on d.id = r.id + 1
)
select r.id,
r.description,
r.description_length,
r.running_length,
r.group_id,
r.group_running_length,
gs.group_sum
from rcte r
cross apply ( select max(r2.group_running_length) as group_sum
from rcte r2
where r2.group_id = r.group_id ) gs -- group sum
order by r.id;
Result
Contains both the running group length as well as the group sum for every row.
id description       description_length running_length group_id group_running_length group_sum
-- ----------------- ------------------ -------------- -------- -------------------- ---------
 1 qmlsdkjfqmsldk                    14             14        1                   14        33
 2 mldskjf                            7             21        1                   21        33
 3 qmsdlfkqjsdm                      12             33        1                   33        33
 4 fmqlsdkfq                          9             42        2                    9        39
 5 qdsfqsdfqq                        10             52        2                   19        39
 6 mds                                3             55        2                   22        39
 7 qmsldfkqsjdmfqlkj                 17             72        2                   39        39
 8 qdmsl                              5             77        3                    5        28
 9 mqlskfjqmlkd                      12             89        3                   17        28
10 qsdqfdddffd                       11            100        3                   28        28
Fiddle to see things in action (includes random data version).
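The difference between the two approaches is easier to see in a procedural sketch. The Python below is illustrative only: `assign_groups` follows the same rule as the recursive step (extend the group if the item still fits, otherwise start a new one), while `naive_group_sums` divides the overall running total by the limit, as the query in the question does. Both function names are made up for this example.

```python
def assign_groups(lengths, group_max_length):
    """Return (group_id, group_running_length) per item, starting a new group
    whenever adding the item would push the current group past the limit."""
    result = []
    group_id = 1
    group_running = 0
    for i, length in enumerate(lengths):
        if i > 0 and group_running + length > group_max_length:
            group_id += 1
            group_running = 0
        group_running += length
        result.append((group_id, group_running))
    return result


def naive_group_sums(lengths, group_max_length):
    """Group by running_total // limit, as in the question, and sum each group."""
    sums = {}
    running = 0
    for length in lengths:
        running += length
        key = running // group_max_length
        sums[key] = sums.get(key, 0) + length
    return sums


descriptions = ['qmlsdkjfqmsldk', 'mldskjf', 'qmsdlfkqjsdm', 'fmqlsdkfq',
                'qdsfqsdfqq', 'mds', 'qmsldfkqsjdmfqlkj', 'qdmsl',
                'mqlskfjqmlkd', 'qsdqfdddffd']
lengths = [len(d) for d in descriptions]

print(assign_groups(lengths, 40))
print(naive_group_sums(lengths, 40))
```

With a limit of 40, the division-based grouping puts rows 4 to 8 together for a total of 44, because the group boundaries are fixed multiples of the limit and never reset when a new group starts. As in the recursive query, a single item longer than the limit still ends up in a group of its own.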
I have a table for subjects as follows:
id Subject Grade Ext
100 Math 6 +
100 Science 4 -
100 Hist 3
100 Geo 2 +
100 CompSi 1
I am expecting output per student in a class(id = 100) as follows:
Grade Ext StudentGrade
6 + 1
6 0
6 - 0
5 + 0
5 0
5 - 0
4 + 0
4 0
4 - 1
3 + 0
3 1
3 - 0
2 + 1
2 0
2 - 0
1 + 0
1 1
1 - 0
I would like this done in Oracle SQL rather than in the UI. Any input is appreciated.
You should generate the rows first, before joining them with your table, as shown below. I use the WITH clause here to generate the 18 rows in your sample.
with rws (grade, ext) as (
select ceil(level/3), decode(mod(level, 3), 0, '+', 1, '-', null)
from dual
connect by level <= 3 * 6
)
-- test t.Grade rather than t.Ext: Ext is legitimately null on some matched rows
select r.grade, r.ext, nvl2(t.Grade, 1, 0) studentGrade
from rws r
left join your_table t
on t.Grade = r.Grade and decode(t.Ext, r.Ext, 1, 0) = 1
order by 1 desc, decode(r.ext, null, 2, '-', 3, '+', 1)
You could do something like this. In the WITH clause I generate two small "helper" tables (really, inline views) for grades from 1 to 6 and for "extensions" of +, null and -. In the "extensions" view I also create an "ordering" column to use in ordering the final output (if you are wondering why I included that).
Also in the WITH clause I included sample data - you will have to remove that and instead use your actual table name in the main query.
The idea is to cross-join "grades" and "extensions", and left-outer-join the result to your input data. Count the grades from the input data, grouped by grade and extension, and after filtering the desired id. The decode thing in the join condition is needed because for extension we want to treat null as equal to null - something that decode does nicely.
with
sample_inputs (id, subject, grade, ext) as (
select 100, 'Math' , 6, '+' from dual union all
select 100, 'Science', 4, '-' from dual union all
select 100, 'Hist' , 3, null from dual union all
select 100, 'Geo' , 2, '+' from dual union all
select 100, 'CompSi' , 1, null from dual
)
, g (grade) as (select level from dual connect by level <= 6)
, e (ord, ext) as (
select 1, '+' from dual union all
select 2, null from dual union all
select 3, '-' from dual
)
select g.grade, e.ext, count(t.grade) as studentgrade
from g cross join e left outer join sample_inputs t
on t.grade = g.grade and decode(t.ext, e.ext, 0) = 0
and t.id = 100 -- change this as needed!
group by g.grade, e.ext, e.ord
order by g.grade desc, e.ord
;
OUTPUT:
GRADE EXT STUDENTGRADE
----- --- ------------
6 + 1
6 0
6 - 0
5 + 0
5 0
5 - 0
4 + 0
4 0
4 - 1
3 + 0
3 1
3 - 0
2 + 1
2 0
2 - 0
1 + 0
1 1
1 - 0
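The cross-join-then-left-join idea can also be sketched procedurally. The Python below is an illustration with made-up names (`grade_grid` and its parameters), not a replacement for the query: it builds every grade/extension combination with `itertools.product` and looks up how many of the student's rows match each one. Using `None` for the missing extension works here because dictionary keys compare `None` as equal to `None`, which plays the role that decode plays in the join condition.

```python
from itertools import product


def grade_grid(rows, student_id, grades=range(6, 0, -1), exts=("+", None, "-")):
    """Return (grade, ext, count) for every grade/ext combination,
    counting only the rows that belong to student_id."""
    counts = {}
    for row_id, _subject, grade, ext in rows:
        if row_id == student_id:
            counts[(grade, ext)] = counts.get((grade, ext), 0) + 1
    return [(grade, ext, counts.get((grade, ext), 0))
            for grade, ext in product(grades, exts)]


sample_inputs = [
    (100, 'Math', 6, '+'),
    (100, 'Science', 4, '-'),
    (100, 'Hist', 3, None),
    (100, 'Geo', 2, '+'),
    (100, 'CompSi', 1, None),
]

for grade, ext, student_grade in grade_grid(sample_inputs, 100):
    print(grade, ext or ' ', student_grade)
```

The order of `grades` and `exts` determines the output order, which corresponds to the `ord` column in the query.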
It looks like you want sparse data to be filled in as part of joining students and subjects.
Since Oracle 10g the correct way to do this has been with a "partition outer join".
The documentation has examples.
https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/SELECT.html#GUID-CFA006CA-6FF1-4972-821E-6996142A51C6
I'm wondering if such a thing is possible with SQL only. What I'm trying to achieve is the following:
An SQL table with the following column:
------------
| DURATION |
|----------|
| 5 |
| 14 |
| 3 |
| 25 |
| . |
| . |
| . |
I want to select every possible set of rows whose DURATION values sum to less than (or greater than) a given value. For example, if the value is 20, then the result for less than 20 should contain 3 sets of rows:
14 + 5
5 + 3
14 + 3
Consider a self-join that avoids reverse duplicates, with the condition that the two values sum to less than 20. NB: This only returns two-row combinations (pairs).
SELECT t1.DURATION, t2.DURATION
FROM myTable t1
LEFT JOIN myTable t2
ON t1.DURATION < t2.DURATION
WHERE t1.DURATION + t2.DURATION < 20
Here's a recursive CTE solution (requiring MySQL 8.0+) for finding all combinations of sums of rows that add up to less than a given value. If you don't have MySQL 8, you will probably need to write a stored procedure to do the same looping.
WITH RECURSIVE cte AS (
SELECT duration,
duration AS total_duration,
CAST(duration AS CHAR(100)) AS duration_list
FROM test
WHERE duration < 20
UNION ALL
SELECT test.duration,
test.duration + cte.total_duration,
CONCAT(cte.duration_list, ' + ', test.duration)
FROM test
JOIN cte ON test.duration > cte.duration AND
test.duration + cte.total_duration < 20)
SELECT duration_list, total_duration
FROM cte
WHERE duration_list != total_duration
ORDER BY total_duration ASC
Sample output for my demo on dbfiddle:
duration_list total_duration
2 + 3 5
2 + 5 7
3 + 5 8
2 + 8 10
2 + 3 + 5 10
3 + 8 11
2 + 11 13
5 + 8 13
2 + 3 + 8 13
3 + 11 14
2 + 5 + 8 15
5 + 11 16
2 + 3 + 11 16
3 + 5 + 8 16
2 + 14 16
3 + 14 17
2 + 5 + 11 18
2 + 3 + 5 + 8 18
2 + 3 + 14 19
3 + 5 + 11 19
8 + 11 19
5 + 14 19
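For checking results on a small table, the same set of combinations can be enumerated by brute force. The following Python sketch is an assumption-laden illustration: `combos_below` is a hypothetical name, the durations are assumed to be distinct (as the `>` comparison in the recursive CTE implies), and every subset is tested, so it is only practical for a handful of rows.

```python
from itertools import combinations


def combos_below(durations, limit, min_size=2):
    """Return (combination, total) for every subset of at least min_size
    durations whose sum is below limit, ordered by total."""
    ordered = sorted(durations)
    found = []
    for size in range(min_size, len(ordered) + 1):
        for combo in combinations(ordered, size):
            total = sum(combo)
            if total < limit:
                found.append((combo, total))
    return sorted(found, key=lambda item: item[1])


for combo, total in combos_below([5, 14, 3, 25], 20):
    print(' + '.join(str(d) for d in combo), total)
```

For the four durations in the question this yields the three expected pairs. Unlike this sketch, the recursive CTE stops extending a combination as soon as its total reaches the limit, so it does far less work on larger tables.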
You can solve the problem using a Common Table Expression, which requires MySQL 8.0.
See below.
WITH cte (duration) AS (
SELECT duration
FROM your_table
WHERE duration < 20
)
SELECT a.duration + b.duration AS 'sum_of_val'
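-- pair each row only with larger values, avoiding self-pairs and mirrored duplicates
FROM cte a JOIN cte b ON a.duration < b.duration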
WHERE a.duration + b.duration < 20
If your version does not support CTEs, you can use subqueries instead.
See below.
SELECT a.duration + b.duration AS 'sum_of_val'
FROM (select duration from your_table where duration < 20 ) a
JOIN (select duration from your_table where duration < 20 ) b ON a.duration < b.duration
WHERE a.duration + b.duration < 20
I need to find how many records it took to reach a given value. I have a table in the below format:
ID Name Time Time 2
1 Campaign 1 7 100
2 Campaign 3 5 165
3 Campaign 1 3 321
4 Campaign 2 610 952
5 Campaign 2 15 13
6 Campaign 2 310 5
7 Campaign 3 0 3
8 Campaign 1 0 610
9 Campaign 1 1 15
10 Campaign 1 54 310
11 Campaign 3 4 0
12 Campaign 2 23 0
13 Campaign 2 8 1
14 Campaign 3 23 1
15 Campaign 3 7 0
16 Campaign 3 5 5
17 Campaign 3 2 66
18 Campaign 3 100 7
19 Campaign 1 165 3
20 Campaign 1 321 13
21 Campaign 1 952 5
22 Campaign 1 13 3
23 Campaign 2 15 610
24 Campaign 2 0 15
25 Campaign 1 100 310
26 Campaign 2 165 0
27 Campaign 3 321 0
28 Campaign 3 952 1
29 Campaign 3 0 1
30 Campaign 3 5 0
I'd like to find out how many entries of 'Campaign 1' there were before the total of Time1 + Time2 was equal to or greater than a given number.
As an example, the result for Campaign 1 to reach 1400 should be 5.
Apologies if I haven't explained this clearly enough - the concept is still a little muddy at the moment.
Thanks
In SQL Server 2012, you can get the row using:
select t.*
from (select t.*, sum(time1 + time2) over (partition by name order by id) as cumsum
from table t
) t
where cumsum >= @VALUE and (cumsum - (time1 + time2)) < @VALUE;
You can get the count using:
select name, count(*)
from (select t.*, sum(time1 + time2) over (partition by name order by id) as cumsum
from table t
) t
where (cumsum - (time1 + time2)) < @VALUE
group by name;
If you are not using SQL Server 2012, you can do the cumulative sum with a correlated subquery:
select name, count(*)
from (select t.*,
(select sum(time1 + time2)
from table t2
where t2.name = t.name and
t2.id <= t.id
) as cumsum
from table t
) t
where (cumsum - (time1 + time2)) < @VALUE
group by name;
A CTE that computes a running total with a windowed SUM should work:
;WITH CTE AS
(
SELECT id,
name,
SUM([time]+[time 2])
OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RunningTotal
FROM Table1
WHERE name = 'Campaign 1'
)
SELECT count(*)+1 AS [Count]
FROM CTE
WHERE RunningTotal < 1400
Note that I added 1 to the count because the query counts only the rows whose running total is still below 1400. The next row is the one that pushes the total to 1400 or above, so it has to be counted as well.
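The counting logic can be double-checked against the sample data with a short Python sketch. It is an illustration only: `rows_to_reach` is a made-up name, the tuples hold (id, name, time, time 2), and only the first ten rows of the table from the question are included, which is enough for Campaign 1 to pass 1400.

```python
from itertools import accumulate


def rows_to_reach(rows, name, target):
    """Count the rows for `name` (in id order) needed for the running total
    of time + time 2 to reach `target`; return None if it never does."""
    amounts = [t1 + t2 for _id, n, t1, t2 in sorted(rows) if n == name]
    for count, running in enumerate(accumulate(amounts), start=1):
        if running >= target:
            return count
    return None


rows = [
    (1, 'Campaign 1', 7, 100), (2, 'Campaign 3', 5, 165),
    (3, 'Campaign 1', 3, 321), (4, 'Campaign 2', 610, 952),
    (5, 'Campaign 2', 15, 13), (6, 'Campaign 2', 310, 5),
    (7, 'Campaign 3', 0, 3), (8, 'Campaign 1', 0, 610),
    (9, 'Campaign 1', 1, 15), (10, 'Campaign 1', 54, 310),
]
print(rows_to_reach(rows, 'Campaign 1', 1400))  # 5
```

One difference from the count(*)+1 query is worth noting: when the target is never reached, this sketch returns None, whereas the query would still return the number of rows plus one.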