Most Efficient Way to Compute Running Value in SQL

Possible Duplicate:
Calculate a Running Total in SqlServer
Consider this data
Day | OrderCount
1 3
2 2
3 11
4 3
5 6
How can I get this accumulation of OrderCount (a running value) as a result set using a T-SQL query?
Day | OrderCount | OrderCountRunningValue
1 3 3
2 2 5
3 11 16
4 3 19
5 6 25
I can easily do this with a loop in the query itself (using a #table) or in my C# code-behind, but that is slow when I'm processing thousands of records (considering that I also fetch the orders per day). I'm looking for a better, more efficient approach, hopefully without loops: something like a recursive CTE, or something else.
Any ideas would be greatly appreciated. TIA

As you seem to need these results in the client rather than for use within another SQL query, you are probably better off not doing this in SQL.
(The linked question in my comment shows 'the best' option within SQL, if that is in fact necessary.)
What I would recommend is to pull the Day and OrderCount values as one result set (SELECT day, orderCount FROM yourTable ORDER BY day) and then calculate the running total in your C#.
Your C# code will be able to iterate through the dataset efficiently, and will almost certainly outperform the SQL approaches. This does transfer some load from the SQL Server to the web server, but at an overall (and significant) resource saving.
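To illustrate the client-side approach, here is a minimal sketch in Python rather than C# (my substitution, purely for illustration). It assumes the rows have already been fetched in day order; the function name with_running_total is made up for this example.

```python
from itertools import accumulate

# Rows as they would arrive from:
#   SELECT day, orderCount FROM yourTable ORDER BY day
rows = [(1, 3), (2, 2), (3, 11), (4, 3), (5, 6)]


def with_running_total(rows):
    """Return (day, order_count, running_total) tuples.

    Assumes rows are already sorted by day; a single pass is enough.
    """
    totals = accumulate(count for _, count in rows)
    return [(day, count, total) for (day, count), total in zip(rows, totals)]


result = with_running_total(rows)
for row in result:
    print(row)
```

The whole thing is one linear pass over the result set, which is why it tends to scale better than the per-row lookups in SQL.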

SELECT t.Day,
t.OrderCount,
(SELECT SUM(t1.OrderCount) FROM table t1 WHERE t1.Day <= t.Day)
AS OrderCountRunningValue
FROM table t

SELECT
t.day,
t.orderCount,
SUM(t1.orderCount) orderCountRunningValue
FROM
table t INNER JOIN table t1 ON t1.day <= t.day
group by t.day,t.orderCount
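If you want to check the two queries above against the sample data, here is a small sketch that runs both in an in-memory SQLite database from Python. SQLite stands in for SQL Server here, and the table and column names (orders, order_count) are assumptions of mine, since "table" is only a placeholder in the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (day INTEGER PRIMARY KEY, order_count INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 3), (2, 2), (3, 11), (4, 3), (5, 6)],
)

# Correlated subquery: for each row, sum every row up to and including that day.
correlated = """
    SELECT t.day, t.order_count,
           (SELECT SUM(t1.order_count) FROM orders t1 WHERE t1.day <= t.day) AS running
    FROM orders t
    ORDER BY t.day
"""

# Self join: pair each row with all rows up to that day, then aggregate.
self_join = """
    SELECT t.day, t.order_count, SUM(t1.order_count) AS running
    FROM orders t
    INNER JOIN orders t1 ON t1.day <= t.day
    GROUP BY t.day, t.order_count
    ORDER BY t.day
"""

rows_correlated = conn.execute(correlated).fetchall()
rows_self_join = conn.execute(self_join).fetchall()
print(rows_correlated)
```

Both forms pair every row with all earlier rows, so the work grows roughly with the square of the row count. That is worth keeping in mind for thousands of records.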

CTE's to the rescue (again):
DROP TABLE tmp.sums;
CREATE TABLE tmp.sums
( id INTEGER NOT NULL
, zdate timestamp not null
, amount integer NOT NULL
);
INSERT INTO tmp.sums (id,zdate,amount) VALUES
(1, '2011-10-24', 1 ),(1, '2011-10-25', 2 ),(1, '2011-10-26', 3 )
,(2, '2011-10-24', 11 ),(2, '2011-10-25', 12 ),(2, '2011-10-26', 13 )
;
WITH RECURSIVE list AS (
-- Terminal part
SELECT t0.id, t0.zdate
, t0.amount AS amount
, t0.amount AS runsum
FROM tmp.sums t0
WHERE NOT EXISTS (
SELECT * FROM tmp.sums px
WHERE px.id = t0.id
AND px.zdate < t0.zdate
)
UNION
-- Recursive part
SELECT p1.id AS id
, p1.zdate AS zdate
, p1.amount AS amount
, p0.runsum + p1.amount AS runsum
FROM tmp.sums AS p1
, list AS p0
WHERE p1.id = p0.id
AND p0.zdate < p1.zdate
AND NOT EXISTS (
SELECT * FROM tmp.sums px
WHERE px.id = p1.id
AND px.zdate < p1.zdate
AND px.zdate > p0.zdate
)
)
SELECT * FROM list
ORDER BY id, zdate;
The output:
DROP TABLE
CREATE TABLE
INSERT 0 6
id | zdate | amount | runsum
----+---------------------+--------+--------
1 | 2011-10-24 00:00:00 | 1 | 1
1 | 2011-10-25 00:00:00 | 2 | 3
1 | 2011-10-26 00:00:00 | 3 | 6
2 | 2011-10-24 00:00:00 | 11 | 11
2 | 2011-10-25 00:00:00 | 12 | 23
2 | 2011-10-26 00:00:00 | 13 | 36
(6 rows)

Related

How to aggregate based on various conditions

Let's say I have a table which stores ItemID, Date and Total_shipped over a period of time:
ItemID | Date | Total_shipped
__________________________________
1 | 1/20/2000 | 2
2 | 1/20/2000 | 3
1 | 1/21/2000 | 5
2 | 1/21/2000 | 4
1 | 1/22/2000 | 1
2 | 1/22/2000 | 7
1 | 1/23/2000 | 5
2 | 1/23/2000 | 6
Now I want to aggregate based on several periods of time. For example, I want to know how many of each item were shipped in each two-day period and in total. So the desired output should look something like:
ItemID | Jan20-Jan21 | Jan22-Jan23 | Jan20-Jan23
_____________________________________________
1 | 7 | 6 | 13
2 | 7 | 13 | 20
How do I do that in the most efficient way?
I know I can write three different subqueries, but I think there should be a better way. My real data is large and there are several different time periods to consider: in my real problem I want the shipped items for current_week, last_week, two_weeks_ago, three_weeks_ago, last_month, two_months_ago and three_months_ago, so I do not think writing 7 different subqueries would be a good idea.
Here is the general idea of what I can already run, but it is very expensive for the database:
WITH
sq1 as (
SELECT ItemID, sum(Total_shipped) sum1
FROM table
WHERE Date BETWEEN '1/20/2000' and '1/21/2000'
GROUP BY ItemID),
sq2 as (
SELECT ItemID, sum(Total_Shipped) sum2
FROM table
WHERE Date BETWEEN '1/22/2000' and '1/23/2000'
GROUP BY ItemID),
sq3 as(
SELECT ItemID, sum(Total_Shipped) sum3
FROM Table
GROUP BY ItemID)
SELECT sq3.ItemID, sq1.sum1, sq2.sum2, sq3.sum3
FROM sq3
JOIN sq1 on sq3.ItemID = sq1.ItemID
JOIN sq2 on sq3.ItemID = sq2.ItemID
I don't know why you have tagged this question with multiple databases.
Anyway, you can use conditional aggregation, as follows in Oracle:
select
item_id,
sum(case when "date" between date'2000-01-20' and date'2000-01-21' then total_shipped end) as "Jan20-Jan21",
sum(case when "date" between date'2000-01-22' and date'2000-01-23' then total_shipped end) as "Jan22-Jan23",
sum(case when "date" between date'2000-01-20' and date'2000-01-23' then total_shipped end) as "Jan20-Jan23"
from my_table
group by item_id
Cheers!!
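Since the real problem has seven periods, it may help to generate the SUM(CASE ...) columns from a list of periods instead of typing each one. Below is a rough sketch of that idea, run against in-memory SQLite from Python so it can be tried as-is. The table name shipments, the column ship_date and the periods dictionary are assumptions for the example, and dates are stored as ISO text so that BETWEEN compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (item_id INTEGER, ship_date TEXT, total_shipped INTEGER)")
conn.executemany(
    "INSERT INTO shipments VALUES (?, ?, ?)",
    [
        (1, "2000-01-20", 2), (2, "2000-01-20", 3),
        (1, "2000-01-21", 5), (2, "2000-01-21", 4),
        (1, "2000-01-22", 1), (2, "2000-01-22", 7),
        (1, "2000-01-23", 5), (2, "2000-01-23", 6),
    ],
)

# label -> (first day, last day), both inclusive
periods = {
    "Jan20-Jan21": ("2000-01-20", "2000-01-21"),
    "Jan22-Jan23": ("2000-01-22", "2000-01-23"),
    "Jan20-Jan23": ("2000-01-20", "2000-01-23"),
}

# One conditional SUM per period; the bounds are bound as parameters,
# the labels are trusted constants from the dictionary above.
columns = ", ".join(
    f'SUM(CASE WHEN ship_date BETWEEN ? AND ? THEN total_shipped END) AS "{label}"'
    for label in periods
)
params = [bound for bounds in periods.values() for bound in bounds]
query = f"SELECT item_id, {columns} FROM shipments GROUP BY item_id ORDER BY item_id"

result = conn.execute(query, params).fetchall()
print(result)
```

The table is scanned once regardless of how many periods are listed, which is the main advantage over one subquery per period.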
Use FILTER:
select
item_id,
sum(total_shipped) filter (where date between '2000-01-20' and '2000-01-21') as "Jan20-Jan21",
sum(total_shipped) filter (where date between '2000-01-22' and '2000-01-23') as "Jan22-Jan23",
sum(total_shipped) filter (where date between '2000-01-20' and '2000-01-23') as "Jan20-Jan23"
from my_table
group by 1
item_id | Jan20-Jan21 | Jan22-Jan23 | Jan20-Jan23
---------+-------------+-------------+-------------
1 | 7 | 6 | 13
2 | 7 | 13 | 20
(2 rows)

Querying across months and days

My access logs database stores time as epoch and extracts year month and day as integers. Further, the partitioning of the database is based on the extracted Y/m/d and I have a 35 day retention.
If I run this query:
select *
from mydb
where year in (2017, 2018)
and month in (12, 1)
and day in (31, 1)
On the 29th of January, 2018, I will get data for 12/31/2017 and 1/1/2018.
On the 5th of January, 2018, I will get data for 12/1/2017, 12/31/2017, and 1/1/2018 (undesirable)
I also realize that I can do something like this:
select *
from mydb
where (year = 2017 and month = 12 and day = 31)
or (year = 2018 and month = 1 and day = 1)
But what I am really looking for is this: a good way to write a query where I give the year month and day number as the start and then a fourth value (number of days +) and then get all the data for 12/31/2017 + 5 days for example.
Is there a native way in SQL to accomplish this? I have an enormous data set and if I don't specify the days and have to rely on the epoch to do this, the query takes forever. I also have no influence over the partitioning configuration.
With Impala as the DBMS and SQL dialect you will be able to use common table expressions, but not recursion. There may also be problems inserting parameters.
Below is an untested suggestion that will require you to locate some function alternatives. First it generates a set of rows holding the integers 0 to 999 (in the example); it is quite easy to expand the number of rows if required. From those rows it is possible to add a number of days to a timestamp literal using date_add(timestamp startdate, int days/interval expression), and then to use year(timestamp date), month(timestamp date) and day(timestamp date) (see Impala's Date and Time functions) to create the columns needed to match your data.
Overall, you should then be able to build a common table expression that has year, month and day columns covering the wanted range, and inner join it to your source table, thereby implementing a date range filter.
The code below was written in T-SQL (SQL Server).
-- produce a set of integers, adjust to suit needed number of these
;WITH
cteDigits AS (
SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
)
, cteTally AS (
SELECT
d1s.digit
+ d10s.digit * 10
+ d100s.digit * 100 /* add more like this as needed */
-- + d1000s.digit * 1000 /* add more like this as needed */
AS num
FROM cteDigits d1s
CROSS JOIN cteDigits d10s
CROSS JOIN cteDigits d100s /* add more like this as needed */
-- CROSS JOIN cteDigits d1000s /* add more like this as needed */
)
, DateRange AS (
select
num
, dateadd(day,num,'20181227') dt
, year(dateadd(day,num,'20181227')) yr
, month(dateadd(day,num,'20181227')) mn
, day(dateadd(day,num,'20181227')) dy
from cteTally
where num < 10
)
select
*
from DateRange
I think these are the Impala equivalents for the function calls used above:
, DateRange AS (
select
num
, date_add(to_timestamp('20181227','yyyyMMdd'),num) dt
, year( date_add(to_timestamp('20181227','yyyyMMdd'),num) ) yr
, month( date_add(to_timestamp('20181227','yyyyMMdd'),num) ) mn
, day( date_add(to_timestamp('20181227','yyyyMMdd'),num) ) dy
from cteTally
where num < 10
)
Hopefully you can work out how to use these. Ultimately the purpose is to use the generated date range like so:
select * from mydb t
inner join DateRange on t.year = DateRange.yr and t.month = DateRange.mn and t.day = DateRange.dy
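If building the date range inside the query turns out to be awkward, another option is to compute the (year, month, day) triples in the client and generate the explicit predicate the question already uses. This is only a sketch under that assumption; the helper names date_parts and partition_predicate are made up, and here n_days counts the start date itself.

```python
from datetime import date, timedelta


def date_parts(year, month, day, n_days):
    """Return (year, month, day) tuples for n_days consecutive dates,
    beginning with the given start date."""
    start = date(year, month, day)
    parts = []
    for offset in range(n_days):
        current = start + timedelta(days=offset)
        parts.append((current.year, current.month, current.day))
    return parts


def partition_predicate(parts):
    """Build an OR-ed predicate over the integer partition columns."""
    return " OR ".join(
        f"(year = {y} AND month = {m} AND day = {d})" for y, m, d in parts
    )


parts = date_parts(2017, 12, 31, 5)
predicate = partition_predicate(parts)
print("select * from mydb where " + predicate)
```

The values are integers produced by the date arithmetic, so formatting them into the query text is safe here; the calendar handles the month and year rollover for you.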
original post
Well in the absence of knowing what database to propose solutions for, here is a suggestion using SQL Server:
This suggestion involves a recursive common table expression, which may then be used as an inner join to your source data to limit the results to a date range.
--Sql Server 2014 Express Edition
--https://rextester.com/l/sql_server_online_compiler
declare @yr as integer = 2018
declare @mn as integer = 12
declare @dy as integer = 27
declare @du as integer = 10
;with CTE as (
select
datefromparts(@yr, @mn, @dy) as dt
, @yr as yr
, @mn as mn
, @dy as dy
union all
select
dateadd(dd,1,cte.dt)
, datepart(year,dateadd(dd,1,cte.dt))
, datepart(month,dateadd(dd,1,cte.dt))
, datepart(day,dateadd(dd,1,cte.dt))
from cte
where cte.dt < dateadd(dd,@du-1,datefromparts(@yr, @mn, @dy))
)
select
*
from cte
This produces the following result:
+----+---------------------+------+----+----+
| | dt | yr | mn | dy |
+----+---------------------+------+----+----+
| 1 | 27.12.2018 00:00:00 | 2018 | 12 | 27 |
| 2 | 28.12.2018 00:00:00 | 2018 | 12 | 28 |
| 3 | 29.12.2018 00:00:00 | 2018 | 12 | 29 |
| 4 | 30.12.2018 00:00:00 | 2018 | 12 | 30 |
| 5 | 31.12.2018 00:00:00 | 2018 | 12 | 31 |
| 6 | 01.01.2019 00:00:00 | 2019 | 1 | 1 |
| 7 | 02.01.2019 00:00:00 | 2019 | 1 | 2 |
| 8 | 03.01.2019 00:00:00 | 2019 | 1 | 3 |
| 9 | 04.01.2019 00:00:00 | 2019 | 1 | 4 |
| 10 | 05.01.2019 00:00:00 | 2019 | 1 | 5 |
+----+---------------------+------+----+----+
and:
select * from mydb t
inner join cte on t.year = cte.yr and t.month = cte.mn and t.day = cte.dy
Instead of a recursive common table expression, a table of integers may be used (or a set of unioned select queries that generates the integers); this is often known as a tally table. The method one chooses will depend on the DBMS type and version being used.
Again, depending on the database, it may be more efficient to persist the result seen above as a temporary table and add an index to that.

Average of successive pairs of rows

I have a table like so:
id | value
---+------
1 | 10
2 | 5
3 | 11
4 | 8
5 | 9
6 | 7
The data in this table is really pairs of values, which I need to take the average of, which should result in:
pair_id | pair_avg
--------+---------
1 | 7.5
2 | 9.5
3 | 8
I have got some other information (a pair of flags) which could also help to pair them, though they still have to be in id order. I cannot really change how the data comes to me.
As I'm more used to arrays than SQL, all I can think is that I need to loop through the table and sum the pairs. But this doesn't strike me as very SQL-ish.
Update
In making this minimal example, I have apparently over simplified.
As the table I am working with is the result of several selects, the IDs will not be quite so clean, apologies for not specifying this.
The table looks a lot more like:
id | value
----------
1 | 10
4 | 5
6 | 11
7 | 8
10 | 9
15 | 7
The results will be used to create a second table, I don't care about the index on this new table, it can provide its own, therefore giving the result already indicated above.
If your data is as clean as the question makes it seem: no NULL values, no gaps, pairs have consecutive positive numbers, starting with 1, and assuming id is type integer, it can be as simple as:
SELECT (id+1)/2 AS pair_id, avg(value) AS pair_avg
FROM tbl
GROUP BY 1
ORDER BY 1;
Integer division truncates the result and thus takes care of grouping pairs automatically this way.
If your id numbers are not as regular but at least strictly monotonically increasing like your update suggests (still no NULL or missing values), you can use a surrogate ID generated with row_number() instead:
SELECT id/2 AS pair_id, avg(value) AS pair_avg
FROM (SELECT row_number() OVER (ORDER BY id) + 1 AS id, value FROM tbl) t
GROUP BY 1
ORDER BY 1;
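To see why the row_number() trick copes with gaps in the ids, here is the same pairing logic as a small Python sketch (the function name pair_averages is mine). The 1-based position after sorting plays the role of row_number(), and (position + 1) // 2 is the integer division that maps positions 1,2 to pair 1, positions 3,4 to pair 2, and so on.

```python
def pair_averages(rows):
    """Average consecutive pairs of values, pairing by position in id order.

    rows: iterable of (id, value); ids may have gaps but must be unique.
    """
    ordered = sorted(rows)  # equivalent of ORDER BY id
    groups = {}
    for position, (_id, value) in enumerate(ordered, start=1):
        pair_id = (position + 1) // 2  # integer division groups the pairs
        groups.setdefault(pair_id, []).append(value)
    return [(pair_id, sum(vals) / len(vals)) for pair_id, vals in sorted(groups.items())]


# The irregular ids from the question's update
rows = [(1, 10), (4, 5), (6, 11), (7, 8), (10, 9), (15, 7)]
result = pair_averages(rows)
print(result)
```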
I think you can just use group by with arithmetic:
select row_number() over (order by min(id)), min(id), max(id), avg(value)
from t
group by floor( (id - 1) / 2 );
I'm not sure why you would want to renumber the ids after aggregation. The original ids seem more useful.
You may use the ceil function, applying it to the id column divided by 2, as in the following select statement:
with t(id,value) as
(
select 1 , 10 union all
select 2 , 5 union all
select 3 , 11 union all
select 4 , 8 union all
select 5 , 9 union all
select 6 , 7
)
select ceil(id/2::numeric) as "ID", avg(t.value) as "pair_avg"
from t
group by "ID"
order by "ID";
id | pair_avg
-------------
1 | 7.5
2 | 9.5
3 | 8

Disassemble string, group, and reconstruct in Oracle SQL

So here is what a sample of my data look like:
ID | Amount
1111-1 | 5
1111-1 | -5
1111-2 | 5
1111-2 | -5
12R-1 | 8
12R-1 | -8
12R-3 | 8
12R-3 | -8
54A73-1| 2
54A73-1| -2
54A73-2| 2
54A73-2| -1
What I want to do is group by the string in the ID column before the dash, and find the group of IDs that have a sum of zero. The kicker is that after I find which group of IDs sum to zero, I want to add back the dash and number following the dash.
Here is what I hope the solution to look like:
ID | Amount
1111-1 | 5
1111-1 | -5
1111-2 | 5
1111-2 | -5
12R-1 | 8
12R-1 | -8
12R-3 | 8
12R-3 | -8
Notice how the IDs starting with 54A73 are not there anymore; that's because the sum of their Amounts is not equal to zero.
Any help solving this question would be much appreciated!
Here's one option joining the table back to itself after grouping by the beginning part of the id field using left and locate:
MySQL Version
select id, amount
from yourtable t
join (
select left(id, locate('-', id)-1) shortid
from yourtable
group by left(id, locate('-', id)-1)
having sum(amount) = 0
) t2 on left(t.id, locate('-', t.id)-1) = t2.shortid
Oracle Version
select id, amount
from yourtable t
join (
select substr(id, 0, instr(id,'-')-1) shortid
from yourtable
group by substr(id, 0, instr(id,'-')-1)
having sum(amount) = 0
) t2 on substr(t.id, 0, instr(t.id,'-')-1) = t2.shortid
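For comparison, the same two-step logic (total per prefix, then keep the original rows whose prefix totals zero) can be sketched in Python. The helper names prefix and zero_sum_rows are made up for this example; like locate/instr in the queries above, the split uses the first dash in the ID.

```python
from collections import defaultdict

rows = [
    ("1111-1", 5), ("1111-1", -5), ("1111-2", 5), ("1111-2", -5),
    ("12R-1", 8), ("12R-1", -8), ("12R-3", 8), ("12R-3", -8),
    ("54A73-1", 2), ("54A73-1", -2), ("54A73-2", 2), ("54A73-2", -1),
]


def prefix(full_id):
    """Part of the ID before the first dash, e.g. '54A73-2' -> '54A73'."""
    return full_id.split("-", 1)[0]


def zero_sum_rows(rows):
    """Keep the original rows whose prefix group sums to zero."""
    totals = defaultdict(int)
    for full_id, amount in rows:
        totals[prefix(full_id)] += amount  # the GROUP BY ... SUM step
    # the join back: full IDs are untouched, so nothing needs reconstructing
    return [(full_id, amount) for full_id, amount in rows if totals[prefix(full_id)] == 0]


result = zero_sum_rows(rows)
print(result)
```

Note that nothing has to be "added back": the grouping only decides which rows survive, and the surviving rows still carry their full IDs.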

Joining series of dates and counting continuous days

Let's say I have a table as below
date add_days
2015-01-01 5
2015-01-04 2
2015-01-11 7
2015-01-20 10
2015-01-30 1
What I want to do is to check the days_balance, i.e. whether date is greater or smaller than the previous date + N days (add_days), and take the cumulative sum of the day counts if they form a continuous series.
So the algorithm should work like
for i in 2:N_rows {
days_balance[i] := date[i-1] + add_days[i-1] - date[i]
if days_balance[i] >= 0 then
date[i] := date[i] + days_balance[i]
}
The expected result should be as follows
date days_balance
2015-01-01 0
2015-01-04 2
2015-01-11 -3
2015-01-20 -2
2015-01-30 0
Is it possible in pure SQL? I imagine it should be with some conditional joins, but cannot see how it could be implemented.
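For reference, here is the loop from the question transcribed literally into Python, as a sketch for checking candidate SQL against (the function name days_balances is mine). The detail that is easy to miss is that date[i] is overwritten when the balance is non-negative, so the next iteration sees the shifted date. Read that way, the loop does produce the expected column shown above.

```python
from datetime import date, timedelta

rows = [
    (date(2015, 1, 1), 5),
    (date(2015, 1, 4), 2),
    (date(2015, 1, 11), 7),
    (date(2015, 1, 20), 10),
    (date(2015, 1, 30), 1),
]


def days_balances(rows):
    """Literal transcription of the question's pseudocode."""
    dates = [d for d, _ in rows]          # working copy; entries get shifted
    add_days = [n for _, n in rows]
    balances = [0] * len(rows)            # the first row has no predecessor
    for i in range(1, len(rows)):
        covered_until = dates[i - 1] + timedelta(days=add_days[i - 1])
        balances[i] = (covered_until - dates[i]).days
        if balances[i] >= 0:
            # overlap: push this row's date forward by the surplus
            dates[i] += timedelta(days=balances[i])
    return balances


result = days_balances(rows)
print(result)
```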
I'm posting another answer, since it may be useful to compare the two: they use different methods (this one does an n^2-style join, the other one uses a recursive CTE). This one takes advantage of the fact that you don't have to calculate the days_balance for each previous row before calculating it for a particular row; you just need to sum values from the previous rows.
drop table junk
create table junk(date DATETIME, add_days int)
insert into junk values
('2015-01-01',5 ),
('2015-01-04',2 ),
('2015-01-11',7 ),
('2015-01-20',10 ),
('2015-01-30',1 )
;WITH cte as
(
select ROW_NUMBER() OVER (ORDER BY date) i, date, add_days, ISNULL(DATEDIFF(DAY, LAG(date) OVER (ORDER BY date), date), 0) days_since_prev
FROM Junk
)
, combinedWithAllPreviousDaysCte as
(
select i [curr_i], date [curr_date], add_days [curr_add_days], days_since_prev [curr_days_since_prev], 0 [prev_add_days], 0 [prev_days_since_prev] from cte where i = 1 --get first row explicitly since it has no preceding rows
UNION ALL
select curr.i [curr_i], curr.date [curr_date], curr.add_days [curr_add_days], curr.days_since_prev [curr_days_since_prev], prev.add_days [prev_add_days], prev.days_since_prev [prev_days_since_prev]
from cte curr
join cte prev on curr.i > prev.i --join to all previous days
)
select curr_i, curr_date, SUM(prev_add_days) - curr_days_since_prev - SUM(prev_days_since_prev) [days_balance]
from combinedWithAllPreviousDaysCte
group by curr_i, curr_date, curr_days_since_prev
order by curr_i
outputs:
+--------+-------------------------+--------------+
| curr_i | curr_date | days_balance |
+--------+-------------------------+--------------+
| 1 | 2015-01-01 00:00:00.000 | 0 |
| 2 | 2015-01-04 00:00:00.000 | 2 |
| 3 | 2015-01-11 00:00:00.000 | -3 |
| 4 | 2015-01-20 00:00:00.000 | -5 |
| 5 | 2015-01-30 00:00:00.000 | -5 |
+--------+-------------------------+--------------+
Well, I think I have it with a recursive CTE (sorry, I only have Microsoft SQL Server available to me at the moment, so it may not comply with PostgreSQL).
Also I think the expected results you had were off (see comment above). If not, this can probably be modified to conform to your math.
drop table junk
create table junk(date DATETIME, add_days int)
insert into junk values
('2015-01-01',5 ),
('2015-01-04',2 ),
('2015-01-11',7 ),
('2015-01-20',10 ),
('2015-01-30',1 )
;WITH cte as
(
select ROW_NUMBER() OVER (ORDER BY date) i, date, add_days, ISNULL(DATEDIFF(DAY, LAG(date) OVER (ORDER BY date), date), 0) days_since_prev
FROM Junk
)
,recursiveCte (i, date, add_days, days_since_prev, days_balance, math) as
(
select top 1
i,
date,
add_days,
days_since_prev,
0 [days_balance],
CAST('no math for initial one, just has zero balance' as varchar(max)) [math]
from cte where i = 1
UNION ALL --recursive step now
select
curr.i,
curr.date,
curr.add_days,
curr.days_since_prev,
prev.days_balance - curr.days_since_prev + prev.add_days [days_balance],
CAST(prev.days_balance as varchar(max)) + ' - ' + CAST(curr.days_since_prev as varchar(max)) + ' + ' + CAST(prev.add_days as varchar(max)) [math]
from cte curr
JOIN recursiveCte prev ON curr.i = prev.i + 1
)
select i, DATEPART(day,date) [day], add_days, days_since_prev, days_balance, math
from recursiveCTE
order by date
And the results are like so:
+---+-----+----------+-----------------+--------------+------------------------------------------------+
| i | day | add_days | days_since_prev | days_balance | math |
+---+-----+----------+-----------------+--------------+------------------------------------------------+
| 1 | 1 | 5 | 0 | 0 | no math for initial one, just has zero balance |
| 2 | 4 | 2 | 3 | 2 | 0 - 3 + 5 |
| 3 | 11 | 7 | 7 | -3 | 2 - 7 + 2 |
| 4 | 20 | 10 | 9 | -5 | -3 - 9 + 7 |
| 5 | 30 | 1 | 10 | -5 | -5 - 10 + 10 |
+---+-----+----------+-----------------+--------------+------------------------------------------------+
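The recursive step boils down to one recurrence: balance[i] = balance[i-1] - days_since_prev[i] + add_days[i-1]. Here is a small Python sketch of just that recurrence (the function name chained_balances is mine), which can be handy for checking the "math" column by hand. Unlike the question's loop, the balance is carried forward unconditionally, which would explain why the last two rows differ from the asker's expected values.

```python
from datetime import date

rows = [
    (date(2015, 1, 1), 5),
    (date(2015, 1, 4), 2),
    (date(2015, 1, 11), 7),
    (date(2015, 1, 20), 10),
    (date(2015, 1, 30), 1),
]


def chained_balances(rows):
    """balance[i] = balance[i-1] - days_since_prev[i] + add_days[i-1]."""
    balances = [0]  # anchor row: zero balance, no math
    for (prev_date, prev_add), (curr_date, _) in zip(rows, rows[1:]):
        days_since_prev = (curr_date - prev_date).days
        balances.append(balances[-1] - days_since_prev + prev_add)
    return balances


result = chained_balances(rows)
print(result)
```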
I don’t quite get how your algorithm returns your expected results. But let me share a technique I came up with that might help.
This will only work if the end result of your data is to be exported to Excel, and even then it won’t work in all scenarios, depending on what format you export your dataset in, but here it is.
If you’re familiar with Excel formulas: what I discovered is that if you write an Excel formula in your SQL as another field, Excel will evaluate that formula as soon as you export the data to it (the method that works best for me is just copying and pasting the result into Excel, so that it isn’t formatted as text).
So for your example, here’s what you could do (noting again that I don’t understand your algorithm, so this is probably wrong, but it’s just to give you the concept):
SELECT
date
, add_days
, '=INDEX($1:$65536,ROW()-1,COLUMN()-2)'
||'+INDEX($1:$65536,ROW()-1,COLUMN()-1)'
||'-INDEX($1:$65536,ROW(),COLUMN()-2)'
AS "days_balance[i]"
,'=IF(INDEX($1:$65536,ROW(),COLUMN()-1)>=0'
||',INDEX($1:$65536,ROW(),COLUMN()-3)'
||'+INDEX($1:$65536,ROW(),COLUMN()-1))'
AS "date[i]"
FROM
myTable
ORDER BY /*Ensure to order by whatever you need for your formula to work*/
The key to making this work is using the INDEX formula function to select a cell based on the position of the current cell. ROW()-1 means "take the value from the previous record", and COLUMN()-2 means "take the value from two columns to the left of the current one". You can't use cell references like A2+B2-A3, because the row numbers won't change on export and they assume fixed column positions.
I used SQL string concatenation with || just so it's easier to read on screen.
I tried this one in Excel; it didn’t match your expected results. But if this technique works for you, then just correct the Excel formula to suit.