sum last n days quantity using sql window function


I am trying to create the following logic in Alteryx; the data comes from an Exasol database.
The column "Sum_Qty_28_days" should sum the values of the "Qty" column for the same article over the last 28 days.
My sample data looks like:
and I want following output:
E.g. the "Sum_Qty_28_days" value for "article" = 'A' and date = '2019-10-08' is 8, because it sums the "Qty" values for the dates that fall within the previous 28 days, which are:
2019-09-15
2019-10-05
2019-10-08
for “article” = ‘A’.
Is this possible using a SQL window function?
I tried it myself with the following code:
SUM("Qty") OVER (PARTITION BY "article", date_trunc('month',"Date")
ORDER BY "Date")
But it is far from what I need: it sums the Qty for dates falling in the same calendar month, whereas I need the sum of Qty over the last 28 days.
Thanks in advance.

Yes, this is possible in standard SQL using a range frame. Many databases support it, but not all. The frame below covers the current row's date plus the 27 preceding days, i.e. 28 days in total:
select t.*,
       sum(qty) over (partition by article
                      order by date
                      range between interval '27 day' preceding and current row
                     ) as sum_qty_28_days
from t;
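To try the range-frame idea locally, here is a minimal sketch in Python using the built-in sqlite3 module. The rows and the helper name rolling_28_day_sums are made up for illustration, since the asker's sample data is not shown. SQLite has no interval literal, so the sketch orders by julianday(date) and uses a numeric 27 PRECEDING offset instead; this assumes SQLite 3.28 or newer.

```python
import sqlite3

# Hypothetical sample rows (article, date, qty); not the asker's real data.
ROWS = [
    ("A", "2019-08-01", 10),
    ("A", "2019-09-15", 3),
    ("A", "2019-10-05", 1),
    ("A", "2019-10-08", 4),
    ("B", "2019-10-08", 5),
]


def rolling_28_day_sums(rows):
    """Return (article, date, qty, sum_qty_28_days) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (article TEXT, date TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT article, date, qty,
               SUM(qty) OVER (
                   PARTITION BY article
                   -- julianday() turns the ISO date into a day number,
                   -- so a numeric RANGE offset counts days.
                   ORDER BY julianday(date)
                   RANGE BETWEEN 27 PRECEDING AND CURRENT ROW
               ) AS sum_qty_28_days
        FROM t
        ORDER BY article, date
        """
    ).fetchall()
    conn.close()
    return result


result = rolling_28_day_sums(ROWS)
for row in result:
    print(row)
```

With these made-up rows, the 2019-08-01 entry for article A falls outside the window of every later date, so it only contributes to its own row.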

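If your RDBMS does not support the range frame, an alternative solution is to use a correlated inline subquery (the exact interval literal syntax varies between databases):
select
    t.*,
    (
        select sum(t1.qty)
        from mytable t1
        where
            t1.article = t.article
            -- between is inclusive: 27 days back plus the current day = 28 days
            and t1.date between t.date - interval '27' day and t.date
    ) sum_qty_28_days
from mytable t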
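The subquery variant can be sketched the same way in Python with the built-in sqlite3 module. The rows are invented for illustration, and SQLite's date(x, '-27 days') modifier stands in for the interval arithmetic. The lookback is 27 days so that, with the inclusive BETWEEN, the window covers 28 calendar days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (article TEXT, date TEXT, qty INTEGER)")
# Hypothetical sample rows; not the asker's real data.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("A", "2019-08-01", 10),
        ("A", "2019-09-15", 3),
        ("A", "2019-10-05", 1),
        ("A", "2019-10-08", 4),
        ("B", "2019-10-08", 5),
    ],
)

# For each row, a correlated subquery sums the same article's rows
# dated within the 27 days before the row's date, plus the date itself.
subquery_rows = conn.execute(
    """
    SELECT t.article, t.date, t.qty,
           (SELECT SUM(t1.qty)
            FROM mytable t1
            WHERE t1.article = t.article
              AND t1.date BETWEEN date(t.date, '-27 days') AND t.date
           ) AS sum_qty_28_days
    FROM mytable t
    ORDER BY t.article, t.date
    """
).fetchall()
conn.close()

for row in subquery_rows:
    print(row)
```

The comparison works on text here because ISO-8601 dates sort the same way as strings and as dates.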

Related

Where clause in a calculation

Say I have this table:
month        num_of_fruits  harvested
2022-01-01   133            3
2022-02-01   145            12
2022-03-01   123            5
2022-04-01   111            4
2022-05-01   164            9
..           ..             ..
I want to add a new column called lost, computed from the month and num_of_fruits columns. The calculation is harvested - (num_of_fruits - num_of_fruits(last_month)).
I'm having trouble with the part in parentheses: getting last month's num_of_fruits. I have this to start:
select
id,
"month",
num_of_fruits,
harvested,
harvested - (num_of_fruits - num_of_fruits WHERE date_trunc('month', "month" - interval '1' month)) as lost,
selecting other columns..
It's giving me an error in the where clause.
Can you have a where clause inside a select statement? How would I take the last month's num_of_fruits and subtract it with this month's num_of_fruits - all while inside the select statement?
Any help or advice will greatly help me! Thank you so much in advance!

If you want to check other rows in the table, you will likely want either a subquery in your SELECT or to join the table to itself.
I think you are probably trying to do:
SELECT
harvested - (num_of_fruits - (SELECT num_of_fruits FROM mytable t2 WHERE t2.month = date_trunc('month', t1."month" - interval '1' month))) as lost
FROM mytable t1
Note that I created a whole new subquery (SELECT/FROM/WHERE) within your existing SELECT statement, instead of just adding a stray WHERE clause.
I also changed your condition so that it actually compares the result of date_trunc with something.
It's not clear to me that you actually need date_trunc here (and, if you do, you might want it on both sides of the comparison), but you can use the basic idea above and fix the condition to match your needs.
An alternative (joining to self) to consider might be:
SELECT
t1.harvested - (t1.num_of_fruits - t2.num_of_fruits) AS lost
FROM mytable t1 LEFT OUTER JOIN mytable t2
ON t2."month" = date_trunc('month', t1."month" - interval '1' month)
If you know that you always have exactly one row per month, so that the previous row (ordered by month) is also the previous month, you could just use LAG:
SELECT
harvested - (num_of_fruits - LAG(num_of_fruits, 1) OVER (ORDER BY "month")) AS lost
FROM mytable
LAG(num_of_fruits, 1) OVER (ORDER BY "month") means "the num_of_fruits from the previous row in the table when the table is ordered by month". For the first row there is no previous row, so LAG returns NULL and lost is NULL as well.
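As a quick check of the LAG approach, here is a minimal sketch in Python with the built-in sqlite3 module (window functions need SQLite 3.25 or newer). It loads the five rows from the question's table. The table name mytable comes from the answer above, and storing the month as ISO text is an assumption made for SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE mytable ("month" TEXT, num_of_fruits INTEGER, harvested INTEGER)'
)
# The five rows shown in the question.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("2022-01-01", 133, 3),
        ("2022-02-01", 145, 12),
        ("2022-03-01", 123, 5),
        ("2022-04-01", 111, 4),
        ("2022-05-01", 164, 9),
    ],
)

# lost = harvested - (this month's count - previous month's count)
lost_rows = conn.execute(
    """
    SELECT "month",
           harvested - (num_of_fruits
                        - LAG(num_of_fruits, 1) OVER (ORDER BY "month")) AS lost
    FROM mytable
    ORDER BY "month"
    """
).fetchall()
conn.close()

for row in lost_rows:
    print(row)
```

The first month has no predecessor, so its lost value comes back as None (SQL NULL). Wrapping the LAG call in COALESCE would be one way to substitute a default if that is preferred.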

SQL: Apply an aggregate result per day using window functions

Consider a time-series table that contains three fields time of type timestamptz, balance of type numeric, and is_spent_column of type text.
The following query generates a valid result for the last day of the given interval.
SELECT
MAX(DATE_TRUNC('DAY', (time))) as last_day,
SUM(balance) FILTER ( WHERE is_spent_column is NULL ) AS value_at_last_day
FROM tbl
2010-07-12 18681.800775017498741407984000
However, I need an equivalent query based on window functions that reports, for each day, the total of the balance column over all days up to and including that day.
Here is what I've tried so far, but without any valid result:
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(sum(balance) FILTER ( WHERE is_spent_column is NULL ) ) OVER ( ORDER BY DATE_TRUNC('DAY', (time)) ) AS total_value_per_day
FROM tbl
group by 1
order by 1 desc
2010-07-12 16050.496339044977568391974000
2010-07-11 13103.159119670350269890284000
2010-07-10 12594.525752964512456914454000
2010-07-09 12380.159588711091681327014000
2010-07-08 12178.119542536668113577014000
2010-07-07 11995.943973804127033140014000
EDIT:
Here is a sample dataset:
LINK REMOVED
The running total can be computed by applying the first query above on the entire dataset up to and including the desired day. For example, for day 2009-01-31, the result is 97.13522530000000000000, or for day 2009-01-15 when we filter time as time < '2009-01-16 00:00:00' it returns 24.446144000000000000.
What I need is an alternative query that computes the running total for each day in a single query.
EDIT 2:
Thank you all so very much for your participation and support.
The reason for differences in result sets of the queries was on the preceding ETL pipelines. Sorry for my ignorance!
Below I've provided a sample schema to test the queries.
https://www.db-fiddle.com/f/veUiRauLs23s3WUfXQu3WE/2
Now both queries given above and the query given in the answer below return the same result.

Consider calculating the running total with a window function after aggregating the data to day level. Since you aggregate with a single condition, the FILTER condition can be converted to a plain WHERE clause:
SELECT daily,
SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
FROM (
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(balance) AS total_balance
FROM tbl
WHERE is_spent_column IS NULL
GROUP BY 1
) AS daily_agg
ORDER BY daily
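The aggregate-then-window pattern can be tried out with a minimal Python sketch using the built-in sqlite3 module. The rows below are invented for illustration, and SQLite's date() function stands in for DATE_TRUNC('DAY', ...), which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tbl ("time" TEXT, balance REAL, is_spent_column TEXT)')
# Hypothetical rows; the row with a non-NULL is_spent_column is excluded.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        ("2009-01-01 10:00:00", 1.5, None),
        ("2009-01-01 12:00:00", 2.0, None),
        ("2009-01-02 09:00:00", 4.0, "x"),
        ("2009-01-02 11:00:00", 0.5, None),
        ("2009-01-03 08:00:00", 3.0, None),
    ],
)

# Inner query: one row per day. Outer query: running total over those days.
running = conn.execute(
    """
    SELECT daily,
           SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
    FROM (
        SELECT date("time") AS daily,
               SUM(balance) AS total_balance
        FROM tbl
        WHERE is_spent_column IS NULL
        GROUP BY date("time")
    ) AS daily_agg
    ORDER BY daily
    """
).fetchall()
conn.close()

for row in running:
    print(row)
```

With an ORDER BY and no explicit frame, the window defaults to everything from the first row up to the current row, which is what produces the running total.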

SQL Server date range, monthly or quarterly

I have a list of dates in a SQL Server table, and need to figure out a few separate themes about them:
Firstly, are the dates monthly or quarterly? The dates always start on the first of the month.
E.g. one sequence may be 01/01/13, 01/02/13, 01/03/13, 01/04/13, 01/05/13 therefore monthly (UK)
E.g. another sequence may be 01/12/12, 01/03/13, 01/06/13, 01/09/13, 01/12/13 therefore quarterly (UK)
And secondly, which may be solved by the first: are all the dates present, i.e. are there no gaps? One way I went about solving this was to classify the sequence as monthly, quarterly, or "no idea", but that was in C#.
Thanks

You can use the DATEDIFF() function to compare two dates, and you can use a self-join and the ROW_NUMBER() function to compare dates from different rows:
;WITH cte AS (SELECT *, ROW_NUMBER() OVER (ORDER BY dt) RN
FROM Table1)
SELECT DATEDIFF(day,a.dt,b.dt)
FROM cte a
JOIN cte b
ON a.RN = b.RN-1
If you are using SQL Server 2012 or later, you can use the LEAD() function to compare values from different rows:
SELECT DATEDIFF(day,dt,LEAD(dt,1) OVER(ORDER BY dt)) AS Days
,DATEDIFF(quarter,dt,LEAD(dt,1) OVER(ORDER BY dt)) AS Quarters
FROM Table2
Demo: SQL Fiddle
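To show how the row-to-row gaps could be turned into a monthly/quarterly verdict, here is a minimal sketch in Python with the built-in sqlite3 module. The function name classify_dates and the "irregular" label are hypothetical. SQLite has no DATEDIFF, so the month gap is computed from year * 12 + month instead.

```python
import sqlite3


def classify_dates(dates):
    """Classify ISO date strings as 'monthly', 'quarterly' or 'irregular'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (dt TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?)", [(d,) for d in dates])
    # Pair every date with the next one, then count whole months between them.
    gaps = conn.execute(
        """
        SELECT (CAST(strftime('%Y', nxt) AS INTEGER) * 12
                + CAST(strftime('%m', nxt) AS INTEGER))
             - (CAST(strftime('%Y', dt) AS INTEGER) * 12
                + CAST(strftime('%m', dt) AS INTEGER)) AS month_gap
        FROM (
            SELECT dt, LEAD(dt, 1) OVER (ORDER BY dt) AS nxt
            FROM table1
        ) AS pairs
        WHERE nxt IS NOT NULL
        """
    ).fetchall()
    conn.close()
    distinct_gaps = {gap for (gap,) in gaps}
    if distinct_gaps == {1}:
        return "monthly"
    if distinct_gaps == {3}:
        return "quarterly"
    # Mixed gaps, missing periods, or fewer than two dates.
    return "irregular"


monthly = ["2013-01-01", "2013-02-01", "2013-03-01", "2013-04-01", "2013-05-01"]
quarterly = ["2012-12-01", "2013-03-01", "2013-06-01", "2013-09-01", "2013-12-01"]
print(classify_dates(monthly))
print(classify_dates(quarterly))
```

Because a sequence only counts as monthly or quarterly when every gap is the same, a missing date shows up as "irregular", which also covers the second part of the question.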

SQL Average Inter-arrival Time, Time Between Dates

I have a table with sequential timestamps:
2011-03-17 10:31:19
2011-03-17 10:45:49
2011-03-17 10:47:49
...
I need to find the average time difference between each of these (there could be dozens), in seconds or whatever is easiest; I can work with it from there. For example, the inter-arrival time for only the first two timestamps above would be 870 (14m 30s). For all three it would be (870 + 120)/2 = 495 (8m 15s).
Note that I am using PostgreSQL 8.1.22.
EDIT: The table I mention above is from a different query that is literally just a one-column list of timestamps

Not sure I understood your question completely, but this might be what you are looking for:
SELECT avg(difference)
FROM (
SELECT timestamp_col - lag(timestamp_col) over (order by timestamp_col) as difference
FROM your_table
) t
The inner query calculates the distance between each row and the preceding row. The result is an interval for each row in the table.
The outer query simply does an average over all differences.
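As a small check of this lag-then-average idea, here is a sketch in Python with the built-in sqlite3 module, using the three timestamps from the question. SQLite cannot subtract timestamps directly, so the sketch converts them to epoch seconds with strftime('%s', ...). The names your_table and timestamp_col are the placeholders from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (timestamp_col TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?)",
    [
        ("2011-03-17 10:31:19",),
        ("2011-03-17 10:45:49",),
        ("2011-03-17 10:47:49",),
    ],
)

# Inner query: seconds between each row and the previous one (NULL for the
# first row). Outer query: AVG skips the NULL and averages the real gaps.
(avg_seconds,) = conn.execute(
    """
    SELECT AVG(difference)
    FROM (
        SELECT CAST(strftime('%s', timestamp_col) AS INTEGER)
               - LAG(CAST(strftime('%s', timestamp_col) AS INTEGER))
                 OVER (ORDER BY timestamp_col) AS difference
        FROM your_table
    ) AS t
    """
).fetchone()
conn.close()

print(avg_seconds)
```

The two gaps are 870 s and 120 s, so the average is 495 seconds.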

I think you want to find avg(timestamptz).
My solution is to average (current value - minimum value). Since that result is an interval, add it back to the minimum value:
SELECT avg(target_col - (select min(target_col) from your_table))
+ (select min(target_col) from your_table)
FROM your_table

If you cannot upgrade to a version of PG that supports window functions, you
may compute your table's sequential steps "the slow way."
Assuming your table is "tbl" and your timestamp column is "ts":
SELECT AVG(t1 - t0)
FROM (
-- All this silliness would be moot if we could use
-- `` lead(ts) over (order by ts) ''
SELECT tbl.ts AS t0,
next.ts AS t1
FROM tbl
CROSS JOIN
tbl next
WHERE next.ts = (
SELECT MIN(ts)
FROM tbl subquery
WHERE subquery.ts > tbl.ts
)
) derived;
But don't do that. Its performance will be terrible. Please do what
a_horse_with_no_name suggests, and use window functions.

How best to store year, month, and day in a MySQL database?

How is it best to store year, month, and day in a MySQL database so that rows can be easily retrieved by year, by year-month, and by year-month-day combinations?

Let's say you have a table tbl with a column d of type DATE.
All records in 1997:
SELECT * FROM tbl WHERE YEAR(d) = 1997
SELECT * FROM tbl WHERE d BETWEEN '1997-01-01' AND '1997-12-31'
All records in March of 1997:
SELECT * FROM tbl WHERE YEAR(d) = 1997 AND MONTH(d) = 3
SELECT * FROM tbl WHERE d BETWEEN '1997-03-01' AND '1997-03-31'
All records on March 10, 1997:
SELECT * FROM tbl WHERE d = '1997-03-10'
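The range form of these queries can be exercised with a minimal Python sketch using the built-in sqlite3 module. This is only an approximation of the MySQL behaviour: SQLite has no DATE type or YEAR()/MONTH() functions, so the column holds ISO-8601 text and only the BETWEEN variants are shown. The rows and the helper count_between are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (d TEXT)")
# Hypothetical dates around the boundaries of 1997 and of March 1997.
conn.executemany(
    "INSERT INTO tbl VALUES (?)",
    [
        ("1996-12-31",),
        ("1997-01-01",),
        ("1997-03-01",),
        ("1997-03-10",),
        ("1997-03-31",),
        ("1997-12-31",),
        ("1998-01-01",),
    ],
)


def count_between(first_day, last_day):
    """Count rows whose date lies in the inclusive range [first_day, last_day]."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM tbl WHERE d BETWEEN ? AND ?", (first_day, last_day)
    ).fetchone()
    return n


in_1997 = count_between("1997-01-01", "1997-12-31")
in_march_1997 = count_between("1997-03-01", "1997-03-31")
on_march_10 = count_between("1997-03-10", "1997-03-10")
print(in_1997, in_march_1997, on_march_10)
```

The range form compares the stored column directly, without wrapping it in a function, which is generally the form a database can match against an index on the date column.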

Unless a time will ever be involved, use the DATE data type. You can use functions from there to select portions of the date.

I'd recommend the obvious: use a DATE.
It stores year-month-day with no time (hour-minutes-seconds-etc) component.

Store it as a DATE and use the built-in functions DAY(), MONTH(), or YEAR() to return the combination you need.

What's wrong with DATE? As long as you need Y, Y-M, or Y-M-D searches, they should be indexable. The problem with DATE would be if you want all December records across several years, for instance.

This may be related to the problem that archivists have with common date datatypes. Often, you want to be able to encode just the year, or just the year and the month, depending on what information is available, but you want to be able to encode this information in just one datatype. This is a problem which doesn't apply in very many other situations. (In answer to this question in the past, I've had techie types dismiss it as a problem with the data: your data is faulty!)
e.g., in a composer catalogue you are recording the fact that the composer dated a manuscript "January 1951". What can you put in a MySQL DATE field to represent this? "1951-01"? "1951-01-00"? Neither is really valid. Normally you end up encoding years, months and days in separate fields and then having to implement the semantics at application level. This is far from ideal.

If you're doing analytics against a fixed range of dates consider using a date dimension (fancy name for table) and use a foreign key into the date dimension. Check out this link:
http://www.ipcdesigns.com/dim_date/
If you use this date dimension, consider how easy it will be to construct queries against any kind of date you can think of.
SELECT * FROM my_table
JOIN DATE_DIM date on date.PK = my_table.date_FK
WHERE date.day = 30 AND
date.month = 1 AND
date.year = 2010
Or
SELECT * FROM my_table
JOIN DATE_DIM date on date.PK = my_table.date_FK
WHERE date.day_of_week = 1 AND
date.month = 1 AND
date.year = 2010
Or
SELECT *, date.day_of_week_name FROM my_table
JOIN DATE_DIM date on date.PK = my_table.date_FK
WHERE date.is_US_civil_holiday = 1
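To make the date-dimension idea concrete, here is a minimal sketch in Python with the built-in sqlite3 module. Everything in it is an assumption made for illustration: the column set, the yyyymmdd integer key, the label column, and the convention that day_of_week = 1 means Monday (Python's isoweekday). It is not taken from the linked design.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE date_dim (
        pk INTEGER PRIMARY KEY,   -- yyyymmdd, e.g. 20100104
        day INTEGER,
        month INTEGER,
        year INTEGER,
        day_of_week INTEGER       -- 1 = Monday ... 7 = Sunday (assumed convention)
    )
    """
)
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, label TEXT, "
    "date_fk INTEGER REFERENCES date_dim(pk))"
)

# Populate the dimension with one row per day of January 2010.
start = date(2010, 1, 1)
for offset in range(31):
    d = start + timedelta(days=offset)
    pk = d.year * 10000 + d.month * 100 + d.day
    conn.execute(
        "INSERT INTO date_dim VALUES (?, ?, ?, ?, ?)",
        (pk, d.day, d.month, d.year, d.isoweekday()),
    )

# Hypothetical fact rows pointing at the dimension.
conn.executemany(
    "INSERT INTO my_table (label, date_fk) VALUES (?, ?)",
    [("first", 20100104), ("second", 20100105),
     ("third", 20100111), ("fourth", 20100130)],
)

# All rows that fall on a Monday in January 2010.
mondays = conn.execute(
    """
    SELECT my_table.label
    FROM my_table
    JOIN date_dim dd ON dd.pk = my_table.date_fk
    WHERE dd.day_of_week = 1 AND dd.month = 1 AND dd.year = 2010
    ORDER BY my_table.id
    """
).fetchall()
conn.close()

print(mondays)
```

The point of the design is that attributes such as weekday or holiday flags are computed once when the dimension is loaded, so queries filter on plain columns instead of date functions.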
