SQL - Repeated sum of the values, based on dates selected

Currently, I need a simple thing:
sale_date   Gross  SUM_GROSS
2018-01-01  1      6
2018-01-02  2      6
2018-01-03  3      6
I know this question has been asked before; the difference now is that I need to calculate the sum based on the selected dates. (I use BigQuery.)
SUM(SALES.GrossValueBaseCurrency) OVER(PARTITION BY ???) AS SUM_GROSS
If I use
SUM(SALES.GrossValueBaseCurrency) OVER(PARTITION BY SALE.SALE_DATE) AS SUM_GROSS
it gives me what I want ONLY if I select one specific day.
How can I make it work so that, when I select a range of dates, SUM_GROSS repeats the SUM of ALL gross values for the selected period?
SAMPLE DATA and Expectations:
I expect 60 in the SUM_GROSS column:
Row SALE_DATE GROSS SUM_GROSS
1 25/08/2018 10.00 60
2 04/10/2018 10.00 60
3 04/07/2018 10.00 60
4 01/03/2018 10.00 60
5 10/02/2018 10.00 60
6 10/01/2018 10.00 60
If you query this table, the result should be:
SELECT SUM(GROSS) AS GROSS, SUM_GROSS FROM TABLE
WHERE SALE_DATE BETWEEN '2018-01-01' AND '2018-04-01'
GROUP BY SUM_GROSS
RESULT:
GROSS SUM_GROSS
30 30

I think you want a conversion in the PARTITION BY clause:
SUM(SALES.GrossValueBaseCurrency) OVER (PARTITION BY EXTRACT(YEAR from SALE.SALE_DATE), EXTRACT(MONTH from SALE.SALE_DATE)) AS SUM_GROSS
EDIT:
SELECT . . .,
SUM(s.GrossValueBaseCurrency) OVER () AS SUM_GROSS
FROM SALES s
WHERE s.SALE_DATE BETWEEN "2018-01-01" AND "2018-02-01"
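The reason an empty OVER () gives the "repeated sum for the selected period" is that the WHERE clause is applied before window functions are evaluated, so the single window contains only the rows that survived the filter. Below is a minimal sketch of that behaviour, run against SQLite through Python's built-in sqlite3 module rather than BigQuery (an assumption made so it can be executed locally; SQLite 3.25+ is needed for window functions). The table and column names sales, sale_date and gross are hypothetical, and the sample rows from the question are stored with ISO dates.

```python
import sqlite3

# Sample rows from the question, DD/MM/YYYY rewritten as ISO YYYY-MM-DD.
SAMPLE = [
    ("2018-08-25", 10.0),
    ("2018-10-04", 10.0),
    ("2018-07-04", 10.0),
    ("2018-03-01", 10.0),
    ("2018-02-10", 10.0),
    ("2018-01-10", 10.0),
]


def gross_with_period_total(start, end):
    """Return (sale_date, gross, sum_gross) for rows with start <= sale_date <= end."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (sale_date TEXT, gross REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", SAMPLE)
    rows = con.execute(
        """
        SELECT sale_date,
               gross,
               -- empty OVER (): one window spanning every row left after WHERE
               SUM(gross) OVER () AS sum_gross
        FROM sales
        WHERE sale_date BETWEEN ? AND ?
        ORDER BY sale_date
        """,
        (start, end),
    ).fetchall()
    con.close()
    return rows
```

Selecting 2018-01-01 to 2018-04-01 returns the three January to March rows, each with a sum_gross of 30; widening the range to the whole of 2018 makes every row show 60.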

Is this what you are looking for?
SUM(CASE WHEN sales.sale_date = '2018-01-01'
THEN SALES.GrossValueBaseCurrency
ELSE 0
END) OVER () AS sales_20180101

Related

Calculate a 3-month moving average from non-aggregated data

I have a bunch of orders. Each order is either a type A or a type B order. I want a 3-month moving average of the time it takes to ship orders of each type. How can I aggregate this order data into what I want using Redshift or Postgres SQL?
Start with this:
order_id  order_type  ship_date   time_to_ship
1         a           2021-12-25  100
2         b           2021-12-31  110
3         a           2022-01-01  200
4         a           2022-01-01  50
5         b           2022-01-15  110
6         a           2022-02-02  100
7         a           2022-02-28  300
8         b           2022-04-05  75
9         b           2022-04-06  210
10        a           2022-04-15  150
Note: Some months have no shipments. The solution should allow for this.
I want this:
order_type  ship_month  mma3_time_to_ship
a           2022-02-01  150
a           2022-04-01  160
b           2022-04-01  126.25
A 3-month moving average is only calculated for months that have at least 2 preceding months of data. Each record is an order-type/month combination. The ship_month column denotes the month of shipment (Redshift represents months as the date of the first of the month).
Here's how the mma3_time_to_ship column is calculated, expressed as Excel-like formulas:
150 = AVERAGE(100, 200, 50, 100, 300) <- The average for all A orders in Dec, Jan, and Feb.
160 = AVERAGE(200, 50, 100, 300, 150) <- The average for all A orders in Jan, Feb, Apr (no orders in March)
126.25 = AVERAGE(110, 110, 75, 210) <- The average for all B orders in Dec, Jan, Apr (no B orders in Feb, no orders at all in Mar)
My attempt doesn't aggregate it into monthly data and 3-month averages (this query runs without error in Redshift):
SELECT
order_type,
DATE_TRUNC('month', ship_date) AS ship_month,
AVG(time_to_ship) OVER (
PARTITION BY
order_type,
ship_month
ORDER BY ship_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS avg_time_to_ship
FROM tbl
Is what I want possible?
This is honestly a complete stab in the dark, so it won't surprise me if it's not correct... but it seems to me you can accomplish this with a self join using a range of dates within the join.
select
t1.order_type, t1.ship_date, avg (t2.time_to_ship) as mma3_time_to_ship
from
tbl t1
join tbl t2 on
t1.order_type = t2.order_type and
t2.ship_date between t1.ship_date - interval '3 months' and t1.ship_date
group by
t1.order_type, t1.ship_date
The results don't match your example, but then I'm not entirely sure where they came from anyway.
Perhaps this will be the catalyst towards an eventual solution or at least an idea to start.
This is Pg12, by the way. Not sure if it will work on Redshift.
-- EDIT --
Per your updates, I was able to match your three results identically. I used dense_rank to find the closest three months:
with foo as (
select
order_type, date_trunc ('month', ship_date)::date as ship_month,
time_to_ship, dense_rank() over (partition by order_type order by date_trunc ('month', ship_date)) as dr
from tbl
)
select
f1.order_type, f1.ship_month,
avg (f2.time_to_ship),
array_agg (f2.time_to_ship)
from
foo f1
join foo f2 on
f1.order_type = f2.order_type and
f2.dr between f1.dr - 2 and f1.dr
group by
f1.order_type, f1.ship_month
Results:
b 2022-01-01 110.0000000000000000 {110,110}
a 2022-01-01 116.6666666666666667 {100,50,200,100,50,200}
b 2022-04-01 126.2500000000000000 {110,110,75,210,110,110,75,210}
b 2021-12-01 110.0000000000000000 {110}
a 2021-12-01 100.0000000000000000 {100}
a 2022-02-01 150.0000000000000000 {100,50,200,100,300,100,50,200,100,300}
a 2022-04-01 160.0000000000000000 {50,200,100,300,150}
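There are some dupes in the array elements, but they don't impact the averages: each value in the window is repeated once per order in the current month, so every value is still weighted equally. I'm sure that part could be fixed.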
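A minimal sketch of one way to remove the duplicates: collapse the left side of the join to one row per order type and month before joining, and keep only months whose dense rank is at least 3 (that is, months with at least two earlier months of data). It is run against SQLite through Python's sqlite3 module purely for illustration (an assumption; SQLite 3.25+ is needed for DENSE_RANK), with strftime('%Y-%m-01', ship_date) standing in for date_trunc('month', ship_date). Whether it carries over unchanged to Redshift has not been checked.

```python
import sqlite3

# Orders from the question: (order_id, order_type, ship_date, time_to_ship).
ORDERS = [
    (1, "a", "2021-12-25", 100), (2, "b", "2021-12-31", 110),
    (3, "a", "2022-01-01", 200), (4, "a", "2022-01-01", 50),
    (5, "b", "2022-01-15", 110), (6, "a", "2022-02-02", 100),
    (7, "a", "2022-02-28", 300), (8, "b", "2022-04-05", 75),
    (9, "b", "2022-04-06", 210), (10, "a", "2022-04-15", 150),
]

QUERY = """
WITH foo AS (
    SELECT order_type,
           strftime('%Y-%m-01', ship_date) AS ship_month,
           time_to_ship,
           -- consecutive rank per month with shipments, so empty months are skipped
           DENSE_RANK() OVER (PARTITION BY order_type
                              ORDER BY strftime('%Y-%m-01', ship_date)) AS dr
    FROM tbl
),
months AS (
    -- one row per (order_type, month): this is what removes the duplicates
    SELECT DISTINCT order_type, ship_month, dr FROM foo
)
SELECT m.order_type, m.ship_month, AVG(f.time_to_ship) AS mma3_time_to_ship
FROM months m
JOIN foo f
  ON f.order_type = m.order_type
 AND f.dr BETWEEN m.dr - 2 AND m.dr
WHERE m.dr >= 3   -- only months with at least 2 preceding months of data
GROUP BY m.order_type, m.ship_month
ORDER BY m.order_type, m.ship_month
"""


def moving_averages(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tbl (order_id INTEGER, order_type TEXT,"
        " ship_date TEXT, time_to_ship INTEGER)"
    )
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

With the sample orders this yields the three rows the question asks for (150, 160 and 126.25).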

How to calculate the average monthly number of some action in some period in Teradata SQL?

I have a table in Teradata SQL like the one below:
ID  | trans_date
----+-----------
123 | 2021-01-01
887 | 2021-01-15
123 | 2021-02-10
45  | 2021-03-11
789 | 2021-10-01
45  | 2021-09-02
I need to calculate the average monthly number of transactions made by customers in the period between 2021-01-01 and 2021-09-01, so the client with ID = 789 is not counted because that transaction was made later.
In the first month (01) there were 2 transactions.
In the second month there was 1 transaction.
In the third month there was 1 transaction.
In the ninth month there was 1 transaction.
So the result should be (2+1+1+1) / 4 = 1.25, isn't it?
How can I calculate this in Teradata SQL? Of course, this is only a sample of my data.
SELECT ID, AVG(txns) FROM
(SELECT ID, TRUNC(trans_date,'MON') as mth, COUNT(*) as txns
FROM mytable
-- WHERE condition matches the question but likely want to
-- use end date 2021-09-30 or use mth instead of trans_date
WHERE trans_date BETWEEN date'2021-01-01' and date'2021-09-01'
GROUP BY id, mth) mth_txn
GROUP BY id;
Your logic translated to SQL:
--(2+1+1+1) / 4
SELECT id, CAST(COUNT(*) AS DECIMAL(18,4)) / COUNT(DISTINCT TRUNC(trans_date,'MON')) AS avg_tx
FROM mytable
WHERE trans_date BETWEEN date'2021-01-01' and date'2021-09-01'
GROUP BY id;
You should compare with Fred's answer to see which is more efficient on your data.
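Dividing COUNT(*) by COUNT(DISTINCT ...) divides one integer by another, which truncates in Teradata and also in SQLite, so 5 / 4 comes back as 1 unless one operand is cast to a decimal or real type. The sketch below shows both the truncated and the cast version of the overall (2+1+1+1) / 4 calculation, using SQLite through Python's sqlite3 module as a stand-in for Teradata (an assumption; strftime('%Y-%m', ...) plays the role of TRUNC(trans_date, 'MON')). It also illustrates the end-date caveat from Fred's comment: with an upper bound of 2021-09-01 the 2021-09-02 transaction drops out.

```python
import sqlite3

# Transactions from the question: (id, trans_date).
TRANSACTIONS = [
    (123, "2021-01-01"),
    (887, "2021-01-15"),
    (123, "2021-02-10"),
    (45, "2021-03-11"),
    (789, "2021-10-01"),
    (45, "2021-09-02"),
]

QUERY = """
SELECT COUNT(*) / COUNT(DISTINCT strftime('%Y-%m', trans_date)) AS int_avg,
       CAST(COUNT(*) AS REAL)
           / COUNT(DISTINCT strftime('%Y-%m', trans_date)) AS real_avg
FROM mytable
WHERE trans_date BETWEEN ? AND ?
"""


def monthly_average(rows, start, end):
    """Return (truncated average, real average) of transactions per active month."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (id INTEGER, trans_date TEXT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = con.execute(QUERY, (start, end)).fetchone()
    con.close()
    return result
```

With the range 2021-01-01 to 2021-09-30 this gives 1 for the integer division and 1.25 for the cast version; with the question's end date of 2021-09-01 the cast version gives 4 / 3 instead.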

Computing rolling average and standard deviation by dates

I have the table below, for which I need to compute the rolling average and standard deviation based on the dates. The tables and the expected results are listed below. I am trying to compute the rolling average for each id, ordered by date; rollAvgA is computed from metricA. For the first date of each id the result should be zero, since there are no preceding values. How can this be accomplished?
Current table:
Date id metricA
8/1/2019 100 2
8/2/2019 100 3
8/3/2019 100 2
8/1/2019 101 2
8/2/2019 101 3
8/3/2019 101 2
8/4/2019 101 2
Expected table:
Date id metricA rollAvgA
8/1/2019 100 2 0
8/2/2019 100 3 2.5
8/3/2019 100 2 2.3
8/1/2019 101 2 0
8/2/2019 101 3 2.5
8/3/2019 101 2 2.3
8/4/2019 101 2 2.25
You seem to want a cumulative average. This is basically:
select t.*,
avg(metricA * 1.0) over (partition by id order by date) as rollingavg
from t;
The only caveat is that the first value is an average of one value. To handle this, use a case expression:
select t.*,
(case when row_number() over (partition by id order by date) > 1
then avg(metricA * 1.0) over (partition by id order by date)
else 0
end) as rollingavg
from t;
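The CASE version of the query can be checked against the expected table. A minimal sketch, running the answer's query on SQLite through Python's sqlite3 module (an assumption for illustration; SQLite 3.25+ supports window functions). The dates are rewritten in ISO form so that ORDER BY "date" sorts them chronologically, and the table name t is taken from the answer. It covers only the cumulative average, not the standard deviation.

```python
import sqlite3

# Data from the question: ("date", id, metricA), dates rewritten as ISO.
ROWS = [
    ("2019-08-01", 100, 2),
    ("2019-08-02", 100, 3),
    ("2019-08-03", 100, 2),
    ("2019-08-01", 101, 2),
    ("2019-08-02", 101, 3),
    ("2019-08-03", 101, 2),
    ("2019-08-04", 101, 2),
]

QUERY = """
SELECT "date", id, metricA,
       -- cumulative average per id; 0 for the first date of each id
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY id ORDER BY "date") > 1
            THEN AVG(metricA * 1.0) OVER (PARTITION BY id ORDER BY "date")
            ELSE 0
       END AS rollingavg
FROM t
ORDER BY id, "date"
"""


def rolling_averages(rows):
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE t ("date" TEXT, id INTEGER, metricA INTEGER)')
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

Rounded to two decimals, the rollingavg column comes out as 0, 2.5, 2.33 for id 100 and 0, 2.5, 2.33, 2.25 for id 101, matching the expected table.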

select total sales per day

I have a query like this, and I want the sum of sales per day:
SELECT DATEPART(day,deduction_timestamp) as [day],
[fare_deduction]
FROM [dbfastsprocess].[dbo].[vClosingTransitLog]
WHERE bus_id in ('JEAST', 'MKV004', 'NWTN01')
and YEAR(deduction_timestamp) = ISNULL(2016, YEAR(deduction_timestamp))
and MONTH(deduction_timestamp) = ISNULL(10, MONTH(deduction_timestamp))
GROUP BY DATEPART(day,deduction_timestamp), fare_deduction
with the result:
day fare_deduction
--------------------------------
1 10.00
3 15.00
3 2.00
4 10.00
10 20.00
31 12.00
and I want the result to be like this:
day fare_deduction
--------------------------------
1 10.00
3 17.00
4 10.00
10 20.00
31 12.00
Note that not every day has values; only the days with transactions should be displayed. Can you help me with this? Thanks!
SELECT DATEPART(day,deduction_timestamp) as [day],
sum(fare_deduction) as fare_deduction
FROM [dbfastsprocess].[dbo].[vClosingTransitLog]
WHERE bus_id in ('JEAST', 'MKV004', 'NWTN01')
and YEAR(deduction_timestamp) = ISNULL(2016, YEAR(deduction_timestamp))
and MONTH(deduction_timestamp) = ISNULL(10, MONTH(deduction_timestamp))
GROUP BY DATEPART(day,deduction_timestamp)
Simply use SUM; it will give you the required result:
SELECT DATEPART(day,deduction_timestamp) as [day],
SUM([fare_deduction]) [fare_deduction]
FROM [dbfastsprocess].[dbo].[vClosingTransitLog]
WHERE bus_id in ('JEAST', 'MKV004', 'NWTN01')
and YEAR(deduction_timestamp) = ISNULL(2016, YEAR(deduction_timestamp))
and MONTH(deduction_timestamp) = ISNULL(10, MONTH(deduction_timestamp))
GROUP BY DATEPART(day,deduction_timestamp)
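The key change in both answers is grouping by the day alone: the original query also grouped by fare_deduction, which is why day 3 appeared twice (15.00 and 2.00). A minimal sketch of the same idea on SQLite through Python's sqlite3 module (an assumption; strftime('%d', ...) stands in for SQL Server's DATEPART(day, ...)). The table name transit_log and the individual timestamps are hypothetical, chosen so the per-row data matches the question's first result listing.

```python
import sqlite3

# Hypothetical October 2016 fares matching the question's first listing.
FARES = [
    ("2016-10-01 08:00:00", "JEAST", 10.00),
    ("2016-10-03 09:15:00", "MKV004", 15.00),
    ("2016-10-03 17:40:00", "NWTN01", 2.00),
    ("2016-10-04 07:30:00", "JEAST", 10.00),
    ("2016-10-10 12:00:00", "MKV004", 20.00),
    ("2016-10-31 18:05:00", "NWTN01", 12.00),
]


def daily_totals(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE transit_log"
        " (deduction_timestamp TEXT, bus_id TEXT, fare_deduction REAL)"
    )
    con.executemany("INSERT INTO transit_log VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT CAST(strftime('%d', deduction_timestamp) AS INTEGER) AS day,
               SUM(fare_deduction) AS fare_deduction
        FROM transit_log
        WHERE bus_id IN ('JEAST', 'MKV004', 'NWTN01')
          AND strftime('%Y-%m', deduction_timestamp) = '2016-10'
        -- group by the day only, so all fares of one day are summed together
        GROUP BY CAST(strftime('%d', deduction_timestamp) AS INTEGER)
        ORDER BY day
        """
    ).fetchall()
    con.close()
    return result
```

Only days that have at least one fare appear in the output, and day 3 is reported once with 17.00.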

How to sum total amount for every month in a year?

I have a database in SQL Server 2012 with a table whose dates are in D.M.YYYY format, like the one below:
ID | Date(date type) | Amount(Numeric)
1 3.4.2013 16.00
1 12.4.2013 13.00
1 2.5.2013 9.50
1 18.5.2013 10.00
I need to sum the total amount for every month in a given year. For example:
ID | Month | TotalAmount
1 1 0.00
...
1 4 29.00
1 5 19.50
I thought what I needed was to determine the number of days in a month, so I created a function as described in "determine the number of days", and it worked. After that I tried to compare two dates (date type) and got stuck; there are some examples out there, but all of them are about datetime.
Is this wrong? How can I accomplish this?
I think you just want an aggregation:
select id, month(date) as "month", sum(amount) as TotalAmount
from t
where year(date) = 2013
group by id, month(date)
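The plain aggregation returns only the months that have rows, while the expected output also lists months such as January with 0.00. A minimal sketch of one way to fill the gaps: generate the month numbers 1 to 12, cross join them with the ids, and LEFT JOIN the monthly totals, turning missing totals into 0. It runs on SQLite through Python's sqlite3 module (an assumption for illustration); in SQL Server a recursive CTE or a VALUES list of month numbers could play the same role, but that variant is not shown here. The table name t and the columns id, "date" and amount follow the question and answer.

```python
import sqlite3

# Rows from the question: (id, "date", amount), D.M.YYYY stored as ISO.
ROWS = [
    (1, "2013-04-03", 16.00),
    (1, "2013-04-12", 13.00),
    (1, "2013-05-02", 9.50),
    (1, "2013-05-18", 10.00),
]

QUERY = """
WITH RECURSIVE months(m) AS (
    SELECT 1
    UNION ALL
    SELECT m + 1 FROM months WHERE m < 12   -- month numbers 1..12
),
totals AS (
    SELECT id,
           CAST(strftime('%m', "date") AS INTEGER) AS m,
           SUM(amount) AS total
    FROM t
    WHERE strftime('%Y', "date") = ?
    GROUP BY id, CAST(strftime('%m', "date") AS INTEGER)
),
ids AS (
    SELECT DISTINCT id FROM t
)
SELECT ids.id, months.m AS month, COALESCE(totals.total, 0) AS total_amount
FROM ids
CROSS JOIN months
LEFT JOIN totals ON totals.id = ids.id AND totals.m = months.m  -- no rows -> 0
ORDER BY ids.id, months.m
"""


def monthly_totals(rows, year):
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE t (id INTEGER, "date" TEXT, amount REAL)')
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY, (str(year),)).fetchall()
    con.close()
    return result
```

For 2013 this returns twelve rows for id 1: 29.00 for April, 19.50 for May and 0 for every other month.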