Aggregate data based on fixed moving date window in Presto - sql

I want to:
aggregate numbers in a "3-month" rolling window (e.g. Jan-Mar, Feb-Apr, Mar-May, ...)
then compare each country & city with the same rolling window from last year
The table I already have (unique at the country + city + month level):
country city month sum
US A 2019-03-01 3
US B 2019-03-01 4
DE C 2019-03-01 5
US A 2019-03-01 3
CN B 2019-03-01 4
US B 2019-04-01 4
UK C 2019-04-01 7
US C 2019-04-01 2
....
US A 2019-12-01 10
US B 2020-12-01 6
US C 2021-01-01 7
Step 1 ideal output:
country city period sum
US A 2019-03-01~2019-05-01 XXX
US A 2019-04-01~2019-06-01 YYY
UK A 2019-03-01~2019-05-01 ZZZ
...
UK A 2020-12-01~2021-02-01 BBB
Step 2 ideal output:
country city period sum last_year_sum year_over_year_%
US A 2019-03-01~2019-05-01 XXX 111 40%
US A 2019-04-01~2019-06-01 YYY 1111 30%
UK A 2019-03-01~2019-05-01 ZZZ 11111 20%
...
UK A 2020-12-01~2021-02-01 BBB 1111 15%
Ideally, I wanted to achieve this in Presto - any idea how to do that? Thanks!!

Unfortunately, Presto doesn't support the range window frame specification using dates. One method uses joins and aggregation and then lag() to get the last year amount:
select t.country, t.city, t.month,
       sum(t2.sum) as this_year_sum,
       lag(sum(t2.sum), 12) over (partition by t.country, t.city order by t.month) as prev_year_sum,
       (-1 +
        -- cast avoids integer division when sum is an integer column
        cast(sum(t2.sum) as double) /
        lag(sum(t2.sum), 12) over (partition by t.country, t.city order by t.month)
       ) as yoy_increase
from t left join
     t t2
     on t2.country = t.country and
        t2.city = t.city and
        t2.month >= t.month and
        t2.month <= t.month + interval '2' month
group by t.country, t.city, t.month;
Note: This assumes that you have data for all months for each country/city combination, because lag(..., 12) counts rows back, not calendar months.
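If some months are missing, one possible workaround is to match last year's window with a date join instead of lag(). The sketch below is only an illustration: it runs on SQLite through Python's sqlite3 module rather than Presto, so the date arithmetic uses SQLite's date() function, and the sample rows and the column name amount are made up for the example.

```python
import sqlite3

# Hypothetical sample data with a gap between 2019-05 and 2020-03.
# The value column is called "amount" here (an assumption, not the asker's name).
ROWS = [
    ("US", "A", "2019-03-01", 3),
    ("US", "A", "2019-04-01", 4),
    ("US", "A", "2019-05-01", 5),
    ("US", "A", "2020-03-01", 6),
    ("US", "A", "2020-04-01", 6),
    ("US", "A", "2020-05-01", 6),
]

# Step 1: self-join to sum each month with the two months that follow it.
# Step 2: join every window to the window that starts 12 months earlier.
QUERY = """
WITH rolling AS (
    SELECT t.country, t.city, t.month, SUM(t2.amount) AS window_sum
    FROM t
    JOIN t AS t2
      ON t2.country = t.country
     AND t2.city = t.city
     AND t2.month >= t.month
     AND t2.month <= date(t.month, '+2 months')
    GROUP BY t.country, t.city, t.month
)
SELECT r.country, r.city, r.month, r.window_sum,
       p.window_sum AS last_year_sum,
       r.window_sum * 1.0 / p.window_sum - 1 AS yoy_increase
FROM rolling AS r
LEFT JOIN rolling AS p
  ON p.country = r.country
 AND p.city = r.city
 AND p.month = date(r.month, '-12 months')
ORDER BY r.country, r.city, r.month
"""


def rolling_yoy(rows):
    """Load rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (country TEXT, city TEXT, month TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = rolling_yoy(ROWS)
for row in result:
    print(row)
```

Windows with no matching window one year earlier come back with NULL for last_year_sum and yoy_increase instead of silently picking up the wrong row.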

Related

How to get the last day of the month without LAST_DAY() or EOMONTH()?

I have a table t with:
DATE       LOCATION PRODUCT_ID AMOUNT
2021-10-29 1        123        10
2021-10-30 1        123        9
2021-10-31 1        123        8
2021-10-29 1        456        100
2021-10-30 1        456        90
2021-10-31 1        456        80
2021-10-29 2        123        18
2021-10-30 2        123        17
2021-11-29 2        456        18
I need to find the AMOUNT on the last day of the month for each combination of LOCATION + PRODUCT_ID.
If a PRODUCT_ID has no entry for that day the AMOUNT is NULL.
So the result should look like:
DATE       LOCATION PRODUCT_ID AMOUNT
2021-10-31 1        123        8
2021-10-31 1        456        80
2021-10-31 2        123        NULL
2021-11-30 2        456        NULL
Sadly EXASOL has no LAST_DAY() or EOMONTH() function. How can I solve this?
You can get to the last day of the month using a date_trunc function in combination with date_add:
case
    when t.date = date_add('day', -1, date_add('month', 1, date_trunc('month', t.date)))
    then 'Y' else 'N'
end as end_of_month
That being said, if you group your table for all combinations of locations and products, you will not get NULLs for products without sales on the last day of the month as shown in your output table.
When you group your data, any value that does not exist will simply not show up in your output table. If you want to force nulls to show up, you can create a new table that contains all combinations of products, locations, and hard-coded end of month dates.
Then, you can left join your old table with this new hard-coded table by date, location, and product. This method will give you the NULL values you expect.
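To make the two ideas above concrete, here is a rough Python sketch (not EXASOL SQL). It computes the last day of the month as "first day of next month minus one day", then looks up each combination at that date. As an assumption for the example, the combinations are derived from the data itself rather than from a hard-coded table, and the function names are made up.

```python
from datetime import date, timedelta


def last_day_of_month(d):
    """Truncate to the 1st, jump into the next month, step back one day."""
    first = d.replace(day=1)
    # 32 days after the 1st always lands in the following month.
    next_first = (first + timedelta(days=32)).replace(day=1)
    return next_first - timedelta(days=1)


# Sample rows from the question: (date, location, product_id, amount).
ROWS = [
    (date(2021, 10, 29), 1, 123, 10),
    (date(2021, 10, 30), 1, 123, 9),
    (date(2021, 10, 31), 1, 123, 8),
    (date(2021, 10, 29), 1, 456, 100),
    (date(2021, 10, 30), 1, 456, 90),
    (date(2021, 10, 31), 1, 456, 80),
    (date(2021, 10, 29), 2, 123, 18),
    (date(2021, 10, 30), 2, 123, 17),
    (date(2021, 11, 29), 2, 456, 18),
]


def month_end_amounts(rows):
    """Return one row per (month end, location, product); None if no entry."""
    amount_by_key = {(d, loc, pid): amt for d, loc, pid, amt in rows}
    # Scaffold: every month end / location / product seen in the data.
    scaffold = sorted({(last_day_of_month(d), loc, pid) for d, loc, pid, _ in rows})
    # The dictionary lookup plays the role of the left join.
    return [
        (eom, loc, pid, amount_by_key.get((eom, loc, pid)))
        for eom, loc, pid in scaffold
    ]


result = month_end_amounts(ROWS)
for row in result:
    print(row)
```

The lookup returns None where the left join would return NULL, so location 2 shows up even though it has no entry on the last day of the month.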

How to calculate average monthly number of some action in some period in Teradata SQL?

I have a table in Teradata SQL like below:
ID trans_date
------------------------
123 | 2021-01-01
887 | 2021-01-15
123 | 2021-02-10
45 | 2021-03-11
789 | 2021-10-01
45 | 2021-09-02
I need to calculate the average monthly number of transactions made by customers in the period between 2021-01-01 and 2021-09-01, so the client with "ID" = 789 will not be counted because he made his transaction later.
In the first month (01) there were 2 transactions
In the second month there was 1 transaction
In the third month there was 1 transaction
In the ninth month there was 1 transaction
So the result should be (2+1+1+1) / 4 = 1.25, isn't it?
How can I calculate this in Teradata SQL? Of course, this is only a sample of my data.
SELECT ID, AVG(txns) FROM
  (SELECT ID, TRUNC(trans_date,'MON') as mth, COUNT(*) as txns
   FROM mytable
   -- WHERE condition matches the question but likely want to
   -- use end date 2021-09-30 or use mth instead of trans_date
   WHERE trans_date BETWEEN date'2021-01-01' and date'2021-09-01'
   GROUP BY id, mth) mth_txn
GROUP BY id;
Your logic translated to SQL:
-- (2+1+1+1) / 4
-- CAST avoids integer division, which would truncate the result
SELECT id, CAST(COUNT(*) AS FLOAT) / COUNT(DISTINCT TRUNC(trans_date,'MON')) AS avg_tx
FROM mytable
WHERE trans_date BETWEEN date'2021-01-01' and date'2021-09-01'
GROUP BY id;
You should compare this to Fred's answer to see which is more efficient on your data.
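As a quick sanity check of the arithmetic, here is a small Python sketch (not Teradata SQL) that divides the number of transactions by the number of distinct months for all customers together. It also shows why the end date matters: the 2021-09-02 transaction falls outside BETWEEN ... AND date'2021-09-01'. The function name is made up for the example.

```python
from datetime import date

# Sample rows from the question: (ID, trans_date).
TRANSACTIONS = [
    (123, date(2021, 1, 1)),
    (887, date(2021, 1, 15)),
    (123, date(2021, 2, 10)),
    (45, date(2021, 3, 11)),
    (789, date(2021, 10, 1)),
    (45, date(2021, 9, 2)),
]


def avg_monthly_transactions(transactions, start, end):
    """Transactions in [start, end] divided by the distinct months they fall in."""
    in_range = [d for _, d in transactions if start <= d <= end]
    months = {(d.year, d.month) for d in in_range}
    if not months:
        return None
    return len(in_range) / len(months)


# End date exactly as written in the question: 2021-09-02 is excluded.
literal = avg_monthly_transactions(TRANSACTIONS, date(2021, 1, 1), date(2021, 9, 1))
# End date moved to the end of September: matches (2+1+1+1) / 4.
full_month = avg_monthly_transactions(TRANSACTIONS, date(2021, 1, 1), date(2021, 9, 30))

print(literal, full_month)
```

With the literal end date the result is 4 transactions over 3 months. Only the end-of-September version reproduces the 1.25 from the question.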

group by of one column and having count of another

I have a table 'customer' which contains 4 columns:
name day product price
A 2021-04-01 p1 100
B 2021-04-01 p1 100
C 2021-04-01 p2 120
A 2021-04-01 p2 120
A 2021-04-02 p1 100
B 2021-04-02 p3 80
C 2021-04-03 p2 120
D 2021-04-03 p2 120
C 2021-04-04 p1 100
With the query
SELECT COUNT(name)
FROM (SELECT name
      FROM customer
      WHERE day > '2021-03-28'
        AND day < '2021-04-09'
      GROUP BY name
      HAVING COUNT(name) > 2) AS t
I can count the number of customers who bought something more than twice in a period of time.
For each day (GROUP BY day), I would like to know how many of the customers who bought something more than twice in the period made a purchase on that day.
Suggested Edit:
For the above example, A and C are the valid customers under this condition.
The desired output will be:
day how_many
2021-04-01 2
2021-04-02 1
2021-04-03 1
2021-04-04 1
I interpret your question as wanting to know how many customers made more than one purchase on each day. If so, one method uses two levels of aggregation:
select day,
       sum(case when day_count >= 2 then 1 else 0 end) as how_many
from (select c.name, c.day, count(*) as day_count
      from customer c
      group by c.name, c.day
     ) nc
group by day
order by day;
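Under the other reading of the question (per day, count the customers who bought more than twice over the whole period), a query along the following lines seems to reproduce the desired output. This is only a sketch, run on SQLite through Python's sqlite3 module with the sample rows from the question. The IN subquery is one possible way to write it, not necessarily the most efficient.

```python
import sqlite3

# Sample rows from the question: (name, day, product, price).
PURCHASES = [
    ("A", "2021-04-01", "p1", 100),
    ("B", "2021-04-01", "p1", 100),
    ("C", "2021-04-01", "p2", 120),
    ("A", "2021-04-01", "p2", 120),
    ("A", "2021-04-02", "p1", 100),
    ("B", "2021-04-02", "p3", 80),
    ("C", "2021-04-03", "p2", 120),
    ("D", "2021-04-03", "p2", 120),
    ("C", "2021-04-04", "p1", 100),
]

# Inner query: customers with more than two purchases in the period.
# Outer query: per day, count the distinct qualifying customers who bought.
QUERY = """
SELECT day, COUNT(DISTINCT name) AS how_many
FROM customer
WHERE day > '2021-03-28'
  AND day < '2021-04-09'
  AND name IN (SELECT name
               FROM customer
               WHERE day > '2021-03-28'
                 AND day < '2021-04-09'
               GROUP BY name
               HAVING COUNT(*) > 2)
GROUP BY day
ORDER BY day
"""


def customers_per_day(purchases):
    """Load the rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (name TEXT, day TEXT, product TEXT, price INTEGER)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", purchases)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = customers_per_day(PURCHASES)
for row in result:
    print(row)
```

COUNT(DISTINCT name) matters here: customer A bought twice on 2021-04-01 but should only be counted once for that day.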

Max and Avg debt days over a period of time

I have invoices pending payment. Every invoice has two dates: the date when payment is required and the date when the invoice is paid. For a given period of time, I want to know the max debt and the avg debt.
This is the table
Id Invoice Amount InvoiceDate InvoicePayment
----------- ------- ----------- ----------- -------------
1 Bill 1 314 2019-01-20 2019-03-01
2 Bill 2 205 2019-01-14 2019-02-18
3 Bill 3 90 2019-02-04 2019-02-06
4 Bill 4 456 2019-01-03 2019-04-27
I would like to know the max debt amount and the avg debt in February.
You can unpivot with cross apply, and use a window sum to compute the "running" debt at each given point in time. The rest is just filtering and aggregation:
select avg(debt) avg_debt, max(debt) max_debt
from (
    select x.dt, sum(x.amount) over(order by x.dt) debt
    from mytable t
    cross apply (values (invoicedate, amount), (invoicepayment, -amount)) as x(dt, amount)
) t
where dt >= '20190201' and dt < '20190301'
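The same unpivot-and-running-sum idea can be traced by hand with a short Python sketch (not T-SQL). Each invoice becomes two events, +amount on InvoiceDate and -amount on InvoicePayment, and the running total is sampled at the events that fall in February 2019. The function name is made up, and same-day events are applied one at a time here, which does not matter for this sample because no two events share a date.

```python
from datetime import date

# Sample rows from the question: (amount, invoice_date, invoice_payment).
INVOICES = [
    (314, date(2019, 1, 20), date(2019, 3, 1)),
    (205, date(2019, 1, 14), date(2019, 2, 18)),
    (90, date(2019, 2, 4), date(2019, 2, 6)),
    (456, date(2019, 1, 3), date(2019, 4, 27)),
]


def debt_stats(invoices, start, end):
    """Return (avg_debt, max_debt) over events with start <= date < end."""
    # Unpivot: one positive event when invoiced, one negative when paid.
    events = []
    for amount, invoiced, paid in invoices:
        events.append((invoiced, amount))
        events.append((paid, -amount))
    events.sort()

    running = 0
    in_period = []
    for dt, delta in events:
        running += delta  # running debt after this event
        if start <= dt < end:
            in_period.append(running)

    if not in_period:
        return None
    return sum(in_period) / len(in_period), max(in_period)


avg_debt, max_debt = debt_stats(INVOICES, date(2019, 2, 1), date(2019, 3, 1))
print(avg_debt, max_debt)
```

Like the query, this only looks at the debt on dates where something happens, so a debt level carried into February without any event during the month would not be seen.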

SQL: Group By clause with condition

I have an Access table looking like this:
ID Country Application Date
--------------------------------
12 France 12/01/2016
12 Germany 01/01/2017
13 Germany 01/02/2017
14 Spain 23/01/2017
14 Germany 01/02/2017
15
16 Greece 01/01/2017
I would like to get a single occurrence of each ID with the most recent application date.
I tried this:
SELECT ID, Country, Max(Application Date)
FROM MyTable
GROUP BY ID
But Access refused this query and wanted me to add the country in the group by clause, which can't work then.
Moreover, I would like to be able to fetch the rows with no country and application date as well (like the row with ID=15).
The expected result would be:
ID Country Application Date
--------------------------------
12 Germany 01/01/2017
13 Germany 01/02/2017
14 Germany 01/02/2017
15
16 Greece 01/01/2017
I think this is what you may want:
select t1.* from MyTable as t1 inner join
(
    SELECT ID, Max([Application Date]) as MaxDate
    FROM MyTable
    GROUP BY ID
) as t2 on t1.ID = t2.ID and t1.[Application Date] = t2.MaxDate
It works on your input data and returns the correct output. Please try it.
SELECT d.id, MyTable.country, d.datemax
FROM (SELECT ID as id, Max(AppDate) as datemax
      FROM MyTable
      GROUP BY ID) as d,
     MyTable
WHERE d.id = MyTable.id
  AND (d.datemax = MyTable.appdate
       OR (d.datemax is null and MyTable.country is null))
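To check how the empty row (ID = 15) behaves, here is a sketch that runs a similar join on SQLite through Python's sqlite3 module, not on Access. As assumptions for the example, the date column is named AppDate, the dates are stored as ISO strings (yyyy-mm-dd) so that MAX() orders them correctly, and the empty cells are treated as NULL.

```python
import sqlite3

# Sample rows from the question, dates rewritten as ISO strings.
ROWS = [
    (12, "France", "2016-01-12"),
    (12, "Germany", "2017-01-01"),
    (13, "Germany", "2017-02-01"),
    (14, "Spain", "2017-01-23"),
    (14, "Germany", "2017-02-01"),
    (15, None, None),
    (16, "Greece", "2017-01-01"),
]

# NULL = NULL is never true, so the row with no date needs its own branch.
QUERY = """
SELECT t.ID, t.Country, t.AppDate
FROM MyTable AS t
JOIN (SELECT ID, MAX(AppDate) AS MaxDate
      FROM MyTable
      GROUP BY ID) AS m
  ON m.ID = t.ID
 AND (t.AppDate = m.MaxDate
      OR (t.AppDate IS NULL AND m.MaxDate IS NULL))
ORDER BY t.ID
"""


def latest_per_id(rows):
    """Load the rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (ID INTEGER, Country TEXT, AppDate TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = latest_per_id(ROWS)
for row in result:
    print(row)
```

Without the IS NULL branch the equality test alone drops ID 15, because MAX() over only NULLs is NULL and a comparison with NULL never matches.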