Using Over & Partition in SQL - sql

I have a SQL table with monthly prices for products from different categories (e.g. fruits, vegetables, dairy).
I'd like to calculate a running monthly average for a specific category and for all the products in the same month, in the same query.
So combining:
Select date, avg(price) group by date where category = 'Fruit'
Select date, avg(price) group by date
Is that possible using OVER and PARTITION BY (or any other way, for that matter)?
Edit
I am using MS SQL
My data is monthly, so I don't need to extract month-end dates; I can just group on date and get month-end data.
As an example, if my table looks like this:
| Date | Item | Category | Price |
| Jan | Banana | Fruit | 10 |
| Jan | Potato | Veg | 20 |
Then the output would be
| Date | Fruit Avg | Overall Avg |
| Jan | 10 | 15 |
Apologies in advance for mangling the tables, but that's for another thread.
Thanks

Why do you need OVER?
Select date, category, avg(price) group by date, category
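Grouping by date and category returns one row per category rather than the side-by-side layout shown in the question. One way to get both averages on a single row is conditional aggregation. The sketch below tries the idea against an in-memory SQLite database through Python's sqlite3 module; the table name prices and its column names are assumptions made for illustration, while the AVG(CASE ...) expression is standard SQL and should carry over to MS SQL.

```python
import sqlite3

# In-memory database holding the two sample rows from the question.
# The table name "prices" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (date TEXT, item TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [("Jan", "Banana", "Fruit", 10), ("Jan", "Potato", "Veg", 20)],
)

# CASE yields NULL for non-fruit rows, and AVG ignores NULLs,
# so fruit_avg only averages the Fruit prices.
query = """
SELECT date,
       AVG(CASE WHEN category = 'Fruit' THEN price END) AS fruit_avg,
       AVG(price) AS overall_avg
FROM prices
GROUP BY date
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Jan', 10.0, 15.0)]
```

No window function is needed here, because both averages are computed over the same group (the date).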

Try to use
Select extract(year from date)||extract(month from date) as month,
category, avg(price) as avg_price
from sales
group by month, category
P.S. Using the alias month in GROUP BY may not be allowed in some database systems; in that case, replace month with extract(year from date)||extract(month from date) in the GROUP BY list. The concatenation operator may also differ: replace the pipes ( || ) with a plus sign ( + ) where required.

If you want one row per month and the data has multiple rows, then you need to aggregate:
select date_trunc('month', date) as month,
avg(price) as avg_per_month,
avg(avg(price)) over (order by date_trunc('month', date))
from t
where category = 'Fruit'
group by date_trunc('month', date)
order by date_trunc('month', date);
If you only have one row per month, then you can do:
select date, price,
avg(price) over (order by date)
from t
where category = 'Fruit'
order by date;

Related

Historic and avg data in sql

I want to get report data in below scenarios using the sample table provided below(data is huge in my db)
List item(same week, prior year) sales and (same day, same week, prior year) sales.
Rolling 6 month avg weekly selling
| id | date_week | date_value | sales |
| Item1 | 2020/01-04 | 20200120 | 230 |
| Item2 | 2020/06-03 | 20200608 | 23.0 |
| Item3 | 2019/11-03 | 20191111 | null |
| Item4 | 2020/07-04 | 20200720 | 123 |
| Item5 | 2019/08-01 | 20190729 | 456 |
| Item6 | 2019/09-03 | 20190909 | 1234 |
| Item7 | 2020/06-02 | 20200601 | 4556 |
| Item8 | 2020/09-01 | 20200824 | 23 |
| Item9 | 2021/09-02 | 20210906 | 1223 |
In the table above, date_week is year/month_week (so this is where I get the week number).
I am trying the query below:
SELECT
DATEPART(week, date_value) AS Week,
id ,
sum(sales) AS sales
FROM table
WHERE date_value <= date_value
AND date_value < DATEADD(year, 1, date_value)
GROUP BY DATEPART(week, date_value), id
ORDER BY DATEPART(week, date_value);
Please suggest how to achieve the scenarios I am looking for.
You can do this with a join. First, I would split the date columns into Year | Month | Week | Day. If this is the only format you have available for your dates, you can use the left() and right() functions, although it is always better to store dates in a datetime format.
Your query could then look like:
SELECT t1.Year, t1.Week, t1.Id, t1.Sales, t2.Sales as Last_year_this_week_sales
from (
SELECT
cast(right(date_value,2) as int) AS Week,
cast(left(date_value,4) as int) as Year,
id ,
sum(sales) AS sales
FROM table
GROUP BY right(date_value,2),
left(date_value,4), id ) t1
left join (
SELECT
cast(right(date_value,2) as int) AS Week,
cast(left(date_value,4) as int) as Year,
id ,
sum(sales) AS sales
FROM table
GROUP BY right(date_value,2),
left(date_value,4), id ) t2 ON t1.week = t2.week and t2.year = t1.year-1 and t1.id = t2.id;
I am assuming you want the results at id level. If not, remove id from the grouping and the join. You can do the same for the daily results by replacing week with day.
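The prior-year self-join can be tried out on a small scale as follows. This is only a sketch: it assumes the year and week have already been split into separate integer columns, and the weekly_sales table and its rows are invented for illustration. It runs against an in-memory SQLite database through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical, already-aggregated weekly data: one row per id, year and week.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weekly_sales (id TEXT, year INTEGER, week INTEGER, sales REAL)")
conn.executemany(
    "INSERT INTO weekly_sales VALUES (?, ?, ?, ?)",
    [
        ("Item1", 2019, 4, 100),
        ("Item1", 2020, 4, 230),
        ("Item2", 2020, 23, 23),
    ],
)

# Join each row to the row for the same id and week one year earlier.
# LEFT JOIN keeps rows that have no prior-year match (prior_year_sales is NULL).
query = """
SELECT cur.id, cur.year, cur.week, cur.sales, prev.sales AS prior_year_sales
FROM weekly_sales AS cur
LEFT JOIN weekly_sales AS prev
       ON prev.id = cur.id
      AND prev.week = cur.week
      AND prev.year = cur.year - 1
ORDER BY cur.id, cur.year
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the 2020 row for Item1 finds a match from the previous year; the other rows come back with None in the last column.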
For the rolling average, you can use a subquery that averages the weekly totals over the last 6 months. Again, if you need it at item level, keep id in the SELECT and GROUP BY clauses.
If not, you can do:
SELECT avg(sales) as Sales_avg
FROM
(SELECT
cast(right(date_value,2) as int) AS Week,
cast(left(date_value,4) as int) as Year,
sum(sales) AS sales
FROM table
where date_value>='20210101' --replace with the date you need
GROUP BY right(date_value,2),
left(date_value,4) ) t1;

how to filter only by the ids from column a that have all the values in column b

Could I kindly ask for some guidance on how I can filter in Presto SQL only by the values (from column a) that have all values in column b?
So, I am looking to get all the product_ids by date that have all promotion days (from 1 - 9) in promotion_running_days column.
I tried to use promotion_running_days in (1,2,3,4,5,6,7), but it also returns the product_ids that have only 2 or 3 promotion days.
Using this query approach:
SELECT
product_id
,date
,ROUND(MAX(DATE_DIFF('day', CAST(DATE_PARSE(promotion_start_date, '%Y-%m-%d %T') AS DATE), CAST(DATE_PARSE(date, '%Y-%m-%d') AS DATE))),0) AS promotion_running_days
,SUM(revenue) AS total_revenue
FROM product_db
WHERE
date between '2019-01-01' and '2019-01-07'
AND promotion_start_date>='2019-01-01'
Group by 1,2;
I would like my output to look like this:
| Product Id | Date | Promotion Running Days |
| 1 | 2019-01-01 | 1 |
| 1 | 2019-01-02 | 2 |
| 1 | 2019-01-03 | 3 |
| 1 | 2019-01-04 | 4 |
| 1 | 2019-01-05 | 5 |
| 1 | 2019-01-06 | 6 |
| 1 | 2019-01-07 | 7 |
I am looking to get all the product_ids by date that have all promotion days
You seem to want aggregation. Assuming you have at most one row per date:
SELECT product_id, SUM(revenue) AS total_revenue
FROM product_db
WHERE date between '2019-01-01' and '2019-01-07' and
promotion_start_date>='2019-01-01'
GROUP BY product_id
HAVING COUNT(*) = 7; -- 7 == all days
However, your sample results suggest row_number():
select product_id, date,
row_number() over (partition by product_id order by date)
from product_db
order by product_id, date;
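The HAVING COUNT(*) = 7 filter above relies on there being at most one row per product and date. If that cannot be guaranteed, counting distinct dates is a possible variation. The sketch below checks it on invented rows in an in-memory SQLite database via Python's sqlite3 module; the data and the reduced column list are assumptions, and the promotion_start_date filter is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_db (product_id INTEGER, date TEXT, revenue REAL)")

# Invented sample data: product 1 has a row for each of the 7 days,
# product 2 only for 2 of them.
rows_in = [(1, f"2019-01-0{d}", 10.0) for d in range(1, 8)]
rows_in += [(2, "2019-01-01", 5.0), (2, "2019-01-02", 5.0)]
conn.executemany("INSERT INTO product_db VALUES (?, ?, ?)", rows_in)

# Keep only products that appear on all 7 distinct dates in the range.
query = """
SELECT product_id, SUM(revenue) AS total_revenue
FROM product_db
WHERE date BETWEEN '2019-01-01' AND '2019-01-07'
GROUP BY product_id
HAVING COUNT(DISTINCT date) = 7
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 70.0)]
```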

SQL monthly rolling sum

I am trying to calculate monthly balances of bank accounts from the following postgresql table, containing transactions:
# \d transactions
View "public.transactions"
Column | Type | Collation | Nullable | Default
--------+------------------+-----------+----------+---------
year | double precision | | |
month | double precision | | |
bank | text | | |
amount | numeric | | |
By "rolling sum" I mean that the sum should include all transactions from the beginning of time until the end of the given month, not just the transactions in the given month.
I came up with the following query:
select
a.year, a.month, a.bank,
(select sum(b.amount) from transactions b
where b.year < a.year
or (b.year = a.year and b.month <= a.month))
from
transactions a
order by
bank, year, month;
The problem is that this returns one row per transaction, so each bank and month appears as many times as it has transactions, and months without transactions do not appear at all.
I would like a query that returns exactly one row for each bank and month over the whole time interval, from the first to the last transaction.
How can I do that?
An example dataset and a query can be found at https://rextester.com/WJP53830, courtesy of @a_horse_with_no_name
You need to generate a list of months first, then you can outer join your transactions table to that list.
with all_years as (
select y.year, m.month, b.bank
from generate_series(2010, 2019) as y(year) --<< adjust here for your desired range of years
cross join generate_series(1,12) as m(month)
cross join (select distinct bank from transactions) as b(bank)
)
select ay.*, sum(amount) over (partition by ay.bank order by ay.year, ay.month)
from all_years ay
left join transactions t on (ay.year, ay.month, ay.bank) = (t.year::int, t.month::int, t.bank)
order by bank, year, month;
The cross join with all banks is necessary so that the all_years CTE will also contain a bank for each month row.
Online example: https://rextester.com/ZZBVM16426
Here is my suggestion in Oracle 10 SQL:
select a.year,a.month,a.bank, (select sum(b.amount) from
(select c.year as year,c.month as month,c.bank as bank,
sum(c.amount) as amount from transactions c
group by c.year,c.month,c.bank
) b
where b.year<a.year or (b.year=a.year and b.month<=a.month))
from transactions a order by bank, year, month;
Consider aggregating all transactions first by bank and month, then run a window SUM() OVER() for rolling monthly sum since earliest amount.
WITH agg AS (
SELECT t.year, t.month, t.bank, SUM(t.amount) AS Sum_Amount
FROM transactions t
GROUP BY t.year, t.month, t.bank
)
SELECT agg.year, agg.month, agg.bank,
SUM(agg.Sum_Amount) OVER (PARTITION BY agg.bank ORDER BY agg.year, agg.month) AS rolling_sum
FROM agg
ORDER BY agg.year, agg.month, agg.bank
Should you want YTD rolling sums, adjust the OVER() clause by adding year to partition:
SUM(agg.Sum_Amount) OVER (PARTITION BY agg.bank, agg.year ORDER BY agg.month)
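The aggregate-then-window approach can be checked on a few rows. The sketch below uses an in-memory SQLite database through Python's sqlite3 module (window functions need SQLite 3.25 or later); the sample transactions are invented, and the year and month columns are assumed to be integers rather than double precision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (year INTEGER, month INTEGER, bank TEXT, amount REAL)")
# Invented sample rows: bank A has two transactions in January 2019.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (2019, 1, "A", 100), (2019, 1, "A", 50),
        (2019, 2, "A", -30),
        (2019, 1, "B", 10), (2019, 3, "B", 5),
    ],
)

# Step 1 (CTE): collapse to one row per bank and month.
# Step 2: running total per bank, ordered by year and month.
query = """
WITH agg AS (
    SELECT year, month, bank, SUM(amount) AS sum_amount
    FROM transactions
    GROUP BY year, month, bank
)
SELECT bank, year, month,
       SUM(sum_amount) OVER (PARTITION BY bank ORDER BY year, month) AS rolling_sum
FROM agg
ORDER BY bank, year, month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that bank B gets no row for February, since it has no transactions that month; filling such gaps still needs a generated list of months, as in the generate_series answer.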

Joining date partitioned table give "'the_date' is not present in the GROUP BY list" error

With this answer in mind I'm trying to query ga_sessions, summing some basic metrics after joining to my own custom reporting schedule. The custom reporting schedule maps a custom period (about 4 weeks) onto a date format YYYYMMDD and is in its own table.
Here's what I've come up with:
SELECT
schedule.period,
gadata.Visits,
gadata.Pageviews,
gadata.Transactions,
gadata.Revenue
FROM (
SELECT
gadata.date AS the_date,
SUM(totals.visits) AS Visits,
SUM(totals.pageviews) AS Pageviews,
SUM(totals.transactions) AS Transactions,
SUM(totals.transactionRevenue)/1000000 AS Revenue
FROM TABLE_DATE_RANGE([project.table_prefix_],TIMESTAMP('2013-09-10'),TIMESTAMP('2013-09-12'))
GROUP BY
gadata.the_date
ORDER BY
gadata.the_date ASC
) AS gadata
JOIN
[project.reporting_schedule] AS schedule
ON
gadata.date = schedule.GA_Date
GROUP BY gadata.the_date
But this gives the error: "Error: Expression 'the_date' is not present in the GROUP BY list"
I strongly suspect there's something wrong with my use of the syntax, I'm quite fresh with Google Big Query and the combination of querying a date partitioned table and join is throwing me.
What do I need to change to correct the code and sum the metrics by custom period?
You ran into multiple issues. The first is that you cannot use an alias with table date range functions, so you need to wrap the query in a subselect to use the alias further.
I replaced the schedule table with static values, but you can substitute your own SELECT ... FROM table.
SELECT
the_date,
SUM(totals.visits) AS Visits,
SUM(totals.pageviews) AS Pageviews,
SUM(totals.transactions) AS Transactions,
SUM(totals.transactionRevenue)/1000000 AS Revenue
FROM (
SELECT
[date] AS the_date,
totals.visits,
totals.pageviews,
totals.transactions,
totals.transactionRevenue
FROM
TABLE_DATE_RANGE([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_],TIMESTAMP('2013-09-10'),TIMESTAMP('2013-09-12')) ) tt
JOIN (
SELECT * FROM (SELECT '20130910' AS report_date), (SELECT '20130911' AS report_date)) schedule
ON
schedule.report_date = tt.the_date
GROUP BY
1
returns:
+-----+----------+--------+-----------+--------------+---------+---+
| Row | the_date | Visits | Pageviews | Transactions | Revenue | |
+-----+----------+--------+-----------+--------------+---------+---+
| 1 | 20130910 | 63 | 249 | 16 | 206.23 | |
+-----+----------+--------+-----------+--------------+---------+---+

Get name of person having activity in every month - Oracle SQL

I have a log table with records containing a user id and the date a specific activity was done. I want to get the names of users who have activity in every month. I am using the following query:
select distinct(employeeid) from transactions
where eventdate between '01-OCT-13' AND '23-OCT-13'
and eventdate between '01-SEP-13' AND '01-OCT-13'
and eventdate between '01-AUG-13' AND '01-SEP-13'
and eventdate between '01-JUL-13' AND '01-AUG-13';
But this doesn't work. Can someone please suggest an improvement?
Edit:
Since my question seems to be a little confusing, here is an example:
EmployeeID | Timestamp
a | 01-Jul-13
b | 01-Jul-13
a | 01-Aug-13
c | 01-Aug-13
a | 01-Sep-13
d | 01-Sep-13
a | 01-Oct-13
a | 01-Oct-13
In the above table, we can see that employee "a" has activity in all the months from July till October. So I want to find a list of all such employees.
You can use COUNT as analytical function and get the number of months for each employee and total number of months. Then select only those employees where both counts match.
select distinct employeeid
from (
select employeeid,
count(distinct trunc(eventdate,'month')) --count of months for each employee
over (partition by employeeid) as emp_months,
count(distinct trunc(eventdate,'month')) --count of all months
over () as all_months
from transactions
)
where emp_months = all_months;
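The same "months per employee equals total months" comparison can also be written with GROUP BY and HAVING instead of analytic functions. The sketch below uses that variant, not the analytic query above, because SQLite (used here through Python's sqlite3 module) does not accept DISTINCT inside window functions. It assumes eventdate is stored as ISO text (YYYY-MM-DD), which is not how the Oracle table stores it, and loads the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (employeeid TEXT, eventdate TEXT)")
# Sample rows from the question, with dates rewritten as ISO strings (an assumption).
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        ("a", "2013-07-01"), ("b", "2013-07-01"),
        ("a", "2013-08-01"), ("c", "2013-08-01"),
        ("a", "2013-09-01"), ("d", "2013-09-01"),
        ("a", "2013-10-01"), ("a", "2013-10-01"),
    ],
)

# substr(eventdate, 1, 7) is the 'YYYY-MM' prefix, i.e. the month.
# An employee qualifies when their distinct-month count equals the
# number of distinct months in the whole table.
query = """
SELECT employeeid
FROM transactions
GROUP BY employeeid
HAVING COUNT(DISTINCT substr(eventdate, 1, 7)) =
       (SELECT COUNT(DISTINCT substr(eventdate, 1, 7)) FROM transactions)
"""
active_every_month = [row[0] for row in conn.execute(query)]
print(active_every_month)  # ['a']
```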
I wish I could give you the code, but I'm in a bit of a hurry, so this is more of a suggestion.
Have you tried extracting the distinct months (from eventdate) for every user? If that gives 10 rows (assuming it is October; you could compute this dynamically), then the employee must have had an event every month.
By "very inefficient", I think you mean it doesn't work. The same value can't be in September, in October, etc. all at once.
Anyway, using @LaggKing's suggestion, you could try this query:
SELECT employeeid
FROM (
SELECT DISTINCT employeeid, MONTH(eventdate) AS event_month
FROM transactions
) t
GROUP BY employeeid
HAVING COUNT(*) = MONTH(NOW())
EDIT: You need to take year into account.
SELECT employeeid
FROM (
SELECT DISTINCT employeeid, MONTH(eventdate) AS event_month
FROM transactions
WHERE YEAR(eventdate) = YEAR(NOW())
) t
GROUP BY employeeid
HAVING COUNT(*) = MONTH(NOW())