SQL - Average monthly amount

I have a table with "amount" and "date" columns, and I want to display the average by month.
The table looks like this:
amount | date |
100 | 2017-04-22 20:39:24 |
300 | 2017-04-25 16:14:08 |
200 | 2017-04-28 17:51:16 |
100 | 2017-05-29 05:46:42 |
100 | 2017-05-08 16:15:13 |
100 | 2017-05-09 22:06:45 |
400 | 2017-06-10 10:57:34 |
500 | 2017-06-11 15:57:14 |
900 | 2017-06-14 16:02:36 |
This is what I have:
SELECT AVG(amount) AS avg_amount, date
FROM table
GROUP BY date
It displays the average by day so it ends up looking exactly the same as the first table but without the hour/minute/second portion, while I want it to look like this:
avg_amount | date |
200 | April |
100 | May |
600 | June |

GROUP BY MONTH(date)
See the date and time functions in MySQL, or the EXTRACT function in PostgreSQL.

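I like TOvidiu's answer, but it only works if you have a single year's worth of data; with more than one year, the same month from different years gets averaged together. I would suggest grouping by year as well, and selecting the grouped values rather than the raw date:
SELECT AVG(amount) AS avg_amount, YEAR(date) AS year, MONTH(date) AS month
FROM table
GROUP BY YEAR(date), MONTH(date)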
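To see the year-and-month grouping run against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table name `payments` is made up for the example, and because SQLite has no YEAR() or MONTH() functions, strftime() stands in for them here; in MySQL you would keep YEAR(date) and MONTH(date).

```python
import sqlite3

# Sample rows copied from the question.
rows = [
    (100, "2017-04-22 20:39:24"),
    (300, "2017-04-25 16:14:08"),
    (200, "2017-04-28 17:51:16"),
    (100, "2017-05-29 05:46:42"),
    (100, "2017-05-08 16:15:13"),
    (100, "2017-05-09 22:06:45"),
    (400, "2017-06-10 10:57:34"),
    (500, "2017-06-11 15:57:14"),
    (900, "2017-06-14 16:02:36"),
]

conn = sqlite3.connect(":memory:")
# "payments" is a hypothetical table name, not one from the question.
conn.execute("CREATE TABLE payments (amount INTEGER, date TEXT)")
conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)

# strftime('%Y', ...) and strftime('%m', ...) play the role of YEAR() and MONTH().
monthly_avg = conn.execute(
    """
    SELECT strftime('%Y', date) AS year,
           strftime('%m', date) AS month,
           AVG(amount)          AS avg_amount
    FROM payments
    GROUP BY strftime('%Y', date), strftime('%m', date)
    ORDER BY year, month
    """
).fetchall()
conn.close()

for year, month, avg_amount in monthly_avg:
    print(year, month, avg_amount)
```

With this data the three groups come out as 200, 100 and 600 for April, May and June 2017, matching the table the question asks for.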

Related

Dealing with required duplicates in table records

Here's the situation. My team forecasts sales and revenue numbers at a monthly resolution but would like all reporting to be at a daily resolution. So I ingest these numbers, divide the monthly targets by the number of days in the month, and save the result in a table.
So I start off with something like this:
| date | forecasted_units | forecasted_revenue |
|---------|------------------|--------------------|
| 2020-01 | 372 | 9300 |
| 2020-02 | 435 | 9280 |
...
My target table now looks like this:
| date | forecasted_units | forecasted_revenue |
|------------|------------------|--------------------|
| 2020-01-01 | 12 | 300 |
| 2020-01-02 | 12 | 300 |
| 2020-01-03 | 12 | 300 |
...
| date | forecasted_units | forecasted_revenue |
|------------|------------------|--------------------|
| 2020-02-01 | 15 | 320 |
| 2020-02-02 | 15 | 320 |
| 2020-02-03 | 15 | 320 |
...
My real table is a lot wider than the one above, and all of its columns hold duplicated values like these, so there is a lot of data redundancy. My question is: is there a more efficient way to store data at the same resolution in one table?
My immediate thought is to reshape the table to include a start date and end date to look like this:
| start_date | end_date | forecasted_units | forecasted_revenue |
|------------|------------|------------------|--------------------|
| 2020-01-01 | 2020-01-31 | 12 | 300 |
| 2020-02-01 | 2020-02-29 | 15 | 320 |
But that would offload all the computation to the instance generating all the reports because it would have to generate the data for each day in between the start and end date.
Is there a better way to do this?
Unfortunately, Redshift does not support the handy Postgres function generate_series(), which would have greatly simplified the task here.
Typical alternative solutions involve a calendar table: basically, a table that enumerates all possible dates. If you have a table with a sufficient number of rows, you can generate such a dataset on the fly with row_number() and dateadd():
select dateadd(day, row_number() over(order by 1) - 1, '2020-01-01') dt
from my_large_table;
You can store the results in another table (using the create table ... as select ... syntax), or use the query result directly. In both cases, you would then join it with your actual table. To count the number of days in the month, we use a window count:
select
    d.dt,
    -- divide each monthly figure by the number of calendar days joined to that month
    t.forecasted_units / count(*) over(partition by t.date) forecasted_units,
    t.forecasted_revenue / count(*) over(partition by t.date) forecasted_revenue
from (
    select dateadd(day, row_number() over(order by 1) - 1, '2020-01-01') dt
    from my_large_table
) d
inner join mytable t on t.date = date_trunc('month', d.dt)
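For comparison, the same per-day split can be sketched outside the database, which is roughly what a reporting instance would have to do if only monthly rows were stored. This is an illustration in Python under stated assumptions, not Redshift code: the function name `expand_month` is hypothetical, and the monthly figures are the two sample rows from the question.

```python
import calendar
from datetime import date


def expand_month(year_month, units, revenue):
    """Spread one monthly forecast row evenly over every day of that month."""
    year, month = (int(part) for part in year_month.split("-"))
    # monthrange returns (weekday of the 1st, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    return [
        (date(year, month, day), units / days_in_month, revenue / days_in_month)
        for day in range(1, days_in_month + 1)
    ]


january = expand_month("2020-01", 372, 9300)
february = expand_month("2020-02", 435, 9280)

print(len(january), january[0])     # 31 daily rows for January
print(len(february), february[-1])  # 29 daily rows (2020 is a leap year)
```

The daily values match the target table in the question: 12 units and 300 revenue per day in January, 15 and 320 in February.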

Slicing account balance data in BigQuery to generate a debit report

I have a collection of account balances over time:
+-----------------+------------+-------------+-----------------------+
| account_balance | department | customer_id | timestamp |
+-----------------+------------+-------------+-----------------------+
| 5 | A | 1 | 2019-02-12T00:00:00 |
| -10 | A | 1 | 2019-02-13T00:00:00 |
| -35 | A | 1 | 2019-02-14T00:00:00 |
| 20 | A | 1 | 2019-02-15T00:00:00 |
+-----------------+------------+-------------+-----------------------+
Each record shows the total account balance of a customer at a specified timestamp. The account balance increases, e.g. from -35 to 20, when a customer tops up his account with 55. As a customer uses a service, his account balance decreases, e.g. from 5 to -10.
I want to aggregate this data in two ways:
1) Get the debit, credit and balance (credit-debit) of a department per month and year. The results from April should be a summary of all previous months:
+---------+--------+-------+------------+-------+--------+
| balance | credit | debit | department | month | year |
+---------+--------+-------+------------+-------+--------+
| 5 | 10 | -5 | A | 1 | 2019 |
| 20 | 32 | -12 | A | 2 | 2019 |
| 35 | 52 | -17 | A | 3 | 2019 |
| 51 | 70 | -19 | A | 4 | 2019 |
+---------+--------+-------+------------+-------+--------+
A customer's account balance might not change every month. There might be account balance records of customer 1 in February, but not March.
Notes towards the solution:
use EXTRACT(MONTH from timestamp) month
use EXTRACT(YEAR from timestamp) year
GROUP BY month, year, department
2) Get the change of debit, credit and balance of a department by date.
+---------+--------+-------+------------+-------------+
| balance | credit | debit | department | date |
+---------+--------+-------+------------+-------------+
| 5 | 10 | -5 | A | 2019-01-15 |
| 15 | 22 | -7 | A | 2019-02-15 |
| 15 | 20 | -5 | A | 2019-03-15 |
| 16 | 18 | -2 | A | 2019-04-15 |
+---------+--------+-------+------------+-------------+
Totals: balance 51, credit 70, debit -19.
When I sum the deltas, I should get the same values as the last row of the results in 1).
Notes towards the solution:
use account_balance - LAG(account_balance) OVER(PARTITION BY department ORDER BY timestamp ASC) delta to compute deltas
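As a rough illustration of the delta idea in the note above, here is a small Python sketch using the four sample balances of customer 1. It assumes the rows are already ordered by timestamp and belong to a single customer; treating positive deltas as credit and negative deltas as debit is my reading of the question, not something it states outright.

```python
# Balances of customer 1 in department A, ordered by timestamp (from the sample table).
balances = [5, -10, -35, 20]

# Equivalent of account_balance - LAG(account_balance): the first row has no
# predecessor, so it produces no delta here (LAG would return NULL for it).
deltas = [current - previous for previous, current in zip(balances, balances[1:])]

credit = sum(delta for delta in deltas if delta > 0)  # top-ups
debit = sum(delta for delta in deltas if delta < 0)   # usage

print(deltas, credit, debit)
```

The credit and debit together equal the net change from the first balance to the last, which is the consistency check the question describes.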
Your question is unclear, but it sounds like you want to get the outstanding balance at any given point in time.
The following query does this for a single point in time.
with calendar as (
    select cast('2019-06-01' as timestamp) as balance_calc_ts
),
most_recent_balance as (
    -- latest balance timestamp per customer before the calculation point
    select customer_id, balance_calc_ts, max(timestamp) as most_recent_balance_ts
    from <table>
    cross join calendar
    where timestamp < balance_calc_ts -- or <=
    group by 1, 2
)
select t.customer_id, t.account_balance, mrb.balance_calc_ts
from <table> t
inner join most_recent_balance mrb
    on t.customer_id = mrb.customer_id
    and t.timestamp = mrb.most_recent_balance_ts
If you need to calculate it at a series of points in time, you will need to modify the calendar CTE to return more dates. This is the beauty of CROSS JOINS in BQ!
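The "most recent balance before a given point" logic can be sketched in plain Python to make the two steps explicit. This is only an illustration with the sample rows from the question; the function name `balance_as_of` is hypothetical, and it uses the strict `<` comparison from the query.

```python
from datetime import datetime

# (account_balance, department, customer_id, timestamp) rows from the question.
records = [
    (5, "A", 1, datetime(2019, 2, 12)),
    (-10, "A", 1, datetime(2019, 2, 13)),
    (-35, "A", 1, datetime(2019, 2, 14)),
    (20, "A", 1, datetime(2019, 2, 15)),
]


def balance_as_of(rows, as_of):
    """Return {customer_id: balance} using each customer's latest row before as_of."""
    latest = {}  # customer_id -> (timestamp, balance)
    for balance, _department, customer_id, ts in rows:
        if ts < as_of and (customer_id not in latest or ts > latest[customer_id][0]):
            latest[customer_id] = (ts, balance)
    return {customer_id: balance for customer_id, (_ts, balance) in latest.items()}


print(balance_as_of(records, datetime(2019, 6, 1)))
print(balance_as_of(records, datetime(2019, 2, 14)))
```

Calling the function once per calendar date corresponds to returning more rows from the calendar CTE.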

SQL query to select today and previous day's price

I have historic stock price data that looks like the below. I want to generate a new table that has one row for each ticker with the most recent day's price and its previous day's price. What would be the best way to do this? My database is Postgres.
+--------+-------+------------+
| ticker | price | date       |
+--------+-------+------------+
| AAPL   | 6     | 10-23-2015 |
| AAPL   | 5     | 10-22-2015 |
| AAPL   | 4     | 10-21-2015 |
| AXP    | 5     | 10-23-2015 |
| AXP    | 3     | 10-22-2015 |
| AXP    | 5     | 10-21-2015 |
+--------+-------+------------+
You can do something like this:
with ranking as (
    select ticker, price, dt,
           rank() over (partition by ticker order by dt desc) as rank
    from stocks
)
select * from ranking where rank in (1,2);
Example: http://sqlfiddle.com/#!15/e45ea/3
Results for your example will look like this:
| ticker | price | dt | rank |
|--------|-------|---------------------------|------|
| AAPL | 6 | October, 23 2015 00:00:00 | 1 |
| AAPL | 5 | October, 22 2015 00:00:00 | 2 |
| AXP | 5 | October, 23 2015 00:00:00 | 1 |
| AXP | 3 | October, 22 2015 00:00:00 | 2 |
If your table is large and you have performance issues, use a WHERE clause to restrict the data to the last 30 days or so.
Your best bet is to use a window function together with an aggregated CASE expression, which pivots the data.
You can see more on window functions here: http://www.postgresql.org/docs/current/static/tutorial-window.html
Below is a pseudocode version of where you may need to head to answer your question (sorry, I couldn't validate it because I don't have a Postgres database set up).
SELECT
    ticker,
    SUM(CASE WHEN rank = 1 THEN price ELSE 0 END) today,
    SUM(CASE WHEN rank = 2 THEN price ELSE 0 END) yesterday
FROM (
    SELECT
        ticker,
        price,
        date,
        rank() OVER (PARTITION BY ticker ORDER BY date DESC) AS rank
    FROM your_table) p
WHERE rank IN (1,2)
GROUP BY ticker
Edit - Updated the case statement with an 'else'
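To check what the rank-and-pivot approach should return for the sample data, here is a small Python sketch of the same idea: sort each ticker's prices newest first, then take the first two. The function name `latest_two_prices` is made up, and the MM-DD-YYYY date format is assumed from the table in the question.

```python
from datetime import datetime

# (ticker, price, date) rows from the question; dates are MM-DD-YYYY strings.
rows = [
    ("AAPL", 6, "10-23-2015"),
    ("AAPL", 5, "10-22-2015"),
    ("AAPL", 4, "10-21-2015"),
    ("AXP", 5, "10-23-2015"),
    ("AXP", 3, "10-22-2015"),
    ("AXP", 5, "10-21-2015"),
]


def latest_two_prices(price_rows):
    """Return {ticker: (most_recent_price, previous_price)}."""
    history = {}
    for ticker, price, day in price_rows:
        parsed = datetime.strptime(day, "%m-%d-%Y")
        history.setdefault(ticker, []).append((parsed, price))

    result = {}
    for ticker, entries in history.items():
        entries.sort(reverse=True)  # newest date first, like ORDER BY date DESC
        today = entries[0][1]
        yesterday = entries[1][1] if len(entries) > 1 else None
        result[ticker] = (today, yesterday)
    return result


print(latest_two_prices(rows))
```

One design note: a ticker with a single row gets None for the previous price here, whereas the SUM(CASE ... ELSE 0 END) pivot would report 0.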

SQL Query in MS ACCESS to calculate averaged days depending on multiple criteria

I have the following table in MS Access 2007:
customer | Promotion | Month | activator | request_date | activation_date
1 | promo1 | 10 | shop1 | 11/10/2011 | 21/10/2011
2 | promo2 | 9 | shop1 | 10/09/2011 | 15/09/2011
3 | promo2 | 9 | shop2 | 10/09/2011 | 16/09/2011
4 | promo1 | 10 | shop1 | 12/10/2011 | 13/10/2011
What I need is a query to calculate the average number of days that each shop takes to activate each promotion grouped by month. So for example one result would be:
shop1 in October took an average of (10 + 1) / 2 = 5.5 days to activate promo1.
Thanks in advance!
SELECT activator, Month, Promotion, AVG(activation_date - request_date)
FROM ...
GROUP BY activator, Month, Promotion
Try this:
SELECT
    activator,
    [Month],
    Promotion,
    Avg(DateDiff("d", request_date, activation_date)) AS avgTime
FROM Table1
GROUP BY activator, [Month], Promotion
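As a sanity check on what the grouped average should produce, here is the same calculation sketched in Python on the four sample rows. It assumes the dates are DD/MM/YYYY, as the sample values suggest; the variable names are only for illustration.

```python
from collections import defaultdict
from datetime import datetime

# (customer, promotion, month, activator, request_date, activation_date)
rows = [
    (1, "promo1", 10, "shop1", "11/10/2011", "21/10/2011"),
    (2, "promo2", 9, "shop1", "10/09/2011", "15/09/2011"),
    (3, "promo2", 9, "shop2", "10/09/2011", "16/09/2011"),
    (4, "promo1", 10, "shop1", "12/10/2011", "13/10/2011"),
]


def parse(day):
    # Dates are assumed to be DD/MM/YYYY.
    return datetime.strptime(day, "%d/%m/%Y")


# Collect the activation delay in days for each (activator, month, promotion) group.
delays = defaultdict(list)
for _customer, promotion, month, activator, requested, activated in rows:
    delays[(activator, month, promotion)].append((parse(activated) - parse(requested)).days)

average_days = {group: sum(days) / len(days) for group, days in delays.items()}

for group, avg in sorted(average_days.items()):
    print(group, avg)
```

For shop1, month 10, promo1 this gives 5.5 days, the figure used as the example in the question.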

SQL Query question (count dates for each month)

I hope you can help me with the following. I have the following view available (dates are in DD/MM/YYYY format):
ENTITY | StartDate | EndDate | CodeA | CodeB | Revenue | Currency
AZERT | 01/01/2011 | 02/01/2011 | SU | BOLD | 100 | EUR
AZERT | 28/01/2011 | 02/02/2011 | SU | BOLD | 500 | EUR
Can someone help with a query to pull the data so that I get the following summed?
ENTITY | YYYY.MM | CodeA | CodeB | DAYS | TIMES | Revenue | Currency
AZERT | 2011.01 | SU | BOLD | 5 | 2 | 500 | EUR
AZERT | 2011.02 | SU | BOLD | 1 | 0 | 100 | EUR
Where YYYY.MM is derived from the months spanned between StartDate and EndDate,
DAYS is the number of days between the start and end dates that fall in that month,
TIMES is the number of times a StartDate occurs in that month,
and Revenue is split in proportion to the number of days in each month.
Assuming you are using SQL Server 2005 or later, would this work?
SELECT
    DATEADD(dd, -DAY(EndDate) + 1, EndDate) -- Get first day of month for EndDate
    ,CodeA
    ,CodeB
    ,SUM(DATEDIFF(dd, StartDate, EndDate)) AS 'DAYS'
FROM
    TABLE1
GROUP BY
    DATEADD(dd, -DAY(EndDate) + 1, EndDate)
    ,CodeA
    ,CodeB
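To pin down the per-month split that the question asks for, here is a small Python sketch that walks each date range day by day. It rests on two assumptions of mine: each range is treated as half-open (StartDate included, EndDate excluded), and revenue is spread evenly over the days of the range. Under those assumptions it reproduces the DAYS, TIMES and Revenue figures in the expected output.

```python
from collections import defaultdict
from datetime import date, timedelta

# (StartDate, EndDate, Revenue) for the two sample rows of entity AZERT.
bookings = [
    (date(2011, 1, 1), date(2011, 1, 2), 100),
    (date(2011, 1, 28), date(2011, 2, 2), 500),
]

# (year, month) -> running totals for that month
summary = defaultdict(lambda: {"days": 0, "times": 0, "revenue": 0.0})

for start, end, revenue in bookings:
    total_days = (end - start).days
    # TIMES counts a booking in the month where it starts.
    summary[(start.year, start.month)]["times"] += 1
    day = start
    while day < end:  # half-open range: EndDate itself is not counted
        bucket = summary[(day.year, day.month)]
        bucket["days"] += 1
        bucket["revenue"] += revenue / total_days  # even split per day
        day += timedelta(days=1)

for (year, month), totals in sorted(summary.items()):
    print(f"{year}.{month:02d}", totals)
```

A set-based SQL version would need a calendar or numbers table to produce one row per day before grouping by month.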