SQL: Finding the average per month

I have 2 columns, DATE and DFF (a FLOAT holding an interest rate). I need to get the average interest rate for every month.
EXAMPLE:
DATE DFF
2003-01-04 2.0
2003-01-18 2.25
2003-01-25 2.5
2003-02-08 3.0
2003-02-15 3.25
2003-02-27 2.75
2004-05-07 4.0
2004-05-25 4.0
The outcome I am looking for is:
EXAMPLE:
date avg_dff
2003-01 2.25
2003-02 3.0
2004-05 4.0

This query should solve your problem:
SELECT
DATE_FORMAT(DATE, '%Y-%m') AS date,
AVG(DFF) AS avg_dff
FROM your_table
GROUP BY DATE_FORMAT(DATE, '%Y-%m')
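The same grouping idea can be tried out locally. This is a minimal sketch using Python's built-in sqlite3 module, so it uses SQLite's strftime('%Y-%m', ...) in place of MySQL's DATE_FORMAT. The table name rates and the function name monthly_average are made up for illustration; the rows are the sample data from the question.

```python
import sqlite3

# Sample rows from the question: (date, dff)
rows = [
    ("2003-01-04", 2.0),
    ("2003-01-18", 2.25),
    ("2003-01-25", 2.5),
    ("2003-02-08", 3.0),
    ("2003-02-15", 3.25),
    ("2003-02-27", 2.75),
    ("2004-05-07", 4.0),
    ("2004-05-25", 4.0),
]


def monthly_average(rows):
    """Return (YYYY-MM, average dff) pairs, one per month."""
    conn = sqlite3.connect(":memory:")
    # "rates" is a hypothetical table name used only for this sketch
    conn.execute("CREATE TABLE rates (date TEXT, dff REAL)")
    conn.executemany("INSERT INTO rates VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT strftime('%Y-%m', date) AS month, AVG(dff) AS avg_dff "
        "FROM rates "
        "GROUP BY strftime('%Y-%m', date) "
        "ORDER BY month"
    ).fetchall()
    conn.close()
    return result


monthly = monthly_average(rows)
for month, avg_dff in monthly:
    print(month, avg_dff)
```

Grouping on the formatted year-month string keeps January 2003 and January 2004 in separate groups, which is what the desired output requires.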

Related

Averaging the data across two calendar years and defining the beginning month

I have data for the period from December 2013 to November 2018. I converted it into a data frame as shown here.
Date 0.1 0.2 0.3 0.4 0.5 0.6
2013-12-01 301.04 297.4 296.63 295.76 295.25 295.25
2013-12-04 297.96 297.15 296.25 295.25 294.43 293.45
2013-12-05 298.4 297.61 296.65 295.81 294.75 293.89
2013-12-08 298.82 297.95 297.15 296.25 295.45 294.41
2013-12-09 298.65 297.65 296.95 296.02 295.13 294.05
2013-12-12 299.05 297.33 296.65 295.81 294.85 293.85
2013-12-16 301.05 300.28 299.38 298.45 297.65 296.51
....
2014-01-10 301.65 297.45 296.46 295.52 294.65 293.56
2014-01-11 301.99 298.95 298.39 297.15 296.05 295.11
2014-01-12 299.86 298.65 297.73 296.82 296.35 295.37
2014-01-13 299.25 298.15 297.3 296.43 295.26 294.31
I want to take the monthly mean and the seasonal mean of this data.
For the monthly mean I tried
df.resample('M').mean()
And it worked well.
For seasons, I would like to decompose this data into 4 seasons of three months each (December-February, March-May, June-August, and September-November). I tried resample with a 3-month interval, i.e.
df.resample('3M').mean()
However, this did not work well: it gives the average for the starting December month separately and then applies the 3-month interval to the calendar year (i.e. from January to March, and so on).
I would like to know if there is a way to avoid this by specifying the month in which the period begins.
I would also like to know whether the seasons can be defined beforehand, so that the data can be grouped by season and averaged more easily.
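The origin argument of resample only affects fixed frequencies such as '5min', so it does not shift month-based bins. For seasons that start in December, use a quarter-start frequency anchored on December, which produces bins beginning in December, March, June and September:
df.resample('QS-DEC').mean()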
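On the second part of the question (defining the seasons beforehand), here is a rough sketch in plain Python rather than pandas. The names SEASONS, season_key and seasonal_means are made up for illustration, and the season labels DJF/MAM/JJA/SON are an assumed naming. The first four records are taken from the 0.1 column of the question; the March record is invented so that a second season appears.

```python
from collections import defaultdict
from datetime import date

# Month number -> season label (assumed labels: DJF, MAM, JJA, SON)
SEASONS = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def season_key(d):
    """Return (season_year, label); December counts toward the next year's DJF."""
    year = d.year + 1 if d.month == 12 else d.year
    return (year, SEASONS[d.month])


def seasonal_means(records):
    """Average the values of (date, value) records per season."""
    groups = defaultdict(list)
    for d, value in records:
        groups[season_key(d)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


records = [
    (date(2013, 12, 1), 301.04),
    (date(2013, 12, 4), 297.96),
    (date(2014, 1, 10), 301.65),
    (date(2014, 1, 11), 301.99),
    (date(2014, 3, 1), 300.0),  # invented row, only here to show a second season
]

means = seasonal_means(records)
for key in sorted(means):
    print(key, means[key])
```

In pandas the same mapping could be applied to the index months and passed to groupby, but that variant is not shown here.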

Find the week of month in Impala

I want Impala to report that 2020-03-01 is in the first week of March.
How is this possible in Impala? I only managed to find weekofyear().
If by "week of month" you mean that the first week is days 1-7, the second 8-14, and so on, then you can use:
select ceiling( day(ingestion_date) / 7.0 ) as week_of_month
I figured it out:
ceil(cast(substr(ingestion_date,7,2) as int)/7) as wmonth
The steps I took:
take the substring of the date that holds the day of the month
convert it to int
divide by 7
take the ceiling of the result
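The "days 1-7 are week 1" definition used in both answers is easy to check outside Impala. This is a small Python sketch of the same arithmetic; the function name week_of_month is made up for illustration.

```python
import math
from datetime import date


def week_of_month(d):
    """Days 1-7 are week 1, days 8-14 are week 2, and so on."""
    return math.ceil(d.day / 7)


# The date from the question
print(week_of_month(date(2020, 3, 1)))
```

Note that this definition ignores weekdays, so days 29-31 always fall in a short fifth week.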

Increment value depending on months count -SQL that isn't working

The employee annual leave limit is 30 days per year. Each month, 2.5 days are added to the annual leave balance, based on the employee's joining date.
For example, if an employee has completed 3 months, the annual leave balance should be 7.5.
1st month - 2.5
2nd month - 5
3rd month - 7.5
How can I achieve this using SQL or .NET?
select (sysdate - job_start_date) / 30 * 2.5 anlBalance from myEmpTable
To round the balance up, use ceil; to round it down, use floor:
select ceil((sysdate - job_start_date) / 30 * 2.5) anlBalance from myEmpTable
select floor((sysdate - job_start_date) / 30 * 2.5) anlBalance from myEmpTable
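The queries above approximate a month as 30 days, so the balance grows continuously rather than in 2.5 steps. If the balance should increase only when a full month is completed, as in the 2.5 / 5 / 7.5 example from the question, one possible approach is to count completed calendar months. This is a sketch in Python; the function names are made up, and capping the balance at 30 is an assumption about what "annual leave limit" means.

```python
from datetime import date

MONTHLY_ACCRUAL = 2.5
ANNUAL_LIMIT = 30.0  # assumption: the balance is capped at the yearly limit


def completed_months(start, today):
    """Number of full calendar months between start and today."""
    months = (today.year - start.year) * 12 + (today.month - start.month)
    if today.day < start.day:
        months -= 1  # the current month is not complete yet
    return max(months, 0)


def leave_balance(start, today):
    """Accrue 2.5 days per completed month, capped at ANNUAL_LIMIT."""
    return min(completed_months(start, today) * MONTHLY_ACCRUAL, ANNUAL_LIMIT)


print(leave_balance(date(2020, 1, 15), date(2020, 4, 15)))
```

If leave taken or yearly resets need to be considered, the cap logic would have to change accordingly.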

MDX - Average over the whole time period, even when no data exists

I have a fact table with 1 row for each day. Example:
ID Date Value
1 20190101 10
1 20190102 15
2 20190101 31
If I take a simple Value average in SSAS cube I get:
ID Average <Formula>
1 12.5 (10+15)/2
2 15.5 31/2
As I understand, 15.5 is there because in total there are 2 days in the scope as only two days exist in the fact data when I select the whole month.
However, I need to calculate a monthly average instead. It should check that there are 31 days in that month (based on Date dimension) and get this result:
ID Average <Formula>
1 0.8 (10+15)/31
2 1 31/31
So far I've tried to create some "fake rows" in my data. For example, I've tried to create rows with Value = 0 for dates 20190103-20190131 for ID=1.
This works, it forces the calculation for ID=1 to always take all days in the period, but it messes up my other calculations in the cube.
Is there any other way to force the average calculation in an SSAS multidimensional cube to always cover the entire month?
If you want to do the calculation in the cube, you can use the Descendants function on your Date dimension.
For example, the following gives the number of days in a month using the AdventureWorks sample:
WITH MEMBER Measures.DayCount AS
Descendants
(
[Date].[Calendar].CurrentMember,
[Date].[Calendar].[Date],
LEAVES
).Count
SELECT [Measures].[DayCount] ON 0,
[Date].[Calendar].[Month].ALLMEMBERS ON 1
FROM [Adventure Works]
If the calculation can be done in SQL Server instead, I would recommend:
select id, eomonth(date) as eom,
sum(value) * 1.0 / day(eomonth(date)) as average
from t
group by id, eomonth(date);
EOMONTH() returns the last day of the month. You can extract the day to get the number of days in the month.
The * 1.0 is there because SQL Server does integer division. Your numbers look like integers, but if you are getting 15.5, then you actually have numerics or something other than an integer.
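The arithmetic behind both answers (sum of values divided by the number of days in the calendar month) can be checked with a short Python sketch. The function name monthly_average_all_days is made up for illustration, and the rows are the three sample rows from the question.

```python
import calendar
from collections import defaultdict


def monthly_average_all_days(rows):
    """Sum values per (id, year, month) and divide by the days in that month."""
    totals = defaultdict(float)
    for row_id, yyyymmdd, value in rows:
        year, month = int(yyyymmdd[:4]), int(yyyymmdd[4:6])
        totals[(row_id, year, month)] += value
    return {
        key: total / calendar.monthrange(key[1], key[2])[1]
        for key, total in totals.items()
    }


# Sample fact rows from the question: (ID, Date, Value)
rows = [(1, "20190101", 10), (1, "20190102", 15), (2, "20190101", 31)]

averages = monthly_average_all_days(rows)
for key in sorted(averages):
    print(key, round(averages[key], 3))
```

Days without a fact row contribute nothing to the sum but still count in the divisor, which is the behaviour the question asks for without inserting zero-valued rows.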

SQLite: Get monthly, weekly or daily average of all entries

I have a SQLite database of records in this format:
date location temperature
1568463916 room 1 20.0
1568463916 room 2 25.0
1568463916 room 3 30.0
...
1568460316 room 1 15.5
1568460316 room 2 20.5
1568460316 room 3 21.3
Every hour three new records get inserted, one for every room.
For a monthly average this output is desired:
month avg_temperature location
01 21.333 room 1
01 24.5 room 2
01 19.0 room 3
...
12 20.4 room 1
12 31.31 room 2
12 13.37 room 3
The same query might be reused to get weekly averages (day 00-07) and daily averages (hour 00-23).
To get a monthly average, I'm assuming I will select:
All records with date between now and now - 1 year
Month of every record with strftime(date, "unixepoch") as month
For every location, then for every month get avg(temperature)
The result is rooms*12 rows of average temperature of each room for each month
When I use the GROUP BY statement, however, I only get the last row of every month. What's the correct way to construct this kind of query?
This is the query I've tried:
SELECT strftime("%m", date, "unixepoch") month,
avg(temperature) avg_temperature,
location
FROM table
WHERE date > date("now", "unixepoch", "-1 year")
AND date < date("now", "unixepoch")
GROUP BY location, month
ORDER BY month
This should do what you want:
select date(datetime(date, 'unixepoch'), 'start of month') as month,
location,
avg(temperature) as avg_temperature
from t
group by date(datetime(date, 'unixepoch'), 'start of month'),
location
order by month, location;
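To try this against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table name readings and the function name monthly_room_averages are made up for illustration; the query groups by the first day of each month, so the same month in different years stays separate.

```python
import sqlite3

# Sample rows from the question: (unix timestamp, location, temperature)
rows = [
    (1568463916, "room 1", 20.0),
    (1568463916, "room 2", 25.0),
    (1568463916, "room 3", 30.0),
    (1568460316, "room 1", 15.5),
    (1568460316, "room 2", 20.5),
    (1568460316, "room 3", 21.3),
]


def monthly_room_averages(rows):
    """Return (month_start, location, avg_temperature) rows."""
    conn = sqlite3.connect(":memory:")
    # "readings" is a hypothetical table name used only for this sketch
    conn.execute(
        "CREATE TABLE readings (date INTEGER, location TEXT, temperature REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT date(date, 'unixepoch', 'start of month') AS month, "
        "       location, "
        "       avg(temperature) AS avg_temperature "
        "FROM readings "
        "GROUP BY date(date, 'unixepoch', 'start of month'), location "
        "ORDER BY month, location"
    ).fetchall()
    conn.close()
    return result


averages = monthly_room_averages(rows)
for month, location, avg_temperature in averages:
    print(month, location, avg_temperature)
```

For weekly or daily buckets, the grouping expression could be swapped for a different strftime format, with the rest of the query unchanged.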