Addition in SQL by month - sql

I need to SUM the column something by month:
date something
2010-01-02
2010-01-03
2010-01-04
2010-01-07
2010-01-10
2010-01-12
2010-01-13
2010-01-14
2010-01-15
2010-01-16
2010-01-17
2010-01-18 3
2010-01-19 1
2010-01-21
2010-01-22 11
2010-01-23 1
2010-01-24
2010-01-25
2010-01-26
2010-01-27
2010-01-28
2010-01-29
2010-01-30
2010-01-05 5
2010-01-06 8
2010-01-09
2010-01-08 3
2010-01-11
2010-01-01
2010-01-20 0
2010-01-31 13
The output should be one row per month. For example, for January 2010 the sum of something is 45:
date something
2010-01 45
How do I write an SQL query for that?

This is a simple aggregation based on the month of the date column:
select to_char("date", 'yyyy-mm'), sum(something)
from the_table
group by to_char("date", 'yyyy-mm')
This assumes the column "date" has the data type date (or timestamp).
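The aggregation above can be tried out with a minimal sketch using Python's built-in sqlite3 module. This is an assumption-laden illustration, not the answer's exact query: SQLite has no to_char, so strftime('%Y-%m', ...) stands in for it, and the table is filled with only the non-empty rows from the question plus one NULL row to show that SUM skips missing values.

```python
import sqlite3

# In-memory database; the_table and its columns follow the names used in the answer.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE the_table ("date" TEXT, something INTEGER)')

# Non-empty rows from the question, plus one row with a missing value.
rows = [
    ("2010-01-18", 3), ("2010-01-19", 1), ("2010-01-22", 11),
    ("2010-01-23", 1), ("2010-01-05", 5), ("2010-01-06", 8),
    ("2010-01-08", 3), ("2010-01-20", 0), ("2010-01-31", 13),
    ("2010-01-02", None),
]
conn.executemany("INSERT INTO the_table VALUES (?, ?)", rows)

# strftime('%Y-%m', ...) is SQLite's stand-in for to_char(..., 'yyyy-mm').
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', "date") AS month, SUM(something) AS total
    FROM the_table
    GROUP BY strftime('%Y-%m', "date")
    ORDER BY month
    """
).fetchall()

print(monthly)  # [('2010-01', 45)]
```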

Related

SQL Server query Group By Trimester

I'm looking for a way to group a SQL query by trimester. I have found a way to do it in MySQL (on this link), but I need it for SQL Server.
This is what I'm expecting:
Range Start Range End Count
----------- ---------- -----
2013-09-01 2013-11-26 87
2013-06-01 2013-08-31 92
2013-03-01 2013-05-31 92
2012-12-01 2013-02-28 90
2012-09-01 2012-11-30 91
This is what I have tried:
SELECT MIN(start_date) AS Range_Start, MAX(start_date) AS Range_End, COUNT(ID) AS Total
FROM [dbo].[table]
GROUP BY FLOOR(DATEDIFF(MONTH, DATEADD(DAY, -DAY(start_date)+1, start_date), DATEADD(DAY, -DAY(start_date)+1,getdate())) /3)
ORDER BY 1 ASC
This is what I get:
Range Start Range End Count
----------- ---------- -----
1900-01-01 00:00:00.000 1900-01-01 00:00:00.000 8
1952-01-01 00:00:00.000 1952-01-01 00:00:00.000 2
1954-01-01 00:00:00.000 1954-01-01 00:00:00.000 11
1955-01-01 00:00:00.000 1955-01-01 00:00:00.000 3
1956-01-01 00:00:00.000 1956-01-01 00:00:00.000 2
1957-01-01 00:00:00.000 1957-01-01 00:00:00.000 8
1958-01-01 00:00:00.000 1958-01-01 00:00:00.000 2
1959-01-01 00:00:00.000 1959-01-01 00:00:00.000 5
1960-01-01 00:00:00.000 1960-01-01 00:00:00.000 17
1960-03-17 00:00:00.000 1960-03-17 00:00:00.000 1

Select data between 2 datetime fields based on current date/time

I have a table that has the following values (reduced for brevity)
Period  Periodfrom           Periodto             Glperiodoracle  Glperiodcalendar
------  -------------------  -------------------  --------------  ----------------
88      2022-01-01 00:00:00  2022-01-28 00:00:00  JAN-FY2022      JAN-2022
89      2022-01-29 00:00:00  2022-02-25 00:00:00  FEB-FY2022      FEB-2022
90      2022-02-26 00:00:00  2022-04-01 00:00:00  MAR-FY2022      MAR-2022
91      2022-04-02 00:00:00  2022-04-29 00:00:00  APR-FY2022      APR-2022
92      2022-04-30 00:00:00  2022-05-27 00:00:00  MAY-FY2022      MAY-2022
93      2022-05-28 00:00:00  2022-07-01 00:00:00  JUN-FY2022      JUN-2022
94      2022-07-02 00:00:00  2022-07-29 00:00:00  JUL-FY2022      JUL-2022
95      2022-07-30 00:00:00  2022-08-26 00:00:00  AUG-FY2022      AUG-2022
96      2022-08-27 00:00:00  2022-09-30 00:00:00  SEP-FY2022      SEP-2022
97      2022-10-01 00:00:00  2022-10-28 00:00:00  OCT-FY2023      OCT-2022
I want to write a stored procedure that, when executed (without parameters), returns the single row whose Periodfrom to Periodto range contains the execution date.
I have something like this:
Select top 1 Period,
Periodfrom,
Periodto,
Glperiodoracle,
Glperiodcalendar
From Calendar_Period
Where Periodfrom <= getdate()
And Periodto >= getdate()
I understand that using BETWEEN could lead to errors, but would this work in the edge cases, taking seconds into account?
Looks like (i) your end date is inclusive (ii) the time portion is always 00:00. So the correct and most performant query would be:
where cast(getdate() as date) between Periodfrom and Periodto
It will, for example, return the first row when the current time is 2022-01-28 23:59:59.999.
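The edge case can be sketched in plain Python, as an illustration rather than the stored procedure itself. The helper find_period and its truncate flag are hypothetical names; only the first two rows of the table are used. Comparing the full timestamp misses the last day of a period once the clock passes midnight, while comparing the date alone (the equivalent of cast(getdate() as date)) still matches.

```python
from datetime import date, datetime, time

# First two periods from the question's table: (Period, Periodfrom, Periodto).
PERIODS = [
    (88, date(2022, 1, 1), date(2022, 1, 28)),
    (89, date(2022, 1, 29), date(2022, 2, 25)),
]


def find_period(now, truncate):
    """Return the matching Period number, or None if no row matches.

    truncate=True mimics cast(getdate() as date); truncate=False mimics
    comparing the raw getdate() value against midnight boundaries.
    """
    probe = datetime.combine(now.date(), time.min) if truncate else now
    for period, start, end in PERIODS:
        lower = datetime.combine(start, time.min)  # Periodfrom at 00:00
        upper = datetime.combine(end, time.min)    # Periodto at 00:00
        if lower <= probe <= upper:
            return period
    return None


late_evening = datetime(2022, 1, 28, 23, 59, 59)
print(find_period(late_evening, truncate=False))  # None
print(find_period(late_evening, truncate=True))   # 88
```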

Resample 10D but until end of months

I would like to resample a DataFrame at a frequency of 10 days, but with the last 10-day period ("decade") of each month always ending at the end of the month.
Example:
print(df)
 data
index
2010-01-01 145.08
2010-01-02 143.69
2010-01-03 101.06
2010-01-04 57.63
2010-01-05 65.46
...
2010-02-24 48.06
2010-02-25 87.41
2010-02-26 71.97
2010-02-27 73.1
2010-02-28 41.43
I would like to apply something like df.resample('10DM').mean() ('10DM' is not an existing frequency, just what I have in mind) and get:
data
index
2010-01-10 97.33
2010-01-20 58.58
2010-01-31 41.43
2010-02-10 35.17
2010-02-20 32.44
2010-02-28 55.44
Note that the 1st and 2nd decades are a normal 10D resample, but the 3rd can span 8, 9, 10 or 11 days depending on the month and year.
Thanks in advance.
Sample data (easy to check):
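import numpy as np
import pandas as pd

# dti is assumed to be the daily range shown below (59 days)
dti = pd.date_range("2010-01-01", "2010-02-28", freq="D")
df = pd.DataFrame({"value": np.arange(1, len(dti) + 1)}, index=dti)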
>>> df
value
2010-01-01 1
2010-01-02 2
2010-01-03 3
2010-01-04 4
2010-01-05 5
...
2010-02-24 55
2010-02-25 56
2010-02-26 57
2010-02-27 58
2010-02-28 59
You need to create groups by (days, month, year):
grp = df.groupby([pd.cut(df.index.day, [0, 10, 20, 31]),
                  pd.Grouper(freq='M'),
                  pd.Grouper(freq='Y')])
Now you can compute the mean for each group:
out = grp['value'].apply(lambda x: (x.index.max(), x.mean())).apply(pd.Series) \
                  .reset_index(drop=True).rename(columns={0:'date', 1:'value'}) \
                  .set_index('date').sort_index()
Output result:
>>> out
value
date
2010-01-10 5.5
2010-01-20 15.5
2010-01-31 26.0
2010-02-10 36.5
2010-02-20 46.5
2010-02-28 55.5
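The same bucketing idea can be checked without pandas. The following is a rough standard-library sketch, not the answer's method: bucket_end is a hypothetical helper that maps each day to the nominal end of its 10-day period (the 10th, the 20th, or the month's last day), whereas the pandas answer labels each group with the last date actually present. For complete daily data the two agree.

```python
import calendar
from collections import defaultdict
from datetime import date, timedelta


def bucket_end(d):
    """Map a date to the end of its 10-day period within the month."""
    if d.day <= 10:
        return d.replace(day=10)
    if d.day <= 20:
        return d.replace(day=20)
    last_day = calendar.monthrange(d.year, d.month)[1]  # 28, 29, 30 or 31
    return d.replace(day=last_day)


# Same sample data as above: values 1..59 for 2010-01-01 .. 2010-02-28.
start = date(2010, 1, 1)
values = {start + timedelta(days=i): i + 1 for i in range(59)}

groups = defaultdict(list)
for day, value in values.items():
    groups[bucket_end(day)].append(value)

means = {end: sum(vals) / len(vals) for end, vals in sorted(groups.items())}
for end, mean in means.items():
    print(end, mean)
```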

How to convert to datetime if the format of dates changes gradually through the column?

df.head():
start_date end_date
0 03.09.2013 03.09.2025
1 09.08.2019 14.05.2020
2 03.08.2015 03.08.2019
3 31.03.2014 31.03.2019
4 02.02.2015 02.02.2019
5 21.08.2019 21.08.2024
when I do df.tail():
start_date end_date
30373 2019-07-05 00:00:00 2023-07-05 00:00:00
30374 2019-06-11 00:00:00 2023-06-11 00:00:00
30375 19.01.2017 2020-02-09 00:00:00 #these 2 start dates are just same as in head
30376 11.12.2009 2011-12-11 00:00:00
30377 2019-07-30 00:00:00 2023-07-30 00:00:00
When I do
df['start_date'] = pd.to_datetime(df['start_date'])
some dates have the month and day swapped.
The format is inconsistent throughout the column. How do I convert it properly?
Use the dayfirst=True parameter:
df['start_date'] = pd.to_datetime(df['start_date'], dayfirst=True)
Or, if every value in the column uses the same layout, specify the format explicitly (see http://strftime.org/ for the codes):
df['start_date'] = pd.to_datetime(df['start_date'], format='%d.%m.%Y')
df['start_date'] = pd.to_datetime(df['start_date'], dayfirst=True)
df['end_date'] = pd.to_datetime(df['end_date'], dayfirst=True)
print (df)
start_date end_date
0 2013-09-03 2025-09-03
1 2019-08-09 2020-05-14
2 2015-08-03 2019-08-03
3 2014-03-31 2019-03-31
4 2015-02-02 2019-02-02
5 2019-08-21 2024-08-21
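When a column really does mix layouts, as in the df.tail() shown in the question, one option is to try each known format in turn. This is a standard-library sketch of that idea, not part of the answer above: parse_mixed and the FORMATS tuple are hypothetical, and the two formats are assumed from the samples in the question.

```python
from datetime import datetime

# The two layouts seen in the question's head() and tail() output (assumed complete).
FORMATS = ("%d.%m.%Y", "%Y-%m-%d %H:%M:%S")


def parse_mixed(text):
    """Parse a date string by trying each known format until one matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # wrong layout, try the next one
    raise ValueError(f"unrecognised date format: {text!r}")


print(parse_mixed("03.09.2013"))           # 2013-09-03 00:00:00
print(parse_mixed("2019-07-05 00:00:00"))  # 2019-07-05 00:00:00
```

A function like this could be applied element-wise, for instance with df['start_date'].map(parse_mixed), at the cost of being slower than a vectorised pd.to_datetime call.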

joining monthly values with daily values in sql

I have daily values in one table and monthly values in another table. I need to apply the values of the monthly table on a daily basis:
basically, monthly factor * daily factor, for each day
thanks!
I have a table like this:
2010-12-31 00:00:00.000 28.3
2010-09-30 00:00:00.000 64.1
2010-06-30 00:00:00.000 66.15
2010-03-31 00:00:00.000 12.54
and a table like this :
2010-12-31 00:00:00.000 98.1
2010-12-30 00:00:00.000 97.61
2010-12-29 00:00:00.000 99.03
2010-12-28 00:00:00.000 97.7
2010-12-27 00:00:00.000 96.87
2010-12-23 00:00:00.000 97.44
2010-12-22 00:00:00.000 97.76
2010-12-21 00:00:00.000 96.63
2010-12-20 00:00:00.000 95.47
2010-12-17 00:00:00.000 95.2
2010-12-16 00:00:00.000 94.84
2010-12-15 00:00:00.000 94.8
2010-12-14 00:00:00.000 94.1
2010-12-13 00:00:00.000 93.88
2010-12-10 00:00:00.000 93.04
2010-12-09 00:00:00.000 91.07
2010-12-08 00:00:00.000 90.89
2010-12-07 00:00:00.000 92.72
2010-12-06 00:00:00.000 93.05
2010-12-03 00:00:00.000 91.74
2010-12-02 00:00:00.000 90.74
2010-12-01 00:00:00.000 90.25
I need to take the value for the quarter and multiply it by the daily value for every day in that quarter.
You could try:
SELECT dt.day, dt.factor*mt.factor AS daily_factor
FROM daily_table dt INNER JOIN month_table mt
ON YEAR(dt.day) = YEAR(mt.day)
AND FLOOR((MONTH(dt.day)-1)/3) = FLOOR((MONTH(mt.day)-1)/3)
ORDER BY dt.day
or (as suggested by @Andriy)
SELECT dt.day, dt.factor*mt.factor AS daily_factor
FROM daily_table dt INNER JOIN month_table mt
ON YEAR(dt.day) = YEAR(mt.day)
AND DATEPART(QUARTER, dt.day) = DATEPART(QUARTER, mt.day)
ORDER BY dt.day
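The quarter join can be tried with Python's sqlite3 module. This is only a sketch under assumptions: SQLite has no YEAR, MONTH or DATEPART, so strftime is used to derive the year and a zero-based quarter number; the table and column names follow the answer; and only three of the daily rows from the question are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE month_table (day TEXT, factor REAL);
    CREATE TABLE daily_table (day TEXT, factor REAL);
    INSERT INTO month_table VALUES
        ('2010-12-31', 28.3), ('2010-09-30', 64.1),
        ('2010-06-30', 66.15), ('2010-03-31', 12.54);
    INSERT INTO daily_table VALUES
        ('2010-12-31', 98.1), ('2010-12-30', 97.61), ('2010-12-29', 99.03);
    """
)

# Zero-based quarter: (month - 1) / 3 with integer division gives 0..3.
quarter_dt = "(CAST(strftime('%m', dt.day) AS INTEGER) - 1) / 3"
quarter_mt = "(CAST(strftime('%m', mt.day) AS INTEGER) - 1) / 3"

query = f"""
    SELECT dt.day, dt.factor * mt.factor AS daily_factor
    FROM daily_table dt
    INNER JOIN month_table mt
        ON strftime('%Y', dt.day) = strftime('%Y', mt.day)
       AND {quarter_dt} = {quarter_mt}
    ORDER BY dt.day
"""
joined = conn.execute(query).fetchall()

# Each December day is paired with the single Q4 row (factor 28.3).
for day, daily_factor in joined:
    print(day, daily_factor)
```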