How do I group by month when I have data in a time range, accurate up to the second? - sql

I'd like to ask if there's a way to group my data by months in this case:
I have a table of orders, with order IDs in one column and the dates the orders were created in another.
For example,
orderId | creationDate
58111 | 2017-01-01 00:00:00
58111 | 2017-01-12 00:00:00
58232 | 2017-01-31 00:00:00
62882 | 2017-02-21 00:00:00
90299 | 2017-03-20 00:00:00
I need to find the number of unique orderIds, grouped by month. Normally this would be simple, but with my creationDates accurate to the second, I have no idea how to segment them into months. Ideally, this is what I'd obtain:
creationMonth | count_orderId
January | 2
February | 1
March | 1

Try this:
select count( distinct orderId ), year( creationDate ), month( creationDate )
from my_table group by year( creationDate ), month( creationDate )
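As a rough, self-contained way to try the idea, the sketch below loads the sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite has no year() or month() functions, so strftime('%Y-%m', ...) is used here as an assumed stand-in that builds the same year-plus-month grouping key; the table name my_table is taken from the answer above.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (orderId INTEGER, creationDate TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        (58111, "2017-01-01 00:00:00"),
        (58111, "2017-01-12 00:00:00"),
        (58232, "2017-01-31 00:00:00"),
        (62882, "2017-02-21 00:00:00"),
        (90299, "2017-03-20 00:00:00"),
    ],
)

# strftime('%Y-%m', ...) drops the day and the time of day, leaving one key
# per calendar month (the SQLite counterpart of year(...) plus month(...)).
monthly_counts = conn.execute(
    """
    SELECT strftime('%Y-%m', creationDate) AS creationMonth,
           COUNT(DISTINCT orderId)         AS count_orderId
    FROM my_table
    GROUP BY creationMonth
    ORDER BY creationMonth
    """
).fetchall()

print(monthly_counts)
```

Order 58111 appears twice in January but is counted once, because the count is over distinct orderId values.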

Related

SQL select query using Joins with aggregate counts

I have a table with the following fields:
tickets: id, createddate, resolutiondate
A sample set of data has:
jira=# select * from tickets;
id | createddate | resolutiondate
---------+-------------+----------------
ticket1 | 2020-09-21 | 2020-10-01
ticket2 | 2020-09-22 | 2020-09-23
ticket3 | 2020-10-01 |
ticket4 | 2020-10-01 | 2020-10-04
ticket5 | 2020-10-01 |
ticket6 | 2020-10-01 | 2020-10-07
(6 rows)
jira=#
I would like to create a query which reports:
Week: Issues Created: Issues Resolved
I can do the two separate queries:
# select date_trunc('week', createddate) week, count(id) created
from tickets
group by week
order by week desc
;
week | created
------------------------+---------
2020-09-28 00:00:00+00 | 4
2020-09-21 00:00:00+00 | 2
(2 rows)
# select date_trunc('week', resolutiondate) week, count(id) resolved
from tickets
where resolutiondate is not NULL
group by week
order by week desc
;
week | resolved
------------------------+----------
2020-10-05 00:00:00+00 | 1
2020-09-28 00:00:00+00 | 2
2020-09-21 00:00:00+00 | 1
(3 rows)
However - I can not figure out how (with a join, union, sub-query, ...?) to combine these queries into a combined query with the appropriate aggregations.
I'm doing this in Postgres - any pointers would be appreciated.
Performing a union before aggregating the values may work here, e.g.:
select week,
count(id_created) as created,
count(id_resolved) as resolved
from (
select date_trunc('week', resolutiondate) week, NULL as id_created, id as id_resolved from tickets where resolutiondate is not NULL UNION ALL
select date_trunc('week', createddate) week, id as id_created, NULL as id_resolved from tickets
) t
group by week
order by week desc
Let me know if this works for you.
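A minimal sketch of the union-then-aggregate pattern, run against SQLite through Python's sqlite3 module so it can be tried without a Postgres server. SQLite has no date_trunc(), so date(d, 'weekday 0', '-6 days') is used here as an assumed equivalent that returns the Monday of the week containing d. Unresolved tickets are filtered out of the "resolved" branch so that they do not form a group with a NULL week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (id TEXT, createddate TEXT, resolutiondate TEXT)"
)
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [
        ("ticket1", "2020-09-21", "2020-10-01"),
        ("ticket2", "2020-09-22", "2020-09-23"),
        ("ticket3", "2020-10-01", None),
        ("ticket4", "2020-10-01", "2020-10-04"),
        ("ticket5", "2020-10-01", None),
        ("ticket6", "2020-10-01", "2020-10-07"),
    ],
)

# Each branch fills only one of the two id columns; COUNT() ignores NULLs,
# so a single GROUP BY yields both tallies per week.
weekly = conn.execute(
    """
    SELECT week,
           COUNT(id_created)  AS created,
           COUNT(id_resolved) AS resolved
    FROM (
        SELECT date(resolutiondate, 'weekday 0', '-6 days') AS week,
               NULL AS id_created, id AS id_resolved
        FROM tickets
        WHERE resolutiondate IS NOT NULL
        UNION ALL
        SELECT date(createddate, 'weekday 0', '-6 days') AS week,
               id AS id_created, NULL AS id_resolved
        FROM tickets
    ) AS t
    GROUP BY week
    ORDER BY week DESC
    """
).fetchall()

for row in weekly:
    print(row)
```

The week of 2020-10-05 has resolutions but no new tickets, which is the case a plain inner join of the two separate queries would lose.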

How can I aggregate values based on an arbitrary monthly cycle date range in SQL?

Given a table as such:
# SELECT * FROM payments ORDER BY payment_date DESC;
id | payment_type_id | payment_date | amount
----+-----------------+--------------+---------
4 | 1 | 2019-11-18 | 300.00
3 | 1 | 2019-11-17 | 1000.00
2 | 1 | 2019-11-16 | 250.00
1 | 1 | 2019-11-15 | 300.00
14 | 1 | 2019-10-18 | 130.00
13 | 1 | 2019-10-18 | 100.00
15 | 1 | 2019-09-18 | 1300.00
16 | 1 | 2019-09-17 | 1300.00
17 | 1 | 2019-09-01 | 400.00
18 | 1 | 2019-08-25 | 400.00
(10 rows)
How can I SUM the amount column based on an arbitrary date range, not simply a date truncation?
Taking the example of a date range beginning on the 15th of a month, and ending on the 14th of the following month, the output I would expect to see is:
payment_type_id | payment_date | amount
-----------------+--------------+---------
1 | 2019-11-15 | 1850.00
1 | 2019-10-15 | 230.00
1 | 2019-09-15 | 2600.00
1 | 2019-08-15 | 800.00
Can this be done in SQL, or is this something that's better handled in code? I would traditionally do this in code, but I'm looking to extend my knowledge of SQL (which at this stage isn't much!)
Demo: db<>fiddle
You can use a combination of the CASE clause and the date_trunc() function:
SELECT
payment_type_id,
CASE
WHEN date_part('day', payment_date) < 15 THEN
date_trunc('month', payment_date) + interval '-1month 14 days'
ELSE date_trunc('month', payment_date) + interval '14 days'
END AS payment_date,
SUM(amount) AS amount
FROM
payments
GROUP BY 1,2
date_part('day', ...) returns the day of the month.
The CASE expression separates dates before the 15th of the month from dates on or after it.
date_trunc('month', ...) converts every date in a month to the first of that month.
So, if a date is before the 15th of its month, it is grouped under the 15th of the previous month (this is what + interval '-1month 14 days' calculates: +14 days, because date_trunc() truncates to the 1st of the month, and 1 + 14 = 15). Otherwise it is grouped under the 15th of the current month.
After calculating these payment_date values, you can use them for simple grouping.
I would simply subtract 14 days, truncate the month, and add 14 days back:
select payment_type_id,
date_trunc('month', payment_date - interval '14 day') + interval '14 day' as month_15,
sum(amount)
from payments
group by payment_type_id, month_15
order by payment_type_id, month_15;
No conditional logic is actually needed for this.
Here is a db<>fiddle.
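The two answers above should always pick the same cycle start. The sketch below restates both rules with Python's datetime module (the function names cycle_start_case and cycle_start_shift are made up for this illustration), checks that they agree for every day of 2019 and 2020, and sums the question's sample payments per cycle.

```python
from datetime import date, timedelta


def cycle_start_case(d):
    """CASE-style rule: before the 15th, use the 15th of the previous month."""
    first = d.replace(day=1)
    if d.day < 15:
        prev_first = (first - timedelta(days=1)).replace(day=1)
        return prev_first + timedelta(days=14)
    return first + timedelta(days=14)


def cycle_start_shift(d):
    """Shift rule: subtract 14 days, truncate to the month, add 14 days back."""
    return (d - timedelta(days=14)).replace(day=1) + timedelta(days=14)


# Sample rows from the question: (payment_date, amount).
payments = [
    (date(2019, 11, 18), 300.0), (date(2019, 11, 17), 1000.0),
    (date(2019, 11, 16), 250.0), (date(2019, 11, 15), 300.0),
    (date(2019, 10, 18), 130.0), (date(2019, 10, 18), 100.0),
    (date(2019, 9, 18), 1300.0), (date(2019, 9, 17), 1300.0),
    (date(2019, 9, 1), 400.0), (date(2019, 8, 25), 400.0),
]

totals = {}
for paid_on, amount in payments:
    key = cycle_start_shift(paid_on)
    totals[key] = totals.get(key, 0.0) + amount

# Compare the two rules on every day of 2019 and 2020 (731 days).
all_days = [date(2019, 1, 1) + timedelta(days=i) for i in range(731)]
rules_agree = all(cycle_start_case(d) == cycle_start_shift(d) for d in all_days)

for start in sorted(totals, reverse=True):
    print(start, totals[start])
print("rules agree:", rules_agree)
```

The shift works because subtracting 14 days moves the 15th onto the 1st, so ordinary month truncation then lines up with the cycle boundary.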
You can use the generate_series() function to build one row per cycle start, and then inner join each payment to the cycle that contains it, like this:
SELECT specific_date_on_month, SUM(amount)
FROM (SELECT generate_series('2019-08-15'::date, '2019-11-15'::date, '1 month'::interval) AS specific_date_on_month) AS cycles
INNER JOIN payments
ON (payment_date >= specific_date_on_month AND payment_date < specific_date_on_month + '1 month'::interval)
GROUP BY specific_date_on_month;
The generate_series(<begin>, <end>, <interval>) function generates a series from begin to end in steps of the given interval.
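For experimenting without Postgres, here is a sketch of the same "generate the cycle starts, then join" idea in SQLite via Python's sqlite3 module. SQLite has no generate_series() by default, so a recursive CTE stands in for it; the names cycles and cycle_start are assumptions made for this example. The join uses a half-open range rather than a month/year comparison, so each payment lands in the cycle that contains it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, payment_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [
        (4, "2019-11-18", 300.0), (3, "2019-11-17", 1000.0),
        (2, "2019-11-16", 250.0), (1, "2019-11-15", 300.0),
        (14, "2019-10-18", 130.0), (13, "2019-10-18", 100.0),
        (15, "2019-09-18", 1300.0), (16, "2019-09-17", 1300.0),
        (17, "2019-09-01", 400.0), (18, "2019-08-25", 400.0),
    ],
)

cycle_totals = conn.execute(
    """
    WITH RECURSIVE cycles(cycle_start) AS (
        -- Stand-in for generate_series: one row per cycle start date.
        SELECT '2019-08-15'
        UNION ALL
        SELECT date(cycle_start, '+1 month')
        FROM cycles
        WHERE cycle_start < '2019-11-15'
    )
    SELECT cycle_start, SUM(amount) AS amount
    FROM cycles
    INNER JOIN payments
        ON payment_date >= cycle_start
       AND payment_date <  date(cycle_start, '+1 month')
    GROUP BY cycle_start
    ORDER BY cycle_start DESC
    """
).fetchall()

for row in cycle_totals:
    print(row)
```

Because the dates are stored as ISO 8601 text, plain string comparison orders them correctly.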

How to write a SQL statement to sum data using group by the same day of every two neighboring months

I have a data table like this:
datetime data
-----------------------
...
2017/8/24 6.0
2017/8/25 5.0
...
2017/9/24 6.0
2017/9/25 6.2
...
2017/10/24 8.1
2017/10/25 8.2
I want to write a SQL statement that sums the data, grouped by the period between the 24th of each pair of neighboring months, within a certain time range, such as from 2017/7/20 to 2017/10/25 as above.
How to write this SQL statement? I'm using SQL Server 2008 R2.
The expected results table is like this:
datetime_range data_sum
------------------------------------
...
2017/8/24~2017/9/24 100.9
2017/9/24~2017/10/24 120.2
...
One conceptual way to proceed here is to redefine a "month" as ending on the 24th of each calendar month. We assign any date occurring after the 24th to the following month, using DATEADD so that dates after December 24th roll over into January of the next year. Then we can aggregate by this shifted month to obtain the sum of data.
WITH cte AS (
    SELECT
        data,
        -- first day of the calendar month in which the accounting period ends
        DATEADD(month,
                DATEDIFF(month, 0, datetime)
                    + CASE WHEN DAY(datetime) > 24 THEN 1 ELSE 0 END,
                0) AS period_month
    FROM yourTable
)
SELECT
    -- period runs from the 25th of the previous month to the 24th of period_month
    CONVERT(varchar(10), DATEADD(day, 24, DATEADD(month, -1, period_month)), 111) +
    '~' +
    CONVERT(varchar(10), DATEADD(day, 23, period_month), 111) AS datetime_range,
    SUM(data) AS data_sum
FROM cte
GROUP BY
    period_month;
Note that your suggested ranges seem to include the 24th on both ends, which does not make sense from an accounting point of view. I assume that the month includes and ends on the 24th (i.e. the 25th is the first day of the next accounting period).
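To check the boundary handling of a 25th-to-24th accounting period, including the December-to-January rollover, here is a small sketch using Python's datetime module. The function name accounting_period is hypothetical, and the period definition (starts on the 25th, ends on the 24th) is the assumption stated in the note above.

```python
from datetime import date, timedelta


def accounting_period(d):
    """Return (start, end) of the 25th-to-24th period that contains d."""
    # Shifting back 24 days maps the 25th onto the 1st, so the shifted
    # date's calendar month identifies the period.
    shifted_first = (d - timedelta(days=24)).replace(day=1)
    start = shifted_first + timedelta(days=24)  # the 25th of that month
    # Step safely into the following month, then pin the day to the 24th.
    end = (shifted_first + timedelta(days=31)).replace(day=24)
    return start, end


for sample in (date(2017, 8, 24), date(2017, 8, 25), date(2017, 12, 25)):
    start, end = accounting_period(sample)
    print(sample, "->", start, "~", end)
```

Working with whole dates rather than separate year and month numbers avoids a "month 13" when the period crosses the year boundary.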
Demo
I would suggest dynamically building some date range rows so that you can then join your data to those for aggregation, like this example:
+----+---------------------+---------------------+----------------+
| | period_start_dt | period_end_dt | your_data_here |
+----+---------------------+---------------------+----------------+
| 1 | 24.04.2017 00:00:00 | 24.05.2017 00:00:00 | 1 |
| 2 | 24.05.2017 00:00:00 | 24.06.2017 00:00:00 | 1 |
| 3 | 24.06.2017 00:00:00 | 24.07.2017 00:00:00 | 1 |
| 4 | 24.07.2017 00:00:00 | 24.08.2017 00:00:00 | 1 |
| 5 | 24.08.2017 00:00:00 | 24.09.2017 00:00:00 | 1 |
| 6 | 24.09.2017 00:00:00 | 24.10.2017 00:00:00 | 1 |
| 7 | 24.10.2017 00:00:00 | 24.11.2017 00:00:00 | 1 |
| 8 | 24.11.2017 00:00:00 | 24.12.2017 00:00:00 | 1 |
| 9 | 24.12.2017 00:00:00 | 24.01.2018 00:00:00 | 1 |
| 10 | 24.01.2018 00:00:00 | 24.02.2018 00:00:00 | 1 |
| 11 | 24.02.2018 00:00:00 | 24.03.2018 00:00:00 | 1 |
| 12 | 24.03.2018 00:00:00 | 24.04.2018 00:00:00 | 1 |
+----+---------------------+---------------------+----------------+
DEMO
declare @start_dt date;
set @start_dt = '20170424';
select
period_start_dt, period_end_dt, sum(1) as your_data_here
from (
select
dateadd(month,m.n,start_dt) period_start_dt
, dateadd(month,m.n+1,start_dt) period_end_dt
from (
select @start_dt start_dt ) seed
cross join (
select 0 n union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11
) m
) r
-- LEFT JOIN YOUR DATA
-- ON yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt
group by
period_start_dt, period_end_dt
Please don't be tempted to use "between" when joining to your data. Follow the note above and use yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt; otherwise you could double count information, as between is inclusive of both its lower and upper boundaries.
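The double-counting risk can be seen with two adjacent periods from the table above and a row dated exactly on their shared boundary. This is a plain Python illustration of the two comparison styles, not SQL Server code; the variable names are made up for the example.

```python
from datetime import date

# Two adjacent periods that share the boundary date 2017-05-24.
periods = [
    (date(2017, 4, 24), date(2017, 5, 24)),
    (date(2017, 5, 24), date(2017, 6, 24)),
]
boundary_day = date(2017, 5, 24)

# BETWEEN-style comparison: both ends inclusive.
inclusive_matches = [p for p in periods if p[0] <= boundary_day <= p[1]]

# Half-open comparison: start inclusive, end exclusive.
half_open_matches = [p for p in periods if p[0] <= boundary_day < p[1]]

print(len(inclusive_matches), "period(s) match with BETWEEN-style bounds")
print(len(half_open_matches), "period(s) match with half-open bounds")
```

With inclusive bounds the boundary row joins to both periods and is summed twice; with half-open bounds it belongs only to the period that starts on that day.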
I think the simplest way is to subtract 24 days (assuming each period starts on the 25th, so that the 25th lands on the 1st) and aggregate by the month:
select year(dateadd(day, -24, datetime)) as yr,
month(dateadd(day, -24, datetime)) as mon,
sum(data)
from t
group by year(dateadd(day, -24, datetime)), month(dateadd(day, -24, datetime));
You can format yr and mon to get the dates for the specific ranges, but this does the aggregation (and the yr/mon columns might be sufficient).
Step 0: Build a calendar table. Every database needs a calendar table eventually to simplify this sort of calculation.
In this table you may have columns such as:
Date (primary key)
Day
Month
Year
Quarter
Half-year (e.g. 1 or 2)
Day of year (1 to 366)
Day of week (numeric or text)
Is weekend (seems redundant now, but is a huge time saver later on)
Fiscal quarter/year (if your company's fiscal year doesn't start on Jan. 1)
Is Holiday
etc.
If your company starts its month on the 24th, then you can add a "Fiscal Month" column that represents that.
Step 1: Join on the calendar table
Step 2: Group by the columns in the calendar table.
Calendar tables sound weird at first, but once you realize that they are in fact tiny even if they span a couple hundred years they quickly become a major asset.
Don't try to cheap out on disk space by using computed columns. You want real columns because they are much faster and can be indexed if necessary. (Though honestly, usually just the PK index is enough for even wide calendar tables.)
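A minimal sketch of the calendar-table approach, using an in-memory SQLite database from Python. The table and column names (calendar, cal_date, fiscal_month, readings) are invented for this illustration, the fiscal month is assumed to start on the 24th as in the example above, and only the rows visible in the question's sample are loaded.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE calendar (
        cal_date     TEXT PRIMARY KEY,
        year         INTEGER,
        month        INTEGER,
        is_weekend   INTEGER,
        fiscal_month TEXT  -- start date (the 24th) of the fiscal month
    )
    """
)

# Step 0: populate one real (not computed) row per day of 2017.
day = date(2017, 1, 1)
while day <= date(2017, 12, 31):
    # Shift back 23 days so the 24th maps to the 1st, truncate, shift forward.
    fiscal_start = (day - timedelta(days=23)).replace(day=1) + timedelta(days=23)
    conn.execute(
        "INSERT INTO calendar VALUES (?, ?, ?, ?, ?)",
        (day.isoformat(), day.year, day.month,
         int(day.weekday() >= 5), fiscal_start.isoformat()),
    )
    day += timedelta(days=1)

conn.execute("CREATE TABLE readings (reading_date TEXT, data REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2017-08-24", 6.0), ("2017-08-25", 5.0),
        ("2017-09-24", 6.0), ("2017-09-25", 6.2),
        ("2017-10-24", 8.1), ("2017-10-25", 8.2),
    ],
)

# Step 1: join on the calendar table. Step 2: group by its columns.
fiscal_sums = conn.execute(
    """
    SELECT c.fiscal_month, SUM(r.data) AS data_sum
    FROM readings AS r
    JOIN calendar AS c ON c.cal_date = r.reading_date
    GROUP BY c.fiscal_month
    ORDER BY c.fiscal_month
    """
).fetchall()

for row in fiscal_sums:
    print(row)
```

Once the fiscal month is a stored column, the reporting query contains no date arithmetic at all, which is the main appeal of the approach.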

How to identify MIN value for records within a rolling date range in SQL

I am trying to calculate a MIN date by Patient_ID for each record in my dataset that dynamically references the last 30 days from the date (Discharge_Dt) on that row. My initial thought was to use a window function, but I opted for a subquery, which is close, but not quite what I need.
Please note, my sample query is also missing logic that limits the MIN Discharge_Dt to the last 30 days, in other words, I do not want a MIN Discharge_Dt that is older than 30 days for any given row.
Sample Query:
SELECT Patient_ID,
Discharge_Dt,
/* Calculating the MIN Discharge_Dt by Patient_ID for the last 30
days based upon the Discharge_Dt for that row */
(SELECT MIN(Discharge_Dt)
FROM admissions_ds AS b
WHERE a.Patient_ID = b.Patient_ID AND
a.Discharge_Dt >= DATEADD('D', -30, GETDATE())) AS MIN_Dt
FROM admissions_ds AS a
Desired Output Table:
Patient_ID | Discharge_Dt | MIN_Dt
10 | 2017-08-15 | 2017-08-15
10 | 2017-08-31 | 2017-08-15
10 | 2017-09-21 | 2017-08-31
15 | 2017-07-01 | 2017-07-01
15 | 2017-07-18 | 2017-07-01
20 | 2017-05-05 | 2017-05-05
25 | 2017-09-24 | 2017-09-24
Here you go; just a simple self-join is required:
drop TABLE if EXISTS admissions_ds;
create table admissions_ds (Patient_ID int,Discharge_Dt date);
insert into admissions_ds
values
(10,'2017-08-15'),
(10,'2017-08-31'),
(10,'2017-09-21'),
(15,'2017-07-01'),
(15,'2017-07-18'),
(20,'2017-05-05'),
(25,'2017-09-24');
select t1.Patient_ID,t1.Discharge_Dt,min(t2.Discharge_Dt) as min_dt
from admissions_ds as t1
join admissions_ds as t2 on t1.Patient_ID=t2.Patient_ID and t2.Discharge_Dt > t1.Discharge_Dt - interval '30 days'
group by 1,2
order by 1,2
;
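The self-join can be tried in SQLite through Python's sqlite3 module, as sketched below. SQLite has no interval type, so date(t1.Discharge_Dt, '-30 days') is used as an assumed equivalent of the interval subtraction. An explicit upper bound is also added to make the window visible; it does not change the MIN, since later discharges are never earlier than the row's own date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admissions_ds (Patient_ID INTEGER, Discharge_Dt TEXT)")
conn.executemany(
    "INSERT INTO admissions_ds VALUES (?, ?)",
    [
        (10, "2017-08-15"), (10, "2017-08-31"), (10, "2017-09-21"),
        (15, "2017-07-01"), (15, "2017-07-18"),
        (20, "2017-05-05"),
        (25, "2017-09-24"),
    ],
)

# For each row t1, t2 ranges over the same patient's discharges that fall
# within the 30 days up to and including t1's discharge date.
rolling_min = conn.execute(
    """
    SELECT t1.Patient_ID, t1.Discharge_Dt, MIN(t2.Discharge_Dt) AS min_dt
    FROM admissions_ds AS t1
    JOIN admissions_ds AS t2
      ON t1.Patient_ID = t2.Patient_ID
     AND t2.Discharge_Dt >  date(t1.Discharge_Dt, '-30 days')
     AND t2.Discharge_Dt <= t1.Discharge_Dt
    GROUP BY t1.Patient_ID, t1.Discharge_Dt
    ORDER BY t1.Patient_ID, t1.Discharge_Dt
    """
).fetchall()

for row in rolling_min:
    print(row)
```

The row for patient 10 on 2017-09-21 reports 2017-08-31 rather than 2017-08-15, because 2017-08-15 is more than 30 days earlier.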

SQL Finding the first day of the month of a data set

I've found lots of code for getting SQL to display the first of a month, but I need to display the first day of each month that actually appears in my data set, not just [month] 1st [year].
EX 1: January 1st is a holiday, so it'll never be the first day of the month in the data set, the first day of January is January 2nd.
Another example is if the first date of the month is the 7th in my data set, I want to see the 7th not the 1st.
This is my data set
DATE
----------
2016-02-01
2016-02-05
2016-02-08
2016-02-19
2016-02-20
2016-02-22
2016-05-02
2016-05-05
2016-05-07
2016-05-09
2016-05-11
2016-05-23
2016-06-01
2016-06-10
2016-06-20
2016-06-29
2016-07-01
2016-07-07
2016-07-14
2016-07-21
2016-07-28
2016-07-31
2016-08-04
2016-08-10
2016-08-18
2017-02-23
2017-02-30
I need this to display
DATE
----------
2016-02-01
2016-05-02
2016-06-01
2016-07-01
2016-08-04
2017-02-23
I keep getting stuck, I thought this may work but I'm not getting the min date for that month
select min(load_date) from multi_dt
group by month(load_date)
Try this:
select min(load_date) as min_load_date
from multi_dt
group by dateadd(month, datediff(month, 0, load_date ) , 0)
month() returns only the month number, so rows from the same month in different years fall into one group. The expression in the query above instead returns the first day of the month as a datetime, so grouping by it takes both the year and the month into account.
rextester demo: http://rextester.com/UJRN68337
returns:
+---------------+
| min_load_date |
+---------------+
| 2016-02-01 |
| 2016-05-02 |
| 2016-06-01 |
| 2016-07-01 |
| 2016-08-04 |
| 2017-02-23 |
+---------------+
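To see why dateadd(month, datediff(month, 0, load_date), 0) yields the first of the month, here is the same arithmetic spelled out in Python. It relies on SQL Server treating the integer 0 as 1900-01-01 when it is used as a datetime; the function names are made up for this sketch, and only a subset of the question's dates is used.

```python
from datetime import date

EPOCH = date(1900, 1, 1)  # what the integer 0 means as a SQL Server datetime


def months_since_epoch(d):
    """Analogue of DATEDIFF(month, 0, d): month boundaries crossed since 1900-01."""
    return (d.year - EPOCH.year) * 12 + (d.month - EPOCH.month)


def first_of_month(d):
    """Analogue of DATEADD(month, DATEDIFF(month, 0, d), 0)."""
    years, month_index = divmod(months_since_epoch(d), 12)
    return date(EPOCH.year + years, month_index + 1, 1)


# A subset of the dates from the question.
load_dates = [
    date(2016, 2, 1), date(2016, 2, 5), date(2016, 5, 2), date(2016, 5, 23),
    date(2016, 8, 4), date(2016, 8, 18), date(2017, 2, 23),
]

# Group by the truncated date and keep the earliest date in each group.
first_in_month = {}
for d in load_dates:
    key = first_of_month(d)
    if key not in first_in_month or d < first_in_month[key]:
        first_in_month[key] = d

# Grouping by the month number alone merges February 2016 with February 2017.
month_only_groups = {d.month for d in load_dates}

print(sorted(first_in_month.values()))
print(len(first_in_month), "groups by first-of-month vs",
      len(month_only_groups), "groups by month number")
```

The whole-month count discards the day, and adding that many months back to the epoch rebuilds a date that still carries the year.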
Your initial query was nearly right; you just also needed to group by the year:
group by
month(load_date),year(load_date)
I would use row_number():
select t.date
from (select t.*,
row_number() over (partition by year(date), month(date) order by date) as seqnum
from t
) t
where seqnum = 1;
If you don't need any additional columns, an aggregation would be equivalent:
select min(t.date)
from t
group by year(t.date), month(t.date);