Querying all past and future round birthdays - sql

I have the birthdates of users in a table and want to display a list of round birthdays for the next n years (starting from an arbitrary date x), which looks like this:
+----------------------------------------------------------------------------------------+
| Name | id | birthdate | current_age | birthday | year | month | day | age_at_date |
+----------------------------------------------------------------------------------------+
| User 1 | 1 | 1958-01-23 | 59 | 2013-01-23 | 2013 | 1 | 23 | 55 |
| User 2 | 2 | 1988-01-29 | 29 | 2013-01-29 | 2013 | 1 | 29 | 25 |
| User 3 | 3 | 1963-02-12 | 54 | 2013-02-12 | 2013 | 2 | 12 | 50 |
| User 1 | 1 | 1958-01-23 | 59 | 2018-01-23 | 2018 | 1 | 23 | 60 |
| User 2 | 2 | 1988-01-29 | 29 | 2018-01-29 | 2018 | 1 | 29 | 30 |
| User 3 | 3 | 1963-02-12 | 54 | 2018-02-12 | 2018 | 2 | 12 | 55 |
| User 1 | 1 | 1958-01-23 | 59 | 2023-01-23 | 2023 | 1 | 23 | 65 |
| User 2 | 2 | 1988-01-29 | 29 | 2023-01-29 | 2023 | 1 | 29 | 35 |
| User 3 | 3 | 1963-02-12 | 54 | 2023-02-12 | 2023 | 2 | 12 | 60 |
+----------------------------------------------------------------------------------------+
As you can see, I want the list to "wrap around": it should not only show the next upcoming round birthday, which is easy, but also historical and far-future ones.
The core idea of my current approach is the following: I generate all dates from 1900 to 2100 via generate_series and join them to the users by matching the day and month of the birthdate. Based on that, I calculate the age at each date and finally select only the birthdays that are round (divisible by 5) and yield a non-negative age.
WITH
test_users(id, name, birthdate) AS (
VALUES
(1, 'User 1', '1958-01-23' :: DATE),
(2, 'User 2', '1988-01-29'),
(3, 'User 3', '1963-02-12')
),
dates AS (
SELECT
s AS date,
date_part('year', s) AS year,
date_part('month', s) AS month,
date_part('day', s) AS day
FROM generate_series('01-01-1900' :: TIMESTAMP, '01-01-2100' :: TIMESTAMP, '1 days' :: INTERVAL) AS s
),
birthday_data AS (
SELECT
id AS member_id,
test_users.birthdate AS birthdate,
(date_part('year', age((test_users.birthdate)))) :: INT AS current_age,
date :: DATE AS birthday,
date_part('year', date) AS year,
date_part('month', date) AS month,
date_part('day', date) AS day,
ROUND(extract(EPOCH FROM (dates.date - birthdate)) / (60 * 60 * 24 * 365)) :: INT AS age_at_date
FROM test_users, dates
WHERE
dates.day = date_part('day', birthdate) AND
dates.month = date_part('month', birthdate) AND
dates.year >= date_part('year', birthdate)
)
SELECT
test_users.name,
bd.*
FROM test_users
LEFT JOIN birthday_data bd ON bd.member_id = test_users.id
WHERE
bd.age_at_date % 5 = 0 AND
bd.birthday BETWEEN NOW() - INTERVAL '5' YEAR AND NOW() + INTERVAL '10' YEAR
ORDER BY bd.birthday;
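To make the cost of this brute-force idea visible, here is a minimal Python sketch of the same logic: it visits every day in a window and keeps the days that match a user's birth day and month at an age divisible by 5. The USERS list mirrors the test_users CTE; the window 2013-01-01 to 2023-12-31 is an assumption chosen to reproduce the expected table above. Like the SQL, it never matches a February 29 birthdate in non-leap years.

```python
from datetime import date, timedelta

# Sample data mirroring the question's test_users CTE (id, name, birthdate).
USERS = [
    (1, "User 1", date(1958, 1, 23)),
    (2, "User 2", date(1988, 1, 29)),
    (3, "User 3", date(1963, 2, 12)),
]


def round_birthdays_by_scan(users, start, end, step=5):
    """Brute force: visit every day in [start, end] and keep round birthdays."""
    rows = []
    day = start
    while day <= end:
        for uid, name, born in users:
            # Same filter as the SQL: matching day and month, non-negative age.
            if (day.month, day.day) == (born.month, born.day) and day >= born:
                age = day.year - born.year  # exact, because `day` is a birthday
                if age % step == 0:
                    rows.append((name, uid, day, age))
        day += timedelta(days=1)
    return rows


for row in round_birthdays_by_scan(USERS, date(2013, 1, 1), date(2023, 12, 31)):
    print(row)
```

Every day in the window is visited once per user, which is exactly the work the answer below avoids by generating years instead of days.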
My current approach seems very inefficient and rather complicated: it takes >100 ms. Does anybody have an idea for a more compact and performant query? I am using PostgreSQL 9.5.3. Thank you!

Maybe try a LATERAL join with a generate_series over years instead of days:
create table bday(id serial, name text, dob date);
insert into bday (name, dob) values ('a', '1972-08-21'::date);
insert into bday (name, dob) values ('b', '1974-03-20'::date);
select * from bday ,
lateral( select generate_series( (1950-y)/5 , (2010-y)/5)*5 + y as year
from (select date_part('year',dob)::integer as y) as t2
) as t1;
For each row, this generates years between 1950 and 2010.
You can add a WHERE clause to exclude people born after 2010 (they can't have a birthday in range), or people born before 1850 (they are unlikely...).
--
Edit (after your edit):
So your generate_series creates 365 rows per year: over 30,000 per century, and about 73,000 for your 1900 to 2100 range. And they all get joined to each user (3 users => over 200,000 combinations).
My query generates rows only for the years needed: in 100 years, that is 20 rows per user.
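To put these row counts into numbers, a small Python calculation, assuming the question's bounds of 1900-01-01 to 2100-01-01 and its three test users. The "combinations" figure is the size of the cross product before the WHERE clause; the planner may not literally compare every pair, so treat it as a rough upper bound rather than a measurement.

```python
from datetime import date


def daily_series_length(start, end):
    """Rows from a daily generate_series(start, end, '1 day'); both ends inclusive."""
    return (end - start).days + 1


def round_years_per_user(years, step=5):
    """Rows per user when only every step-th year is generated."""
    return len(range(0, years, step))


days = daily_series_length(date(1900, 1, 1), date(2100, 1, 1))
print(days)                        # 73050 generated dates
print(days * 3)                    # 219150 (date, user) combinations for 3 users
print(round_years_per_user(100))   # 20 rows per user per century
```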
Dividing by 5 (and multiplying back) ensures that every generated year is a round birthday.
(1950-y)/5 calculates how many round birthdays there were before 1950.
A person born in 1941 needs to skip 1941 and 1946, but has a round birthday in 1951. So the start index is the difference (9 years) integer-divided by 5, plus 1 to account for the 0th birthday: 9/5 + 1 = 2, and 2*5 + 1941 = 1951.
If the person is born after 1950, the number is negative, and greatest(-1,...)+1 gives 0, starting at the actual birth year.
But actually it should be
select * from bday ,
lateral( select generate_series( greatest(-1,(1950-y)/5)+1, (2010-y)/5)*5 + y as year
from (select date_part('year',dob)::integer as y) as t2
) as t1;
(use greatest(0,...)+1 instead if you want to start at age 5)
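The same start-index arithmetic can be sketched in Python to check edge cases. This sketch swaps in ceiling division for the lower bound: with the integer-division form above, someone born in 1945 gets (1950-1945)/5 + 1 = 2, i.e. a first year of 1955, although 1950 (age 5) is inside the range; ceiling division keeps it. The function name and the min_k parameter are illustrative, not part of the answer.

```python
def round_birthday_years(birth_year, first=1950, last=2010, step=5, min_k=1):
    """Years in [first, last] at which the age is k*step for some k >= min_k.

    min_k=1 starts at age 5; min_k=0 also allows the birth year itself (age 0).
    """
    # Smallest k with birth_year + k*step >= first (ceiling division).
    k_min = max(min_k, -(-(first - birth_year) // step))
    # Largest k with birth_year + k*step <= last (floor division).
    k_max = (last - birth_year) // step
    return [birth_year + k * step for k in range(k_min, k_max + 1)]


print(round_birthday_years(1941)[:3])           # [1951, 1956, 1961]
print(round_birthday_years(1945)[:2])           # [1950, 1955]
print(round_birthday_years(1960, min_k=0)[:2])  # [1960, 1965]
```

Someone born after `last` simply gets an empty list, which plays the role of the WHERE clause suggested above.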

Related

How can I insert data in table form into another table provided some specific conditions are satisfied

Logic: If today is Monday (reference 'time' table), data present in S should be inserted into M (along with a sent_day column which will have today's date).
If today is not Monday, the dates of the current week (same week_id) should be checked in M. If any of these dates are present in M, S should not be inserted into M; if none of them are present, S should be inserted into M.
time
+------------+------------+----------------+
| cal_dt | cal_day | week_id |
+------------+------------+----------------+
| 2020-03-23 | Monday | 123 |
| 2020-03-24 | Tuesday | 123 |
| 2020-03-25 | Wednesday | 123 |
| 2020-03-26 | Thursday | 123 |
| 2020-03-27 | Friday | 123 |
| 2020-03-30 | Monday | 124 |
| 2020-03-31 | Tuesday | 124 |
+------------+------------+----------------+
M
+------------+----------+-------+
| sent_day | item | price |
+------------+----------+-------+
| 2020-03-11 | pen | 10 |
| 2020-03-11 | book | 50 |
| 2020-03-13 | Eraser | 5 |
| 2020-03-13 | sharpner | 5 |
+------------+----------+-------+
S
+----------+-------+
| item | price |
+----------+-------+
| pen | 25 |
| book | 20 |
| Eraser | 10 |
| sharpner | 3 |
+----------+-------+
Insert INTO M
SELECT
CASE WHEN(SELECT cal_day FROM time WHERE cal_dt = current_date) = 'Monday' THEN s.*
ELSE
(CASE WHEN(SELECT cal_dt FROM time WHERE wk_id =(SELECT wk_id FROM time WHERE cal_dt = current_date ) NOT IN (SELECT DISTINCT sent_day FROM M) THEN 1 ELSE 0 END)
THEN s.* ELSE END
FROM s
I would do this in two separate INSERT statements:
The first condition ("if today is monday") is quite easy:
insert into m (sent_day, item, price)
select current_date, item, price
from s
where exists (select *
from "time"
where cal_dt = current_date
and cal_day = 'Monday');
I find storing both the date and the week day a bit confusing, as the week day can easily be derived from the date. For the test "if today is Monday" it's actually not necessary to consult the "time" table at all (extract(dow ...) returns 1 for Monday):
insert into m (sent_day, item, price)
select current_date, item, price
from s
where extract(dow from current_date) = 1;
The second part is a bit more complicated, but if I understand it correctly, it should be something like this:
insert into m (sent_day, item, price)
select current_date, item, price
from s
where extract(dow from current_date) <> 1
and not exists (select *
from m
where m.sent_day in (select wk.cal_dt
from "time" t
join "time" wk on wk.week_id = t.week_id
where t.cal_dt = current_date));
If you just want a single INSERT statement, you could simply do a UNION ALL between the two selects:
insert into m (sent_day, item, price)
select current_date, item, price
from s
where extract(dow from current_date) = 1
union all
select current_date, item, price
from s
where extract(dow from current_date) <> 1
and not exists (select *
from m
where m.sent_day in (select wk.cal_dt
from "time" t
join "time" wk on wk.week_id = t.week_id
where t.cal_dt = current_date));
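A Python sketch of the decision rule from the question can help test the SQL against a few dates. Here week_of stands in for the "time" table (date to week_id) and sent_days for the sent_day values already in M; both names are hypothetical. When today is missing from week_of, the week set is empty and the function returns True, which is how a NOT EXISTS over an empty set of week dates behaves.

```python
from datetime import date


def should_insert(today, week_of, sent_days):
    """Return True if S should be copied into M on `today`."""
    # Monday always inserts (isoweekday: Monday=1, like extract(dow ...) = 1).
    if today.isoweekday() == 1:
        return True
    # Otherwise collect every date that shares today's week_id ...
    week = week_of.get(today)
    this_week = {d for d, w in week_of.items() if w == week}
    # ... and insert only if none of them already appears in M.
    return not (this_week & sent_days)


# Sample data from the question's "time" table.
WEEK_OF = {
    date(2020, 3, 23): 123, date(2020, 3, 24): 123, date(2020, 3, 25): 123,
    date(2020, 3, 26): 123, date(2020, 3, 27): 123,
    date(2020, 3, 30): 124, date(2020, 3, 31): 124,
}

print(should_insert(date(2020, 3, 24), WEEK_OF, {date(2020, 3, 11)}))  # True
print(should_insert(date(2020, 3, 25), WEEK_OF, {date(2020, 3, 23)}))  # False
```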

Pad row with default if values not found PostgreSQL

I want to return the last 7 days of user_activity, but for days with no data I want 0 as the value.
Say I have this table:
actions | id | date
------------------------
67 | 123 | 2019-07-7
90 | 123 | 2019-07-9
100 | 123 | 2019-07-10
50 | 123 | 2019-07-13
30 | 123 | 2019-07-15
and this is the expected output for the last 7 days:
actions | id | date
------------------------
90 | 123 | 2019-07-9
100 | 123 | 2019-07-10
0 | 123 | 2019-07-11 <--- padded
0 | 123 | 2019-07-12 <--- padded
50 | 123 | 2019-07-13
0 | 123 | 2019-07-14 <--- padded
30 | 123 | 2019-07-15
Here is my query so far. I can only get the last 7 days, but I'm not sure whether it's possible to add default values:
SELECT *
FROM user_activity
WHERE action_day > CURRENT_DATE - INTERVAL '7 days'
ORDER BY uid, action_day
You can LEFT JOIN your table with generate_series. First you need the full set of rows to report: every distinct id combined with every day in the range. That set can then be left joined with the main table, and coalesce turns the missing actions into 0.
WITH days
AS (SELECT id, g.dt::date AS dt
FROM (
SELECT DISTINCT id FROM user_activity
) AS ids
CROSS JOIN generate_series(
CURRENT_DATE - interval '6 days', -- 7 days including today
CURRENT_DATE, interval '1 day') AS g(dt)
)
SELECT
coalesce(u.actions,0) AS actions
,d.id
,d.dt
FROM days d LEFT JOIN user_activity u ON u.id = d.id AND u.action_day = d.dt
ORDER BY d.id, d.dt
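The cross-join-then-left-join idea can also be sketched in Python: build every (id, day) pair, then look each one up and default to 0. The sample rows are the question's; today is assumed to be 2019-07-15 so the output matches the expected table, and the sketch assumes at most one row per id and day (the SQL join would return duplicates instead).

```python
from datetime import date, timedelta


def pad_last_days(rows, today, days=7):
    """Return (actions, id, day) for every id and each of the last `days` days.

    rows are (actions, id, day) tuples; missing days get 0 actions.
    """
    actions_by_key = {(uid, day): actions for actions, uid, day in rows}
    ids = sorted({uid for _, uid, _ in rows})  # like SELECT DISTINCT id
    # Oldest day first; today counts as one of the `days` days.
    window = [today - timedelta(days=n) for n in range(days - 1, -1, -1)]
    # Cross join ids x window, then "left join" via dict lookup with a 0 default.
    return [(actions_by_key.get((uid, day), 0), uid, day) for uid in ids for day in window]


ROWS = [
    (67, 123, date(2019, 7, 7)), (90, 123, date(2019, 7, 9)),
    (100, 123, date(2019, 7, 10)), (50, 123, date(2019, 7, 13)),
    (30, 123, date(2019, 7, 15)),
]
for row in pad_last_days(ROWS, today=date(2019, 7, 15)):
    print(row)
```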

Count grouping by specific date intervals

I want to count the number of "services" a customer has in 30-day periods, starting from the contract start date. So I have to count the services per monthly period since his start date. Simplified, the table looks like this:
services
------------------
id serial
id_customer bigint
service_date date
Let's imagine there is only one type of service. I solved it like this:
SELECT
DATE_PART('year',service_date)||'-'|| CASE WHEN DATE_PART('day',service_date) >= 15 THEN
DATE_PART('month',service_date)
ELSE
CASE WHEN DATE_PART('month',service_date) = 1 THEN
12
ELSE
DATE_PART('month',service_date)-1
END
END bill, count(id)
FROM services
WHERE id_customer = 1
GROUP BY bill
results would be
bill | count
-------------------
2019-02 | 2455333
In the example, the start date for id_customer 1 is 2019-02-15, so for the 2019-02 period I count the services until 2019-03-14.
What I want to know is: is there a better or more efficient solution?
I saw the solution here, but it implies an INNER JOIN with a GROUP BY on the same table, which I think would be slower, because my table has a lot of records.
You don't need to worry about the actual number of days in a month nor the month, year or day-of-month.
Just use the start date for a customer and let PostgreSQL generate the correct billing cycle periods for you.
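To make "let the start date define the periods" concrete, here is a Python sketch that maps a service date to its billing period by counting whole months from billing_start. add_months and billing_period are hypothetical helpers; add_months clamps to the last day of the month (January 31 plus one month gives February 28 in 2019), and every period is anchored to billing_start, so results may not match a stepped generate_series exactly for start days after the 28th.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Add calendar months, clamping the day to the target month's length."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def billing_period(billing_start, service_date):
    """Return (period_start, period_end), end exclusive, containing service_date."""
    if service_date < billing_start:
        return None
    # Start from the calendar-month difference ...
    n = (service_date.year - billing_start.year) * 12 + service_date.month - billing_start.month
    # ... and step back once if that period starts after the service date.
    while add_months(billing_start, n) > service_date:
        n -= 1
    return add_months(billing_start, n), add_months(billing_start, n + 1)


start, end = billing_period(date(2019, 2, 15), date(2019, 3, 14))
print(start.strftime("%Y-%m"), start, end)  # 2019-02 2019-02-15 2019-03-15
```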
To run a single query over all customers, I used a separate table holding the customer id and a configured billing_start date, for which we can then run a query such as the following:
WITH
periods (id, period_start, period_end) AS (
SELECT
id,
generate_series(billing_start, current_date, '1 month'::interval)::date,
(generate_series(billing_start, current_date, '1 month'::interval) + '1 month'::interval)::date
FROM test_customers
),
data AS (
SELECT
periods.id AS customer,
period_start,
count(test_services.*) AS service_calls
FROM periods INNER JOIN test_services ON (test_services.id_customer = periods.id)
WHERE test_services.service_date >= periods.period_start AND test_services.service_date < periods.period_end
GROUP BY 1, 2
)
SELECT customer, to_char(period_start, 'YYYY-MM') AS bill, service_calls
FROM data
ORDER BY 1, 2
;
...resulting in an output such as the following:
customer | bill | service_calls
----------+---------+---------------
1 | 2018-12 | 382736
1 | 2019-01 | 382735
1 | 2019-02 | 345696
2 | 2018-12 | 382736
2 | 2019-01 | 382734
2 | 2019-02 | 234580
3 | 2018-12 | 382734
3 | 2019-01 | 382736
3 | 2019-02 | 123463
4 | 2018-12 | 382734
4 | 2019-01 | 382736
4 | 2019-02 | 12346
5 | 2019-01 | 382735
5 | 2019-02 | 283965
6 | 2019-01 | 382735
6 | 2019-02 | 172848
7 | 2019-01 | 382734
7 | 2019-02 | 61732
8 | 2019-02 | 333351
9 | 2019-02 | 222234
10 | 2019-02 | 111117
(21 rows)
Complete online example: https://rextester.com/IHLJ95398
For this to be fast, it is important to have a multi-column index on id_customer and service_date, because that's where the counting takes place; with it, the counting can be done without sorting:
CREATE INDEX idx_svc_customer_date ON test_services (id_customer, service_date);
(otherwise, the sorting will most likely be done on disk, rather than in memory for large data sets)
If you just want the cycles for a single customer, use it like this:
WITH
periods (id, period_start, period_end) AS (
SELECT
id,
generate_series(billing_start, current_date, '1 month'::interval)::date,
(generate_series(billing_start, current_date, '1 month'::interval) + '1 month'::interval)::date
FROM test_customers WHERE id = 4
),
data AS (
SELECT
periods.id AS customer,
period_start,
count(test_services.*) AS service_calls
FROM periods INNER JOIN test_services ON (test_services.id_customer = periods.id)
WHERE test_services.service_date >= periods.period_start AND test_services.service_date < periods.period_end
GROUP BY 1, 2
)
SELECT customer, to_char(period_start, 'YYYY-MM') AS bill, service_calls
FROM data
ORDER BY 1, 2
;
...giving:
bill | service_calls
---------+---------------
2018-12 | 382734
2019-01 | 382736
2019-02 | 12346
(3 rows)

Showing date even zero value SQL

I have this SQL query:
SELECT Date, Hours, Counts FROM TRANSACTION_DATE
Example Output:
Date | Hours | Counts
----------------------------------
01-Feb-2018 | 20 | 5
03-Feb-2018 | 25 | 3
04-Feb-2018 | 22 | 3
05-Feb-2018 | 21 | 2
07-Feb-2018 | 28 | 1
10-Feb-2018 | 23 | 1
As you can see, some days are missing because they have no data, but I want the missing days to be shown with a value of zero:
Date | Hours | Counts
----------------------------------
01-Feb-2018 | 20 | 5
02-Feb-2018 | 0 | 0
03-Feb-2018 | 25 | 3
04-Feb-2018 | 22 | 3
05-Feb-2018 | 21 | 2
06-Feb-2018 | 0 | 0
07-Feb-2018 | 28 | 1
08-Feb-2018 | 0 | 0
09-Feb-2018 | 0 | 0
10-Feb-2018 | 23 | 1
Thank you in advance.
You need to generate a sequence of dates. If there are not too many, a recursive CTE is an easy method:
with dates as (
select min(date) as dte, max(date) as last_date
from transaction_date td
union all
select dateadd(day, 1, dte), last_date
from dates
where dte < last_date
)
select d.dte as date, coalesce(td.hours, 0) as hours, coalesce(td.counts, 0) as counts
from dates d left join
transaction_date td
on d.dte = td.date;
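The same recursive-CTE technique, sketched in SQLite through Python's built-in sqlite3 module so it can be run locally. This is a dialect swap, not the SQL Server query itself: it uses WITH RECURSIVE and SQLite's date(dte, '+1 day') where the answer uses dateadd, and it stores dates as ISO text (an assumption, since SQLite has no native date type). In SQL Server, recursive CTEs stop after 100 levels by default, which is one reason the method suits short ranges; OPTION (MAXRECURSION ...) raises that limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transaction_date (date TEXT, hours INTEGER, counts INTEGER)")
conn.executemany(
    "INSERT INTO transaction_date VALUES (?, ?, ?)",
    [("2018-02-01", 20, 5), ("2018-02-03", 25, 3), ("2018-02-04", 22, 3),
     ("2018-02-05", 21, 2), ("2018-02-07", 28, 1), ("2018-02-10", 23, 1)],
)

# Anchor row: first and last date; recursive part adds one day until last_date.
query = """
WITH RECURSIVE dates(dte, last_date) AS (
    SELECT min(date), max(date) FROM transaction_date
    UNION ALL
    SELECT date(dte, '+1 day'), last_date FROM dates WHERE dte < last_date
)
SELECT d.dte AS date,
       coalesce(td.hours, 0) AS hours,
       coalesce(td.counts, 0) AS counts
FROM dates d
LEFT JOIN transaction_date td ON td.date = d.dte
ORDER BY d.dte
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```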

How to insert additional values in between a GROUP BY

I am currently making a monthly report using MySQL. I have a table named "monthly" that looks something like this:
id | date | amount
10 | 2009-12-01 22:10:08 | 7
9 | 2009-11-01 22:10:08 | 78
8 | 2009-10-01 23:10:08 | 5
7 | 2009-07-01 21:10:08 | 54
6 | 2009-03-01 04:10:08 | 3
5 | 2009-02-01 09:10:08 | 456
4 | 2009-02-01 14:10:08 | 4
3 | 2009-01-01 20:10:08 | 20
2 | 2009-01-01 13:10:15 | 10
1 | 2008-12-01 10:10:10 | 5
Then, when I make a monthly report (grouped by year and month), I get something like this:
yearmonth | total
2008-12 | 5
2009-01 | 30
2009-02 | 460
2009-03 | 3
2009-07 | 54
2009-10 | 5
2009-11 | 78
2009-12 | 7
I used this query to achieve the result:
SELECT substring( date, 1, 7 ) AS yearmonth, sum( amount ) AS total
FROM monthly
GROUP BY substring( date, 1, 7 )
But I need something like this:
yearmonth | total
2008-01 | 0
2008-02 | 0
2008-03 | 0
2008-04 | 0
2008-05 | 0
2008-06 | 0
2008-07 | 0
2008-08 | 0
2008-09 | 0
2008-10 | 0
2008-11 | 0
2008-12 | 5
2009-01 | 30
2009-02 | 460
2009-03 | 3
2009-04 | 0
2009-05 | 0
2009-06 | 0
2009-07 | 54
2009-08 | 0
2009-09 | 0
2009-10 | 5
2009-11 | 78
2009-12 | 7
Something that would display zeroes for the months that don't have any value. Is it even possible to do that in a MySQL query?
You should generate a dummy rowsource and LEFT JOIN with it:
SELECT CONCAT(year, '-', LPAD(month, 2, '0')) AS yearmonth, COALESCE(SUM(m.amount), 0) AS total
FROM (
SELECT 1 AS month
UNION ALL
SELECT 2
…
UNION ALL
SELECT 12
) months
CROSS JOIN
(
SELECT 2008 AS year
UNION ALL
SELECT 2009 AS year
) years
LEFT JOIN
monthly m
ON m.date >= CONCAT_WS('.', year, month, 1)
AND m.date < CONCAT_WS('.', year, month, 1) + INTERVAL 1 MONTH
GROUP BY
year, month
ORDER BY
year, month
You can create these as tables on disk rather than generate them each time.
MySQL is the only system of the major four that does not have an easy way to generate arbitrary result sets.
Oracle, SQL Server and PostgreSQL do have one (CONNECT BY, recursive CTEs and generate_series, respectively).
Quassnoi is right, and I'll add a comment about how to recognize when you need something like this:
You want '2008-01' in your result, yet nothing in the source table has a date in January, 2008. Result sets have to come from the tables you query, so the obvious conclusion is that you need an additional table - one that contains each month you want as part of your result.
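Following that reasoning, here is a Python sketch where the "additional table" is simply a generated list of every year-month in the range, outer-joined to the per-month totals. The sample rows are the question's; the 2008 to 2009 year range is an assumption matching the desired output.

```python
# Sample rows from the question's "monthly" table: (id, date, amount).
MONTHLY = [
    (10, "2009-12-01 22:10:08", 7), (9, "2009-11-01 22:10:08", 78),
    (8, "2009-10-01 23:10:08", 5), (7, "2009-07-01 21:10:08", 54),
    (6, "2009-03-01 04:10:08", 3), (5, "2009-02-01 09:10:08", 456),
    (4, "2009-02-01 14:10:08", 4), (3, "2009-01-01 20:10:08", 20),
    (2, "2009-01-01 13:10:15", 10), (1, "2008-12-01 10:10:10", 5),
]


def monthly_report(rows, first_year, last_year):
    """Total per 'YYYY-MM' for every month in the year range, 0 when a month has no rows."""
    totals = {}
    for _id, stamp, amount in rows:
        key = stamp[:7]  # like substring(date, 1, 7)
        totals[key] = totals.get(key, 0) + amount
    # The "additional table": one entry per month we want in the result.
    months = [f"{y}-{m:02d}" for y in range(first_year, last_year + 1) for m in range(1, 13)]
    # Outer-join the calendar to the totals, defaulting missing months to 0.
    return [(ym, totals.get(ym, 0)) for ym in months]


for yearmonth, total in monthly_report(MONTHLY, 2008, 2009):
    print(yearmonth, total)
```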