How to calculate ages in BigQuery? - google-bigquery

I have two TIMESTAMP columns in my table: customer_birthday and purchase_date. I want to create a query to show the number of purchases by customer age, to create a chart.
But how do I calculate ages, in years, using BigQuery? In other words, how do I get the difference in years between two TIMESTAMPs? The age calculation cannot be made using days or hours, because of leap years, so the function DATEDIFF(<timestamp1>,<timestamp2>) is not appropriate.
Thanks.

First of all, I'd really love BigQuery to have a function that calculates current age based on a date. That seems like a very common use case, and it's not really easy because of the whole leap-year thing.
I found a great article about this issue: https://towardsdatascience.com/how-to-accurately-calculate-age-in-bigquery-999a8417e973
Their final approach is similar to Lars Haugseth's and Saad's answers, but it does not use DAYOFYEAR, in order to avoid issues with leap years. It also gives you the flexibility to calculate not only the current age but also the age at a particular date that you pass to the function as an argument:
CREATE OR REPLACE FUNCTION workspace.age_calculation(as_of_date DATE, date_of_birth DATE)
AS (
  DATE_DIFF(as_of_date, date_of_birth, YEAR) -
  -- subtract 1 if the birthday (compared as MMDD) has not yet occurred in as_of_date's year
  IF(EXTRACT(MONTH FROM date_of_birth) * 100 + EXTRACT(DAY FROM date_of_birth) >
     EXTRACT(MONTH FROM as_of_date) * 100 + EXTRACT(DAY FROM as_of_date),
     1, 0)
)
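To check the month/day comparison outside BigQuery, here is a minimal Python sketch of the same logic. It is an illustration under the assumption that DATE_DIFF(..., YEAR) counts calendar-year boundaries; `age_on` is a hypothetical name, not a BigQuery API.

```python
from datetime import date

def age_on(as_of: date, date_of_birth: date) -> int:
    """Whole years between date_of_birth and as_of, mirroring the UDF above."""
    # Like DATE_DIFF(as_of, dob, YEAR): the difference in calendar years.
    years = as_of.year - date_of_birth.year
    # Encode month/day as MMDD so dates compare within a year without
    # DAYOFYEAR, whose numbering shifts by one after Feb 28 in leap years.
    birthday_mmdd = date_of_birth.month * 100 + date_of_birth.day
    as_of_mmdd = as_of.month * 100 + as_of.day
    # Subtract one if this year's birthday has not happened yet.
    return years - (1 if birthday_mmdd > as_of_mmdd else 0)

print(age_on(date(2022, 12, 14), date(2000, 12, 30)))  # 21
print(age_on(date(2021, 3, 1), date(2000, 3, 1)))      # 21
```

With this encoding, someone born on Feb 29 is treated as having their birthday on Mar 1 in common years.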

Regarding the difference between dates, you could consider user-defined functions (https://cloud.google.com/bigquery/user-defined-functions) with a JavaScript date library, such as Datejs or Moment.js.

You can use DATE_DIFF to get the difference in years, but you need to subtract one if the birthday has not yet occurred this year:
IF(EXTRACT(DAYOFYEAR FROM CURRENT_DATE) < EXTRACT(DAYOFYEAR FROM birthdate),
DATE_DIFF(CURRENT_DATE, birthdate, YEAR) - 1,
DATE_DIFF(CURRENT_DATE, birthdate, YEAR)) AS age

Here it is in a user defined function:
CREATE TEMP FUNCTION calculateAge(birthdate DATE) AS (
DATE_DIFF(CURRENT_DATE, birthdate, YEAR) +
IF(EXTRACT(DAYOFYEAR FROM CURRENT_DATE) < EXTRACT(DAYOFYEAR FROM birthdate), -1, 0) -- subtract 1 if birthdate has not yet occurred this year
);
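Comparing DAYOFYEAR values can be off by one around leap years, which is why the article cited in the first answer avoids it. Below is a minimal Python sketch that mirrors calculateAge and shows the failing case. Python is used only so the arithmetic can be checked locally, and `age_dayofyear` is a hypothetical name.

```python
from datetime import date

def age_dayofyear(today: date, birthdate: date) -> int:
    # Mirrors calculateAge: year difference, minus 1 if today's day-of-year
    # is smaller than the birthdate's day-of-year.
    years = today.year - birthdate.year
    doy_today = today.timetuple().tm_yday
    doy_birth = birthdate.timetuple().tm_yday
    return years - 1 if doy_today < doy_birth else years

# Born 2000-03-01 (day 61 of a leap year). On 2021-03-01 (day 60 of a common
# year) the birthday has arrived, yet the day-of-year test says it has not.
print(age_dayofyear(date(2021, 3, 1), date(2000, 3, 1)))  # 20, true age is 21
```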

You can compute the number of days it would be if all years were 365 days long, take the difference, and divide by 365. For example (in legacy SQL):
SELECT (day2-day1)/365
FROM (
SELECT YEAR(t1) * 365 + DAYOFYEAR(t1) as day1,
YEAR(t2) * 365 + DAYOFYEAR(t2) as day2
FROM (
SELECT TIMESTAMP('20000201') as t1,
TIMESTAMP('20140201') as t2))
This returns 14.0, even though there are intervening leap years. If you want the final result as an integer instead of floating point, you can use the INTEGER() function to cast the result.
Note that if one of the dates is a leap day (Feb 29), it will appear to be one year away from March 1, but I think this is the intended behavior.
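The 365-day normalization above can be reproduced in a few lines of Python. This is a sketch of the same arithmetic, not BigQuery code, and `years_365` is a hypothetical name. Each date is mapped to `year * 365 + day_of_year`, so intervening leap days cancel out.

```python
from datetime import date

def years_365(d1: date, d2: date) -> float:
    # Map each date to a day count where every year is exactly 365 days long,
    # so intervening leap days are ignored.
    day1 = d1.year * 365 + d1.timetuple().tm_yday
    day2 = d2.year * 365 + d2.timetuple().tm_yday
    return (day2 - day1) / 365

print(years_365(date(2000, 2, 1), date(2014, 2, 1)))   # 14.0
print(years_365(date(2000, 2, 29), date(2001, 3, 1)))  # 1.0 (Feb 29 lines up with Mar 1)
```

Wrapping the result in `int()` plays the same role as INTEGER() in the SQL version.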

Another way to calculate age that takes leap years into account is to:
1. Calculate a simple age based on the difference in years.
2. Decide whether to subtract 1:
   a. Add the difference in years to the birthday (e.g. if today is 2022-12-14 and the birthday is 2000-12-30, then the "new" birthday becomes 2022-12-30).
   b. Do a DAY-based difference between today and the "new" birthday, which gives either zero or a positive number (the birthday is today or has passed this year) or a negative number (the birthday is still ahead this year).
   c. Subtract 1 year from the simple age if the number is negative.
In BigQuery SQL code this looks like:
SELECT
bd AS birthday
,today
,DATE_DIFF(today, bd, YEAR) AS simpleAge
,DATE_DIFF(today, bd, YEAR) +
(CASE
WHEN DATE_DIFF(today, DATE_ADD(bd, INTERVAL DATE_DIFF(today, bd, YEAR) YEAR), DAY) >= 0
THEN 0
ELSE -1
END) AS age
FROM
(SELECT
PARSE_DATE("%Y-%m-%d", "2000-12-30") AS bd
,CURRENT_DATE("Asia/Tokyo") AS today
)
Outputs:
birthday    today       simpleAge  age
2000-12-30  2022-12-14  22         21
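The add-the-years-then-compare approach can be sketched in Python to check it outside BigQuery. `add_years` and `age` are hypothetical names. One assumption is labelled in the code: a Feb 29 birthday is clamped to Feb 28 in common years, which is intended to match how BigQuery's DATE_ADD clamps to the last day of the month.

```python
import calendar
from datetime import date

def add_years(d: date, years: int) -> date:
    # Shift by whole years. Assumption: Feb 29 falls back to Feb 28 in common
    # years, mirroring DATE_ADD clamping to the month's last day.
    year = d.year + years
    day = min(d.day, calendar.monthrange(year, d.month)[1])
    return d.replace(year=year, day=day)

def age(today: date, birthday: date) -> int:
    simple_age = today.year - birthday.year     # DATE_DIFF(today, bd, YEAR)
    shifted = add_years(birthday, simple_age)   # this year's "new" birthday
    # A negative day difference means the birthday is still ahead this year.
    return simple_age + (0 if (today - shifted).days >= 0 else -1)

print(age(date(2022, 12, 14), date(2000, 12, 30)))  # 21
```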

Related

Convert integer years or months into days in SQL impala

I have two columns; both have integer values. One Representing years, and the other representing months.
My goal is to perform calculations in days (integer), so I have to convert both to calendar days, taking into consideration that we have years with both 365 and 366 days.
Example in pseudo code:
Select Convert(years_int) to days, Convert(months int) to days
from table.
Real Example:
if --> Years = 1 and Months = 12
1) Convert both to days to compare them: Years = 365 days; Months = 365 days
After conversion : (Years = Months) Returns TRUE.
The problem is that when we have years = 10 (for example), we must take into account the fact that at least two of them have 366 days. The same goes for months: we have months with 30 and 31 days. So I need to compensate for that to get the most accurate possible value in days.
Thanks in advance
Converting integers to a timestamp can be done in PostgreSQL. I do not have Impala, but hopefully the script below will help you get this done using Impala:
with
year as (select 2022 as y union select 2023),
month as (select generate_series(1,12) as m),
day as(select generate_series(1,31) as d )
select y,m,d,dt from (
select
y,m,d,
to_date(ds,'YYYYMMDD')+(((d-1)::char(2))||' day')::interval dt
from ( select
*,
y::char(4)|| right('0'||m::char(2),2) || right('0'||1::char(2),2) as ds
from year,month,day
) x
) y
where extract(year from dt)=y and extract(month from dt)=m
order by dt
;
see: DBFIDDLE
Functions used in this query, and a possible way to convert them to Impala (remember, I do not use that tool/language/dialect):
to_date(a,b): converts the string a to a date using the format b. In Impala you can use CAST(expression AS type FORMAT pattern).
y::char(4): casts y to a char(4). In Impala you can use CAST(expression AS type).
right(a,b): use right().
||: use concat().
generate_series(a,b): generates a series of numbers from a to b (inclusive). A plain SQL alternative is to write SELECT 1 AS x UNION SELECT 2 UNION SELECT 3, which generates the same series as generate_series(1,3) in PostgreSQL.
extract(year from a): gets the year from the datetime field a; see YEAR().
One special case is this expression: to_date(ds,'YYYYMMDD')+(((d-1)::char(2))||' day')::interval
This converts ds (with datatype CHAR(8)) to a date, and then adds (using +) a number of days (like: '4 day').
Because I included all days up to 31, this produces dates that spill into the following month for February, April, June, September and November, because those months have fewer than 31 days. This is corrected by the WHERE clause at the end (where extract(year from dt)=y and extract(month from dt)=m).
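A count of years or months only becomes a count of calendar days once you fix the date the span starts from. The question leaves that open, so the start date below is an assumption you must supply. This minimal Python sketch (hypothetical helpers `add_months` and `span_in_days`) steps forward by calendar months and counts the days covered. Leap years and 30/31-day months are therefore handled by the calendar itself.

```python
import calendar
from datetime import date

def add_months(start: date, months: int) -> date:
    # Move forward by calendar months, clamping to the last day of the target month.
    y, m = divmod(start.month - 1 + months, 12)
    year, month = start.year + y, m + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def span_in_days(start: date, years: int = 0, months: int = 0) -> int:
    # Calendar days covered by the span when counted from `start`.
    return (add_months(start, years * 12 + months) - start).days

start = date(2022, 1, 1)
print(span_in_days(start, years=1), span_in_days(start, months=12))  # 365 365
print(span_in_days(date(2020, 1, 1), years=10))                      # 3653
```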

Rounding dates in SQL

I'd like to figure out the age of a person based on two dates: their birthday and the date they were created in a database.
The age is being calculated in days instead of years, though. Here's my query:
SELECT date_of_birth as birthday, created_at, (created_at - date_of_birth) as Age
FROM public.users
WHERE date_of_birth IS NOT NULL
The date_of_birth field is a date w/o a timestamp, but the created_at field is a date with a timestamp (e.g. 2017-05-06 01:27:40).
And my output looks like this:
0 years 0 mons 9645 days 1 hours 27 mins 40.86485 secs
Any idea how I can round/calculate the ages to the nearest year?
Using PostgreSQL.
If you are using MS SQL Server, then you could
CONVERT(DATE, created_at)
and then calculate the difference in months like
DATEDIFF(month, date_of_birth, CONVERT(DATE, created_at))/12
which means you can use the remainder in months to add or subtract one year.
In PostgreSQL, dates are handled very differently from MSSQL and MySQL. In fact it follows the SQL standard very well, even if it's not always intuitive.
To actually calculate the age of something, you can use age():
SELECT age(date2, date1)
Like all of PostgreSQL's functions, there are variations of data type, and you may need to do something like this:
SELECT age(date2::date, date1::date)
or, more formally:
SELECT age(cast(date2 as date), cast(date1 as date))
The result will be an interval, which displays as a string:
SELECT age(current_date::date,'1981-01-17'::date);
-- 36 years 3 mons 22 days (at the time of writing)
If you just want the age in years, you can use extract:
SELECT extract('year' from age(current_date::date,'1981-01-17'::date));
Finally, if you want it correct to the nearest year, you can apply the old trick of adding half a year before extracting the years:
SELECT extract('year' from age(current_date::date,'1981-01-17'::date) + interval '.5 year');
It’s not as simple as some of the other DBMS products, but it’s much more flexible, if you can get your head around it.
Here are some references:
https://www.postgresql.org/docs/current/static/functions-datetime.html
http://www.sqlines.com/postgresql/how-to/datediff
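The truncate-versus-round distinction from the PostgreSQL answer can be sketched in Python. Extracting the year from an interval effectively divides its month count by 12, so adding `.5 year` (6 months) first turns truncation into rounding. This is an approximation of age(): `whole_months`, `age_years` and `age_nearest_year` are hypothetical names, and end-of-month edge cases are not modelled. For a `created_at` timestamp, you would pass `created_at.date()`.

```python
from datetime import date

def whole_months(born: date, as_of: date) -> int:
    # Completed months between the dates (approximates the month part of age()).
    months = (as_of.year - born.year) * 12 + (as_of.month - born.month)
    if as_of.day < born.day:
        months -= 1
    return months

def age_years(born: date, as_of: date) -> int:
    return whole_months(born, as_of) // 12        # extract('year' from age(...))

def age_nearest_year(born: date, as_of: date) -> int:
    return (whole_months(born, as_of) + 6) // 12  # ... + interval '.5 year'

born = date(1981, 1, 17)
print(age_years(born, date(2017, 5, 9)), age_nearest_year(born, date(2017, 5, 9)))  # 36 36
print(age_nearest_year(born, date(2017, 8, 1)))  # 37
```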

Calculate year from date difference in Oracle

I want to calculate the number of years between two dates.
eg: Select to_date('30-SEP-2014') - TO_date('30-OCT-2013') FROM DUAL;
This would result in 335 days. I want to show this in years, which will be .97 years.
Simply divide by 365.242199:
Select (to_date('30-SEP-2014') - TO_date('30-OCT-2013'))/365.242199 FROM DUAL;
1 YEAR = 365.242199 days
OR
Try something like this using MONTHS_BETWEEN:
select floor(months_between(date '2014-10-10', date '2013-10-10') /12) from dual;
or you may also try this:
SELECT EXTRACT(YEAR FROM date1) - EXTRACT(YEAR FROM date2) FROM DUAL;
On a side note:
335/365.242199 = 0.917199603 and not .97
I don't know how you figure that's .97 years. Here's what I get:
SQL> SELECT ( TO_date('30-SEP-2014') - to_date('30-OCT-2013')) /
(ADD_MONTHS(DATE '2013-10-30',12) - DATE '2013-10-30') "Year Fraction"
FROM DUAL;
Year Fraction
-------------
0.91780821917
You're going to have to pick a date to base your year calculation on. This is one way to do it. I chose to make a year be the number of days between 10/30/2013 and 10/30/2014. You could also make it a year between 9/30/2013 and 9/30/2014.
As an aside, if you're only interested in 2 decimal places, 365 is pretty much as good as 366.
UPDATE: Used ADD_MONTHS in calculating the denominator. That way you can use the same date for the entire calculation of the number of days in a year.
None of the methods proposed in the other answers give exactly the same answer, look:
with dates as (select to_date('2013-10-01', 'YYYY-MM-DD') as date1, to_date('2014-09-01', 'YYYY-MM-DD') as date2 from dual)
select months_between(date2, date1)/12 as years_between, 'months_between(date2, date1) / 12' as method from dates
union
select (date2 - date1)/365.242199, '(date2 - date1) / 365.242199' from dates
union
select extract(year from date2) - extract(year from date1), 'extract(year from date2) - extract(year from date1)' from dates
union
select (date2 - date1) / (ADD_MONTHS(date1, 12) - date1), '(nb days between date1 and date2) / (nb days in 1 year starting at date1)' from dates
;
gives
YEARS_BETWEEN METHOD
0.9166666666666666666666666666666666666667 months_between(date2, date1) / 12
0.9171996032145234127231831719422979380321 (date2 - date1) / 365.242199
0.9178082191780821917808219178082191780822 (nb days between date1 and date2) / (nb days in 1 year starting at date1)
1 extract(year from date2) - extract(year from date1)
Why? Because they are all answering slightly different questions.
MONTHS_BETWEEN gives the number of whole months between the 2 dates, and calculates the fractional part as the remainder in days divided by 31.
dividing by 365.242199 assumes that you want the number of solar years between 00:00 on the first date and 00:00 on the second date, to 9 significant figures.
the third method assumes you want to calculate how many calendar days between the two dates, relative to the number of calendar days in the specific year that started on the first date (so the same number of calendar days will give you a different number of years, depending on whether there's a leap day between date1 and the same date on the following year).
the extract(year) approach assumes you want know the difference in whole numbers between the calendar year of the first date and the calendar year of the second date
It's not possible to answer the question perfectly without knowing which kind of year we are talking about. Do we mean a solar year or a calendar year? And if we mean a calendar year, do we want to calculate by months (as if all months were the same length, which they aren't) or by the actual number of days between those dates and in that specific year?
Indeed, if we're talking about calendar years, it's not possible to calculate a fractional number of years in a consistent way at all, since the concept "calendar year" doesn't correspond to a fixed number of days.
The good news is that (aside from the fourth method) all the approaches give the same answer to the first 2 significant figures, as DCookie said. So you can save worrying about what you mean when you say "year", and instead start to think of other concerns such as performance, portability, readability... which also are quite different between these approaches.
I do think though, that whenever a non-programmer asks for something like "the fractional number of years between two dates," they should be punished by being given a detailed explanation of the different ways to calculate it, and why and how they are different, until they agree that it would be better expressed in number of weeks (which at least have the benefit of containing a fixed number of days).
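To make the four readings concrete, here is a Python sketch that reproduces each method for 2013-10-01 to 2014-09-01. `months_between` is a hypothetical helper approximating Oracle's MONTHS_BETWEEN: whole months plus the day remainder over 31. It does not model Oracle's "both last days of month" rule or time-of-day.

```python
from datetime import date

def months_between(d2: date, d1: date) -> float:
    # Whole months plus the day remainder divided by 31 (simplified Oracle rule).
    return (d2.year - d1.year) * 12 + (d2.month - d1.month) + (d2.day - d1.day) / 31

d1, d2 = date(2013, 10, 1), date(2014, 9, 1)
days = (d2 - d1).days                                # 335
one_year = (d1.replace(year=d1.year + 1) - d1).days  # 365, like ADD_MONTHS(date1, 12) - date1

print(months_between(d2, d1) / 12)  # ~0.91667
print(days / 365.242199)            # ~0.91720
print(days / one_year)              # ~0.91781
print(d2.year - d1.year)            # 1
```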

AS400 SQL query to determine records for who is 70.5 years old for the current year

I am trying to find individuals that will turn 70.5 years old in the current year.
dob7 = DECIMAL(7) YYYYDDD
select acctno, name, address, status, year(curdate()) - year(date(digits(dob7))) as Age
from mydata.cdmast cdmast
left join mydata.cfmast cfmast
on cdmast.cifno = cfmast.cifno
where status <> 'R' and year(curdate()) - year(date(digits(dob7))) >= 70
The code above returns the following error:
[Error Code: -181, SQL State: 22008] [IBM][System i Access ODBC Driver][DB2 for i5/OS]SQL0181 - Value in date, time, or timestamp string not valid.
After seeing the other answers, I'm submitting my own. This should have the benefit of using any indexes on dob7, and should work without too many 'tricks'.
I've modified the WHERE clause in your original query. I'm assuming '.5 years' means '6 months', although this is adjustable. I deliberately wrapped the calculations in CTEs to 'encapsulate' the logic; the operations should be nearly no-cost.
WITH Youngest (dateOfBirth) as (
SELECT CURRENT_DATE - 70 YEARS - 6 MONTHS
FROM sysibm/sysdummy1),
Converted (dateOfBirth, formatted) as (
SELECT dateOfBirth, YEAR(dateOfBirth) * 1000 + DAYOFYEAR(dateOfBirth)
FROM Youngest)
SELECT acctno, name, address, status,
YEAR(CURRENT_DATE) - INT(dob7 / 1000)
- CASE WHEN DAYOFYEAR(CURRENT_DATE) < MOD(dob7, 1000)
THEN 1
ELSE 0 END as Age
FROM myData.cdMast cdMast
JOIN Converted
ON Converted.formatted >= dob7
LEFT JOIN myData.cfMast cfMast
ON cdMast.cifno = cfMast.cifno
WHERE status <> 'R'
Please note that it will consider people born on a leap day to have had their birthday on March 1st (due to DAYOFYEAR()).
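The idea behind the CTEs is to compute one cutoff date, encode it as YYYYDDD, and compare the raw DECIMAL(7) values. Because the year comes first, numeric order matches date order. Here is a minimal Python sketch of that comparison. The helper names are hypothetical, and the two-step subtraction (70 years, then 6 months, each clamped to the month's last day) is an assumption about how `CURRENT_DATE - 70 YEARS - 6 MONTHS` behaves.

```python
import calendar
from datetime import date, timedelta

def from_yyyyddd(value: int) -> date:
    # DECIMAL(7) YYYYDDD: first four digits are the year, last three the day of year.
    year, day_of_year = divmod(value, 1000)
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)

def to_yyyyddd(d: date) -> int:
    return d.year * 1000 + d.timetuple().tm_yday

def minus_months(d: date, months: int) -> date:
    # Step back whole months, clamping the day to the target month's length.
    y, m = divmod(d.month - 1 - months, 12)
    year, month = d.year + y, m + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

def is_at_least_70_5(dob7: int, today: date) -> bool:
    # CURRENT_DATE - 70 YEARS - 6 MONTHS, then Converted.formatted >= dob7.
    cutoff = minus_months(minus_months(today, 70 * 12), 6)
    return dob7 <= to_yyyyddd(cutoff)

print(from_yyyyddd(1950032))                       # 1950-02-01
print(is_at_least_70_5(1952350, date(2023, 6, 15)))  # True (born 1952-12-15)
```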
From the DATE scalar function documentation:
A string with an actual length of 7 that represents a valid date in the form yyyynnn, where yyyy are digits denoting a year, and nnn are digits between 001 and 366 denoting a day of that year.
Reformat the date with:
DATE(SUBSTR(DIGITS(DOB7),4,4) || SUBSTR(DIGITS(DOB7),1,3))
To select 70.5 or older by the end of the current year:
(YEAR(CURRENT_DATE) - YEAR(DATE(SUBSTR(DIGITS(DOB7),4,4) || SUBSTR(DIGITS(DOB7),1,3))) = 70
AND MONTH(DATE(SUBSTR(DIGITS(DOB7),4,4) || SUBSTR(DIGITS(DOB7),1,3))) <= 6)
OR YEAR(CURRENT_DATE) - YEAR(DATE(SUBSTR(DIGITS(DOB7),4,4) || SUBSTR(DIGITS(DOB7),1,3))) > 70
The error message is saying that the contents of DOB7 cannot be converted to a date. Does the value of DOB7 match one of the valid formats? Note that many require quotation marks. http://publib.boulder.ibm.com/infocenter/iseries/v6r1m0/index.jsp?topic=/db2/rbafzscadate.htm
Try this instead:
(year(curdate()) - mod(dob7, 10000)) >= 70
This is using modular arithmetic to extract the year, rather than trying to convert it to a date.
By the way, storing the date this way seems very awkward. Databases have built-in support for dates and times, so it is usually better to store them in the native format.
If your date of birth is really yyyyddd, then the following should work for years:
(year(curdate()) - cast(dob7/1000 as int)) >= 70
For the half year:
(year(curdate()) - cast(dob7/1000 as int))+(1-mod(dob7,1000)/365.0) >= 70.5

SQL that list all birthdays within the next and previous 14 days

I have a MySQL member table, with a DOB field which stores all members' dates of birth in DATE format (Notice: it has the "Year" part)
I'm trying to find the correct SQL to:
List all birthdays within the next 14 days
and another query to:
List all birthdays within the previous 14 days
Directly comparing the current date by:
(DATEDIFF(DOB, now()) <= 14 and DATEDIFF(DOB, now()) >= 0)
will fetch nothing, since the current year and the DOB year are different.
However, transforming the DOB to 'this year' won't work at all, because today could be Jan 1 and the candidate could have a DOB of Dec 31 (or vice versa)
It will be great if you can give a hand to help, many thanks! :)
@Eli had a good response, but hardcoding 351 makes it a little confusing and gets off by 1 during leap years.
This checks whether the birthday (dob) is within the next 14 days. The first check is for a birthday later in the same year. The second check handles the year boundary: if today is, say, Dec 27, you'll want to include January dates too.
With DAYOFYEAR( CONCAT(YEAR(NOW()),'-12-31') ), we are deciding whether to use 365 or 366 based on the current year (for leap year).
SELECT dob
FROM birthdays
WHERE DAYOFYEAR(dob) - DAYOFYEAR(NOW()) BETWEEN 0 AND 14
OR
DAYOFYEAR( CONCAT(YEAR(NOW()),'-12-31') ) - ( DAYOFYEAR(NOW()) - DAYOFYEAR(dob) ) BETWEEN 0 AND 14
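Here's the simplest code to get the birthdays in the next x days and previous x days.
This query is not affected by leap-year day-number shifts (unlike DAYOFYEAR-based versions). However, it only compares against this year's birthday, so a window that crosses January 1 misses dates on the other side of the year boundary. Also, a Feb 29 birthday produces an invalid (NULL) date in non-leap years.
SELECT name, date_of_birth
FROM users
WHERE DATE(CONCAT(YEAR(CURDATE()), RIGHT(date_of_birth, 6)))
BETWEEN
DATE_SUB(CURDATE(), INTERVAL 14 DAY)
AND
DATE_ADD(CURDATE(), INTERVAL 14 DAY)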
My first thought was that it would be easy to just use DAYOFYEAR and take the difference, but that actually gets kind of tricky near the start/end of a year. However:
WHERE
DAYOFYEAR(NOW()) - DAYOFYEAR(dob) BETWEEN 0 AND 14
OR DAYOFYEAR(dob) - DAYOFYEAR(NOW()) > 351
Should work, depending on how much you care about leap years. A "better" answer would probably be to extract the DAY() and MONTH() from the dob and use MAKEDATE() to build a date in the current (or potential past/following) year and compare to that.
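The "build the birthday in the current (or previous/following) year and compare" idea can be sketched in Python. Checking the previous, current and next year's birthday handles windows that cross January 1. The helper names are hypothetical. Mapping Feb 29 to Feb 28 in common years is an assumption; some applications prefer Mar 1.

```python
import calendar
from datetime import date, timedelta

def birthday_in_year(dob: date, year: int) -> date:
    # Rebuild the birthday in `year`; Feb 29 becomes Feb 28 in common years (assumption).
    day = min(dob.day, calendar.monthrange(year, dob.month)[1])
    return date(year, dob.month, day)

def birthday_within(dob: date, today: date, days_before: int, days_after: int) -> bool:
    # True if any nearby occurrence of the birthday falls inside the window.
    start = today - timedelta(days=days_before)
    end = today + timedelta(days=days_after)
    return any(start <= birthday_in_year(dob, y) <= end
               for y in (today.year - 1, today.year, today.year + 1))

print(birthday_within(date(2000, 1, 5), date(2022, 12, 30), 0, 14))    # True (next 14 days)
print(birthday_within(date(1990, 12, 25), date(2023, 1, 3), 14, 0))    # True (previous 14 days)
```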
Easy.
We can obtain the nearest birthday (i.e. the birthday in the current year) with this code (SQL Server syntax):
dateadd(year,datediff(year,dob,getdate()),DOB)
Use this in your comparisons and it will work.
There are a number of options. I would first try to shift each date by the number of years between the current year and the row's year (i.e. add their age).
Another option is the day number within the year (but then you still have to worry about the rollover arithmetic or modulo).
This is my query for the "30 days before" check:
select id from users where
((TO_DAYS(concat(DATE_FORMAT(NOW(),'%Y'), '-', DATE_FORMAT(date_of_birth, '%m-%d')))-TO_DAYS(NOW()))>=-30
AND (TO_DAYS(concat(DATE_FORMAT(NOW(),'%Y'), '-', DATE_FORMAT(date_of_birth, '%m-%d')))-TO_DAYS(NOW()))<=0)
OR (TO_DAYS(concat(DATE_FORMAT(NOW(),'%Y'), '-', DATE_FORMAT(date_of_birth, '%m-%d')))-TO_DAYS(NOW()))>=(365-31)
and 30 days after:
select id from users where
((TO_DAYS(NOW())-TO_DAYS(concat(DATE_FORMAT(NOW(),'%Y'), '-', DATE_FORMAT(date_of_birth, '%m-%d'))))>=-31
AND (TO_DAYS(NOW())-TO_DAYS(concat(DATE_FORMAT(NOW(),'%Y'), '-', DATE_FORMAT(date_of_birth, '%m-%d'))))<=0)
OR (TO_DAYS(NOW())-TO_DAYS(concat(DATE_FORMAT(NOW(),'%Y'), '-', DATE_FORMAT(date_of_birth, '%m-%d'))))>=(365-30)
My solution is as follows:
select cm.id from users cm where
date(concat(
year(curdate()) - (year(subdate(curdate(), 14)) < year(curdate())
and month(curdate()) < month(cm.birthday)) + (year(adddate(curdate(), 14)) > year(curdate())
and month(curdate()) > month(cm.birthday)), date_format(cm.birthday, '-%m-%d'))) between subdate(curdate(), 14)
and adddate(curdate(), 14);
It looks like it works fine when the period spans the current and next year or the current and previous year.