I'd like to figure out the age of a person based on two dates: their birthday and the date they were created in a database.
The age is being calculated in days instead of years, though. Here's my query:
SELECT date_of_birth as birthday, created_at, (created_at - date_of_birth) as Age
FROM public.users
WHERE date_of_birth IS NOT NULL
The date_of_birth field is a date w/o a timestamp, but the created_at field is a date with a timestamp (e.g. 2017-05-06 01:27:40).
And my output looks like this:
0 years 0 mons 9645 days 1 hours 27 mins 40.86485 secs
Any idea how I can round/calculate the ages to the nearest year?
Using PostgreSQL.
If you are using MS SQL Server, then you could
CONVERT(DATE, created_at)
and then calculate the difference in months like
DATEDIFF(month, date_of_birth, created_at)/12
which means you can use the remainder in months to add or subtract one year.
In PostgreSQL, dates are handled very differently to MSSQL & MySQL. In fact it follows the SQL standard very well, even if it’s not always intuitive.
To actually calculate the age of something, you can use age():
SELECT age(date2,date1)
Like all of PostgreSQL’s functions, there are variations for different data types, and you may need to do something like this:
SELECT age(date2::date,date1::date)
or, more formally:
SELECT age(cast(date2 as date),cast(date1 as date))
The result will be an interval, which displays as a string:
SELECT age(current_date::date,'1981-01-17'::date);
-- 36 years 3 mons 22 days
If you just want the age in years, you can use extract:
SELECT extract('year' from age(current_date::date,'1981-01-17'::date));
Finally, if you want it correct to the nearest year, you can apply the old trick of adding half an interval:
SELECT extract('year' from age(current_date::date,'1981-01-17'::date)+interval '.5 year');
It’s not as simple as some of the other DBMS products, but it’s much more flexible, if you can get your head around it.
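If you want to sanity-check the arithmetic outside the database, the year/month part of this logic can be sketched in Python. This is only an illustration, not PostgreSQL's implementation: the helper names `age_years_months` and `age_nearest_year` are made up for this example, it assumes the first date is not earlier than the second, and it ignores the leftover days, which is also what happens when you extract the year after adding half a year.

```python
from datetime import date

def age_years_months(as_of: date, born: date) -> tuple:
    """Return completed (years, months) between born and as_of."""
    months = (as_of.year - born.year) * 12 + (as_of.month - born.month)
    if as_of.day < born.day:
        months -= 1  # the current month has not been completed yet
    return divmod(months, 12)

def age_nearest_year(as_of: date, born: date) -> int:
    """Round up once at least six full months have passed since the last birthday."""
    years, months = age_years_months(as_of, born)
    return years + 1 if months >= 6 else years

print(age_years_months(date(2017, 5, 9), date(1981, 1, 17)))  # (36, 3)
print(age_nearest_year(date(2017, 5, 9), date(1981, 1, 17)))  # 36
```

The first result lines up with the "36 years 3 mons" shown earlier, assuming that example was run on 2017-05-09.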
Here are some references:
https://www.postgresql.org/docs/current/static/functions-datetime.html
http://www.sqlines.com/postgresql/how-to/datediff
Related
I apologize, I am new at SQL. I am using BigQuery. I have a field called "last_engaged_date", this field is a datetime value (2021-12-12 00:00:00 UTC). I am trying to perform a count on the number of records that were "engaged" 12 months ago, 18 months ago, and 24 months ago based on this field. At first, to make it simple for myself, I was just trying to get a count of the number of records per year, something like:
Select count(id), year(last_engaged_date) as last_engaged_year
from xyz
group by last_engaged_year
order by last_engaged_year asc
I know that there are a lot of things wrong with this query, but primarily, BQ says that "Year" is not a valid function. Either way, what I really need is something like:
Date() - last_engaged_date = int(# of months)
count if <= 12 months as "12_months_count" (# of records where now - last engaged date is less than or equal to 12 months)
count if <= 18 months as "18_months_count"
count if <= 24 months as "24_months_count"
So that I have a count of how many records for each last_engaged_date period there are.
I hope this makes sense. Thank you so much for any ideas
[How to] Return the number of months between now and datetime value [in BigQuery] SQL
The simplest way is just to use the DATE_DIFF function, as in the example below:
date_diff(current_date(), date(last_engaged_date), month)
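To make the bucket counts from the question concrete, here is a rough Python sketch of the same idea. It assumes the month difference counts calendar-month boundaries crossed and ignores the day of the month (which is how I understand DATE_DIFF with MONTH to behave, but check the BigQuery docs); `month_diff`, `bucket_counts` and the sample dates are made up for illustration.

```python
from datetime import date

def month_diff(later: date, earlier: date) -> int:
    """Calendar-month boundaries crossed between two dates (days are ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def bucket_counts(last_engaged, today, limits=(12, 18, 24)):
    """For each limit, count the records engaged at most that many months ago."""
    diffs = [month_diff(today, d) for d in last_engaged]
    return {limit: sum(1 for m in diffs if m <= limit) for limit in limits}

sample = [date(2021, 12, 12), date(2021, 1, 5), date(2020, 6, 30), date(2019, 3, 1)]
print(bucket_counts(sample, today=date(2022, 6, 15)))  # {12: 1, 18: 2, 24: 3}
```

In SQL, the same thing would presumably be a conditional count per limit over the date_diff expression above.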
I have a daily weather data SQL table with columns including date, type (temperature, rain, wind etc measurement) and value. The dataset spans 20 years of data.
How can I calculate daily averages for each day and measurement type, averaging values for the given date from data for all the 20 years in question? So e.g. I want to see the average temperature for 1 Jan (average of temperatures for 1 Jan 2020, 1 Jan 2019, etc)
Given there's a total of 750 million rows of data, should I create a materialised view of the calculations or what's the best way to cache the answers?
You need to extract the month and day from the date. Standard SQL provides extract() for this:
select extract(month from date) as month, extract(day from date) as day,
avg(temperature), avg(rain), . . .
from t
group by extract(month from date), extract(day from date);
Not all databases support these standard functions so you may need to use the functions specific to your (unspecified) database.
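The grouping itself is easy to picture outside SQL. Below is a small Python sketch of the same idea, assuming the long layout from the question (one row per date, type and value) rather than one column per measurement; the function name `daily_averages` and the sample rows are invented for the example.

```python
from collections import defaultdict
from datetime import date

def daily_averages(rows):
    """rows: iterable of (date, measurement_type, value) tuples."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, row count]
    for day, kind, value in rows:
        bucket = totals[(day.month, day.day, kind)]  # the year is dropped from the key
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

rows = [
    (date(2019, 1, 1), "temperature", 2.0),
    (date(2020, 1, 1), "temperature", 4.0),
    (date(2020, 1, 1), "rain", 1.5),
]
print(daily_averages(rows)[(1, 1, "temperature")])  # 3.0
```

With 750 million rows you would of course do this in the database; the sketch only shows what the GROUP BY key looks like.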
It depends on which SQL server you use, but in general, you should extract the day and the month from the date (in Microsoft SQL Server, with the DATEPART function), then group by those and calculate the averages.
SELECT DATEPART(month, date_col) AS Month,
DATEPART(day, date_col) AS Day,
AVG(temp) AS Temp,
AVG(rain) AS Rain,
...
FROM table
GROUP BY DATEPART(month, date_col), DATEPART(day, date_col)
There is an extension to PostgreSQL called TimescaleDB that makes it easier to query this type of data. Beware that it makes changes to the PostgreSQL database that require changes to backup routines. And if the current database is partitioned, it will require a dump and restore.
A query can look like this:
-- By month
select
extract(year from created_at) as year,
extract(month from time_bucket('1 day', created_at)) as month,
min(temp) as temp
from
readings
where
created_at > '2019-01-01' and created_at < '2020-01-01'
group by
year,
month
order by
year,
month;
750 million rows. You need an efficient index. Consider this function and the index based on it.
Assuming a table weather with a date column date:
CREATE FUNCTION f_mmdd(date) -- or timestamp input?
RETURNS int LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT (EXTRACT(month FROM $1) * 100 + EXTRACT(day FROM $1))::int';
CREATE INDEX weather_mmdd_idx ON weather(f_mmdd(date));
This index helps to quickly identify all rows for a particular day of the year.
The manual about EXTRACT.
The above expression proved fastest for various reasons. Just re-ran some performance tests in Postgres 13, and nothing changed.
Details in this closely related answer:
How do you do date math that ignores the year?
There is also EXTRACT(doy FROM date) to extract the day of the year (1–365/366), which is even faster. But, obviously, there is an off-by-one error for dates past Feb 29 in leap years in the Gregorian calendar.
Then the query for Jan 01 can be:
SELECT date_trunc('day', date) -- if it's a timestamp column
-- date -- if it's really a date column (which I find hard to believe)
, avg(temperature) AS avg_temperature
, avg(rain) AS avg_rain
-- , ...
FROM weather
WHERE f_mmdd(date) = f_mmdd('2000-01-01') -- or just 101 for Jan 01
GROUP BY 1;
The year in f_mmdd('2000-01-01') is arbitrary. Or just use the integer 101 for Jan 01.
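To see why the year is arbitrary for the month/day key, and why the doy shortcut shifts for dates after Feb 29 in leap years, here is a quick Python check. The helpers `mmdd` and `doy` are stand-ins written for this example; they mirror the idea of f_mmdd and EXTRACT(doy ...) above but are not the Postgres functions themselves.

```python
from datetime import date

def mmdd(d: date) -> int:
    """Same idea as f_mmdd above: month * 100 + day."""
    return d.month * 100 + d.day

def doy(d: date) -> int:
    """Day of the year, 1-365 (or 366 in leap years)."""
    return d.timetuple().tm_yday

# March 1 in a common year (2019) and in a leap year (2020)
print(mmdd(date(2019, 3, 1)), mmdd(date(2020, 3, 1)))  # 301 301
print(doy(date(2019, 3, 1)), doy(date(2020, 3, 1)))    # 60 61
```

The month/day key is the same in every year, while the day-of-year number moves by one after the leap day.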
You might be able to optimize further with multicolumn indexes for particular dimensions (temperature, rain, ...). But that depends on undisclosed details.
Sounds like the dataset isn't going to change. So a MATERIALIZED VIEW with readily computed aggregates per day might be a better alternative in the long run.
A word of warning: Computed averages are only correct if the measurements are spread out evenly across each day. Else, computed numbers are just averages of the given numbers, not actual average values for each day.
How to write a SQL/Oracle query to retrieve all those customers whose age in months is more than 200 months?
I have an exam on Monday, but I am confused about month and date calculations.
You can use a query like this for MySQL:
SELECT *
FROM yourTable
WHERE birthdayField <= NOW() - INTERVAL 200 MONTH;
The logic is the same (the date is older than today minus 200 months), but the actual SQL is usually different, because DBMSes have a large variation of syntax in the date/time area.
Standard SQL & MySQL:
WHERE datecol < current_date - interval '200' month
Oracle:
WHERE datecol < add_months(current_date, -200)
In fact, Oracle also supports the Standard SQL version, but it's not recommended, because you might get an invalid date error when you do something like '2018-03-31' - interval '1' month. This is based on a (dumb) Standard SQL rule which MySQL doesn't follow: one month before March 31 would be February 31, and oops, that date doesn't exist.
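The month-end problem is easier to see with a small sketch. The Python helper below shifts a date by whole months and clamps the day to the end of the target month instead of raising an error. It is only an illustration of the clamping idea: `add_months_clamped` is a made-up name, and I am not claiming it reproduces every detail of Oracle's add_months or MySQL's interval arithmetic.

```python
import calendar
from datetime import date

def add_months_clamped(d: date, months: int) -> date:
    """Shift d by a number of months, clamping the day to the month's last day."""
    index = d.year * 12 + (d.month - 1) + months   # months counted from year 0
    year, month0 = divmod(index, 12)               # month0 is 0-based
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

print(add_months_clamped(date(2018, 3, 31), -1))    # 2018-02-28
print(add_months_clamped(date(2018, 3, 31), -200))  # 2001-07-31
```

A customer is then older than 200 months if their birth date is earlier than the second result.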
In Oracle DB, there are two nice functions, months_between and add_months, that are used for this type of date calculation. For your case, you may use one of the following:
select id, name, surname
from customers
where months_between(trunc(sysdate),DOB)>200;
or
select id, name, surname
from customers
where add_months(trunc(sysdate),-200)>DOB;
I have two TIMESTAMP columns in my table: customer_birthday and purchase_date. I want to create a query to show the number of purchases by customer age, to create a chart.
But how do I calculate ages, in years, using BigQuery? In other words, how do I get the difference in years between two TIMESTAMPs? The age calculation cannot be made using days or hours, because of leap years, so the function DATEDIFF(<timestamp1>,<timestamp2>) is not appropriate.
Thanks.
First of all, I'd really love BigQuery to have a function which calculates current age based on a date. That seems to be like a very common use case and it's not really easy due to the whole leap year thing.
I found a great article about this issue: https://towardsdatascience.com/how-to-accurately-calculate-age-in-bigquery-999a8417e973
Their final approach is similar to Lars Haugseth's and Saad's answer, but they do not use the DAYOFYEAR part in order to avoid issues with leap years. It also gives you the flexibility not only to calculate the current age, but also the age at a particular date that you pass to the function as argument:
CREATE OR REPLACE FUNCTION workspace.age_calculation(as_of_date DATE, date_of_birth DATE)
AS (
DATE_DIFF(as_of_date,date_of_birth, YEAR) -
IF(EXTRACT(MONTH FROM date_of_birth)*100 + EXTRACT(DAY FROM date_of_birth) >
EXTRACT(MONTH FROM as_of_date)*100 + EXTRACT(DAY FROM as_of_date)
,1,0)
)
Regarding the difference between dates - you could consider user-defined functions (https://cloud.google.com/bigquery/user-defined-functions) with a JavaScript date library, such as Datejs or Moment.js
You can use DATE_DIFF to get the difference in years, but you need to subtract one if the birthday has not yet occurred this year:
IF(EXTRACT(DAYOFYEAR FROM CURRENT_DATE) < EXTRACT(DAYOFYEAR FROM birthdate),
DATE_DIFF(CURRENT_DATE, birthdate, YEAR) - 1,
DATE_DIFF(CURRENT_DATE, birthdate, YEAR)) AS age
Here it is in a user defined function:
CREATE TEMP FUNCTION calculateAge(birthdate DATE) AS (
DATE_DIFF(CURRENT_DATE, birthdate, YEAR) +
IF(EXTRACT(DAYOFYEAR FROM CURRENT_DATE) < EXTRACT(DAYOFYEAR FROM birthdate), -1, 0) -- subtract 1 if birthdate has not yet occurred this year
);
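One caveat with comparing day-of-year numbers: they shift by one after Feb 29 in leap years, which is the issue the month/day comparison avoids. The Python sketch below puts the two approaches side by side; both function names are invented for this example and it is only meant to show where the results differ, not to be a drop-in replacement for the SQL.

```python
from datetime import date

def age_by_month_day(as_of: date, born: date) -> int:
    """Year difference, minus 1 if the birthday has not been reached yet."""
    not_yet = (as_of.month, as_of.day) < (born.month, born.day)
    return as_of.year - born.year - (1 if not_yet else 0)

def age_by_day_of_year(as_of: date, born: date) -> int:
    """Same idea, but comparing day-of-year numbers instead."""
    not_yet = as_of.timetuple().tm_yday < born.timetuple().tm_yday
    return as_of.year - born.year - (1 if not_yet else 0)

born = date(2000, 3, 1)    # day 61 of a leap year
as_of = date(2023, 3, 1)   # day 60 of a common year, and the 23rd birthday
print(age_by_month_day(as_of, born))    # 23
print(age_by_day_of_year(as_of, born))  # 22
```

So on the birthday itself, the day-of-year version can still report the previous age when the birth year was a leap year and the current year is not.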
You can compute the number of days it would be if all years were 365 days long, take the difference, and divide by 365. For example:
SELECT (day2-day1)/365
FROM (
SELECT YEAR(t1) * 365 + DAYOFYEAR(t1) as day1,
YEAR(t2) * 365 + DAYOFYEAR(t2) as day2
FROM (
SELECT TIMESTAMP('20000201') as t1,
TIMESTAMP('20140201') as t2))
This returns 14.0, even though there are intervening leap years. If you want the final result as an integer instead of floating point, you can use the INTEGER() function to cast the result.
Note that if one of the dates is a leap day (Feb 29), it will appear to be one year away from March 1, but I think this sounds like the intended behavior.
Another way to calculate age that takes leap years into account is to:
1. Calculate a simple age based on the difference in years.
2. Either subtract 1 or not, as follows:
   a. Add the difference in years to the birthday (e.g. if today is 2022-12-14 and the birthday is 2000-12-30, then the "new" birthday becomes 2022-12-30).
   b. Do a DAY-based difference between today and the "new" birthday, which gives you either a non-negative number (the birthday has already been reached this year) or a negative number (the birthday is still to come this year).
   c. Subtract 1 year from the simple age if the number is negative.
In BigQuery SQL code this looks like:
SELECT
bd AS birthday
,today
,DATE_DIFF(today, bd, YEAR) AS simpleAge
,DATE_DIFF(today, bd, YEAR) +
(CASE
WHEN DATE_DIFF(today, DATE_ADD(bd, INTERVAL DATE_DIFF(today, bd, YEAR) YEAR), DAY) >= 0
THEN 0
ELSE -1
END) AS age
FROM
(SELECT
PARSE_DATE("%Y-%m-%d", "2000-12-30") AS bd
,CURRENT_DATE("Asia/Tokyo") AS today
)
Outputs:
birthday    today       simpleAge  age
2000-12-30  2022-12-14  22         21
I want to run some queries on a sales table and a purchases table, such as showing the total cost or the total sales of a particular item.
Both of these tables also have a date field, which stores dates in a sortable format (just the default, I suppose?).
I am wondering how I would do things with this date field as part of my query to use date ranges such as:
The last year, from any given date of the year
The last 30 days, from any given day
Specific months, such as January, February, etc.
Are these types of queries possible just using a DATE field, or would it be easier to store months and years as separate text fields?
Given a DATE field MY_DATE, you can perform those three operations using various date functions:
1. Select last year's records
SELECT * FROM MY_TABLE
WHERE YEAR(my_date) = YEAR(CURDATE()) - 1
2. Last 30 Days
SELECT * FROM MY_TABLE
WHERE DATE_SUB(CURDATE(), INTERVAL 30 DAY) < MY_DATE
3. Show the month name
SELECT MONTHNAME(MY_DATE), MY_TABLE.* FROM MY_TABLE
I have always found it advantageous to store dates as Unix timestamps. They're extremely easy to sort by and query by range, and MySQL has built-in features that help (like UNIX_TIMESTAMP() and FROM_UNIXTIME()).
You can store them in INT(11) columns; when you program with them you learn quickly that a day is 86400 seconds, and you can get more complex ranges by multiplying that by a number of days (e.g. a month is close enough to 86400 * 30, and programming languages usually have excellent facilities for converting to and from them built into standard libraries).
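As a rough illustration of that kind of range check, here is a Python sketch of "within the last 30 days" on Unix timestamps. The helper names `to_unix` and `in_last_days` and the sample dates are made up, everything is treated as UTC, and a day is taken as exactly 86400 seconds as described above.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400  # fixed-length days, no leap seconds

def to_unix(year: int, month: int, day: int) -> int:
    """Midnight UTC of the given date as a Unix timestamp."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

def in_last_days(ts: int, now: int, days: int) -> bool:
    """True if ts falls within the last `days` days before (or at) now."""
    return now - days * SECONDS_PER_DAY < ts <= now

now = to_unix(2024, 3, 31)
print(in_last_days(to_unix(2024, 3, 5), now, 30))   # True
print(in_last_days(to_unix(2024, 2, 28), now, 30))  # False
```

Calendar-based ranges such as "the same day last year" are harder to express this way, since months and years are not a fixed number of seconds.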