SQL queries with date types

I want to run some queries on a sales table and a purchases table, such as showing the total costs or the total sales of a particular item.
Both tables also have a date field, which stores dates in a sortable format (just the default, I suppose?).
I am wondering how I would do things with this date field as part of my query to use date ranges such as:
The last year, from any given date of the year
The last 30 days, from any given day
To show specific months, such as January, February, etc.
Are these types of queries possible using just a DATE field, or would it be easier to store months and years as separate text fields?

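Given a DATE field MY_DATE, you can perform those three operations using MySQL's date functions:
1. Records from last calendar year
SELECT * FROM MY_TABLE
WHERE YEAR(MY_DATE) = YEAR(CURDATE()) - 1
For a rolling year counted back from today instead: WHERE MY_DATE > DATE_SUB(CURDATE(), INTERVAL 1 YEAR)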
2. Last 30 Days
SELECT * FROM MY_TABLE
WHERE DATE_SUB(CURDATE(), INTERVAL 30 DAY) < MY_DATE
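3. Show the month name (to filter on one month instead, use e.g. WHERE MONTH(MY_DATE) = 1 for January)
SELECT MONTHNAME(MY_DATE) AS month_name, MY_TABLE.* FROM MY_TABLE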
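To sanity-check what each of these filters selects before running them against MySQL, here is a minimal Python sketch over hypothetical in-memory rows. The ROWS list and the function names are illustrative assumptions, not part of MySQL; each function mirrors one WHERE clause above, and total() stands in for a SUM() such as total sales.

```python
from datetime import date, timedelta

# Hypothetical sample rows standing in for MY_TABLE: (my_date, amount).
ROWS = [
    (date(2023, 1, 15), 10.0),
    (date(2023, 12, 31), 5.0),
    (date(2024, 1, 20), 7.5),
    (date(2024, 3, 1), 2.0),
]


def last_calendar_year(rows, today):
    """Rows where YEAR(my_date) = YEAR(CURDATE()) - 1."""
    return [row for row in rows if row[0].year == today.year - 1]


def last_30_days(rows, today):
    """Rows where DATE_SUB(CURDATE(), INTERVAL 30 DAY) < my_date (exclusive bound)."""
    cutoff = today - timedelta(days=30)
    return [row for row in rows if cutoff < row[0]]


def in_month(rows, month):
    """Rows where MONTH(my_date) = month (1 = January)."""
    return [row for row in rows if row[0].month == month]


def total(rows):
    """SUM(amount) over the selected rows, e.g. total sales."""
    return sum(amount for _, amount in rows)
```

For example, with today = date(2024, 3, 10), total(in_month(ROWS, 1)) adds up every January across all years, which is exactly the "set months" query from the question.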

I have always found it advantageous to store dates as Unix timestamps. They're extremely easy to sort by and query by range, and MySQL has built-in features that help (like UNIX_TIMESTAMP() and FROM_UNIXTIME()).
You can store them in INT(11) columns. When you program with them, you quickly learn that a day is 86400 seconds, and you can build more complex ranges by multiplying that by a number of days (e.g. a month is close enough to 86400 * 30). Programming languages also usually have excellent facilities for converting to and from timestamps built into their standard libraries.
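A minimal Python sketch of that approach, assuming timestamps are UTC seconds since the epoch. Note that MySQL's UNIX_TIMESTAMP() and FROM_UNIXTIME() interpret and return values in the session time zone, whereas this sketch treats naive datetimes as UTC; the helper names are hypothetical.

```python
from datetime import datetime, timezone

DAY = 86400  # seconds per day in Unix time (leap seconds are not counted)


def to_unix(dt):
    """Seconds since 1970-01-01 00:00:00 UTC; naive datetimes are treated as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def from_unix(ts):
    """Convert a Unix timestamp back to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def last_n_days(now_ts, days):
    """(start, end) bounds for a range query: WHERE ts >= start AND ts < end."""
    return now_ts - days * DAY, now_ts
```

Calling last_n_days(now, 30) gives the "close enough" one-month window described above; it is exactly 30 days, not a calendar month.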

Related

Calculate the average time between two dates

I need to find the result of a calculation that is nothing more than the average time in days from creation to completion of a task.
In this case, I'm using a Redshift database (via Looker).
I have a date range (2022/10/01 to 2022/10/21), and I need to find the average number of days an object takes from creation to completion.
Previously, I was able to calculate the total number of objects created per day, but I can't get the average:
SELECT created::date, count(n1pk_package_id)
FROM dbt_dw.base_package
WHERE fk_company_id = 245821 and created >= '2022-10-01' and created < '2022-10-22'
GROUP BY created::date
ORDER BY created::date DESC
I can't work out how to go from the count to the average over the range of days.
Assumptions:
There is a created column in your table
You want to know the 'average' of the created column
You could compute how many days each date differs from a base date, and then use that to determine the 'average date'. Something like this:
select
date '2022-10-01' + interval '1 day' * cast(round(avg(created::date - date '2022-10-01')) as int) as average_created
from dbt_dw.base_package
where fk_company_id = 245821 and created >= '2022-10-01' and created < '2022-10-22'
It subtracts a date (any date will do) from created, averages that day offset over all the selected rows, rounds it to whole days, and adds it back to that same date.
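The same "average offset from a base date" idea as a minimal Python sketch (the function name and sample dates are hypothetical). It also shows why any base date works: shifting the base shifts every offset by the same amount, so the averaged result is unchanged.

```python
from datetime import date, timedelta


def average_date(dates, base=date(2022, 10, 1)):
    """Average date: mean day offset from `base`, rounded to whole days, added back."""
    if not dates:
        raise ValueError("need at least one date")
    offsets = [(d - base).days for d in dates]
    mean_offset = sum(offsets) / len(offsets)
    # Note: Python's round() rounds halves to even; SQL ROUND() may differ on .5.
    return base + timedelta(days=round(mean_offset))
```

For example, creation dates of Oct 1, Oct 3 and Oct 8 have offsets 0, 2 and 7, whose mean is 3, so the average date is Oct 4.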

How do I calculate daily averages for multi-year data in SQL?

I have a daily weather data SQL table with columns including date, type (temperature, rain, wind etc measurement) and value. The dataset spans 20 years of data.
How can I calculate daily averages for each day and measurement type, averaging values for the given date from data for all the 20 years in question? So e.g. I want to see the average temperature for 1 Jan (average of temperatures for 1 Jan 2020, 1 Jan 2019, etc)
Given there's a total of 750 million rows of data, should I create a materialised view of the calculations or what's the best way to cache the answers?
You need to extract the month and day from the date. The standard SQL function for this is extract():
select extract(month from date) as month, extract(day from date) as day,
avg(temperature), avg(rain), . . .
from t
group by extract(month from date), extract(day from date);
Not all databases support these standard functions so you may need to use the functions specific to your (unspecified) database.
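The question actually describes a long layout (date, type, value) rather than one column per measurement, so the grouping would also include the type: in SQL terms, GROUP BY type plus the extracted month and day, with AVG(value). A minimal Python sketch of that grouping, assuming the long layout (the function name and sample rows are hypothetical):

```python
from collections import defaultdict
from datetime import date


def daily_averages(rows):
    """Average value per (type, month, day) across all years.

    rows: iterable of (date, type, value) tuples, i.e. the long format
    described in the question (assumed; the answers use one column per type).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, kind, value in rows:
        key = (kind, day.month, day.day)  # like GROUP BY type, month, day
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Temperatures of 2.0 on 1 Jan 2019 and 4.0 on 1 Jan 2020 end up in the same (type, 1, 1) group and average to 3.0.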
It depends on which SQL server you use, but in general you should extract the day and the month from the date (on Microsoft SQL Server this is the DATEPART function), then group by them and calculate the averages.
SELECT DATEPART(month, date_col) AS Month,
DATEPART(day, date_col) AS Day,
AVG(temp) AS Temp,
AVG(rain) AS Rain,
...
FROM table
GROUP BY DATEPART(month, date_col), DATEPART(day, date_col)
There is an extension to PostgreSQL called TimescaleDB that makes it easier to query this type of data. Beware that it makes changes to the PostgreSQL database that require changes to backup routines, and if the current database is partitioned, it will require a dump and restore.
A query can look like this:
-- By month
select
extract(year from created_at) as year,
extract(month from time_bucket('1 day', created_at)) as month,
min(temp) as temp
from
readings
where
created_at > '2019-01-01' and created_at < '2020-01-01'
group by
year,
month
order by
year,
month;
750 million rows: you need an efficient index. Consider this function and an index based on it.
Assuming a table weather with a date column date:
CREATE FUNCTION f_mmdd(date) -- or timestamp input?
RETURNS int LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT (EXTRACT(month FROM $1) * 100 + EXTRACT(day FROM $1))::int';
CREATE INDEX weather_mmdd_idx ON weather(f_mmdd(date));
This index helps to quickly identify all rows for a particular day of the year.
See the PostgreSQL manual on EXTRACT.
The above expression proved fastest for various reasons. I just re-ran some performance tests in Postgres 13, and nothing changed.
Details in this closely related answer:
How do you do date math that ignores the year?
There is also EXTRACT(doy FROM date) to extract the day of the year (1–365/366), which is even faster. But there is an off-by-one error from Feb 29 onward in leap years of the Gregorian calendar: Mar 01 is day 61 in a leap year but day 60 otherwise.
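The difference between the two keys can be checked with a short Python sketch: f_mmdd below mirrors the SQL function above, and day_of_year mirrors EXTRACT(doy ...). Both Python names are just for illustration.

```python
from datetime import date


def f_mmdd(d):
    """Python mirror of the f_mmdd() SQL function: month * 100 + day."""
    return d.month * 100 + d.day


def day_of_year(d):
    """Like EXTRACT(doy FROM d): 1 for Jan 01, up to 365 or 366."""
    return d.timetuple().tm_yday
```

f_mmdd gives Mar 01 the key 301 in every year, while day_of_year returns 61 for Mar 01 in 2020 but 60 in 2021, and 60 is also Feb 29 in 2020.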
Then the query for Jan 01 can be:
SELECT date_trunc('day', date) -- if it's a timestamp column
-- date -- if it's really a date column (which I find hard to believe)
, avg(temperature) AS avg_temperature
, avg(rain) AS avg_rain
-- , ...
FROM weather
WHERE f_mmdd(date) = f_mmdd('2000-01-01') -- or just 101 for Jan 01
GROUP BY 1;
The year in f_mmdd('2000-01-01') is arbitrary. Or just use the integer 101 for Jan 01.
You might be able to optimize further with multicolumn indexes for particular dimensions (temperature, rain, ...). But that depends on undisclosed details.
Sounds like the dataset isn't going to change. So a MATERIALIZED VIEW with readily computed aggregates per day might be a better alternative in the long run.
A word of warning: computed averages are only correct if the measurements are spread evenly across each day. Otherwise, the computed numbers are just averages of the given readings, not actual average values for each day.
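To make that warning concrete, here is a minimal Python sketch comparing a plain average of readings with a time-weighted one. It assumes each reading stays valid until the next one (a step function), which is only one possible model; the function names are hypothetical.

```python
def sample_mean(samples):
    """Plain average of the readings, like AVG(value)."""
    return sum(value for _, value in samples) / len(samples)


def time_weighted_mean(samples, period_end):
    """Weight each reading by how long it stays current until the next reading.

    samples: (hour, value) pairs sorted by hour; each value is assumed to hold
    until the next reading (a step function), and the last until period_end.
    """
    if not samples:
        raise ValueError("need at least one sample")
    next_times = [t for t, _ in samples[1:]] + [period_end]
    total = sum(value * (t_next - t) for (t, value), t_next in zip(samples, next_times))
    return total / (period_end - samples[0][0])
```

With evenly spaced readings [(0, 0.0), (12, 10.0)] both functions return 5.0. Adding an extra reading at hour 18 raises the plain average to about 6.67, while the time-weighted value stays at 5.0.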

How to calculate a birthday date regardless of year

I would like to find customers whose account creation anniversary falls in the next 8 days. I want to calculate this date using only the day and the month: if the creation anniversary is in 8 days, I can retrieve these contacts and get in touch with them.
Is there an example query that uses only the month and day to calculate the upcoming date?
Thank you.
You can try something like this (tested on PostgreSQL; it may require some adjustments for other databases):
select * from temp_contacts where
to_char(date_created, 'MM-DD') =
to_char(current_date + interval '8 day', 'MM-DD');
For each row, it first maps date_created to a string in MM-DD format, does the same to the current date + 8 days, and compares the two.
This does not use any index, so it will not perform well on a sufficiently large table.
Some databases support function-based (expression) indexes, for example PostgreSQL and Oracle Database. In MySQL you can achieve the same by creating a generated column and indexing it.
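The same month-and-day comparison as a minimal Python sketch (the function name is hypothetical). It also exposes one edge case of this approach: an account created on Feb 29 never matches in a non-leap year, both here and with the to_char comparison above.

```python
from datetime import date, timedelta


def anniversary_in(created, today, days=8):
    """True if created's month and day match those of today + `days`.

    Mirrors comparing to_char(..., 'MM-DD') strings; the year is ignored.
    """
    target = today + timedelta(days=days)
    return (created.month, created.day) == (target.month, target.day)
```

Year boundaries are handled by the date arithmetic: on 2023-12-28, an account created on Jan 05 of any year matches.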

Rounding dates in SQL

I'd like to figure out the age of a person based on two dates: their birthday and the date they were created in a database.
The age is being calculated in days instead of years, though. Here's my query:
SELECT date_of_birth as birthday, created_at, (created_at - date_of_birth) as Age
FROM public.users
WHERE date_of_birth IS NOT NULL
The date_of_birth field is a date without a time component, but the created_at field is a timestamp (e.g. 2017-05-06 01:27:40).
And my output looks like this:
0 years 0 mons 9645 days 1 hours 27 mins 40.86485 secs
Any idea how I can round/calculate the ages to the nearest year?
Using PostgreSQL.
If you are using MS SQL Server, you could use
CONVERT(DATE, created_at)
and then calculate the difference in months, like
DATEDIFF(month, date_of_birth, CONVERT(DATE, created_at)) / 12
Integer division gives whole years; you can then use the remainder in months to add or subtract one year as needed.
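A minimal Python sketch of that month-based rounding (hypothetical function names). It mirrors how DATEDIFF(month, ...) counts month boundaries crossed rather than full months, which is why a correction of one year may be needed, and it rounds up when the remainder is at least 6 months.

```python
from datetime import date


def datediff_month(start, end):
    """Like SQL Server DATEDIFF(month, start, end): counts month boundaries crossed."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def age_rounded(birth, at):
    """Whole years from DATEDIFF(month, ...) / 12, plus one if the remainder is >= 6 months.

    Assumes birth <= at, so divmod behaves like SQL Server's integer division.
    """
    years, remainder = divmod(datediff_month(birth, at), 12)
    return years + (1 if remainder >= 6 else 0)
```

Note that datediff_month(date(2020, 1, 31), date(2020, 2, 1)) is 1 even though only one day has passed.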
In PostgreSQL, dates are handled very differently to MSSQL & MySQL. In fact it follows the SQL standard very well, even if it’s not always intuitive.
To actually calculate the age of something, you can use age(later, earlier):
SELECT age(date2, date1)
As with all of PostgreSQL’s functions, there are variants for different data types, and you may need to do something like this:
SELECT age(date2::date, date1::date)
or, more formally:
SELECT age(cast(date2 as date), cast(date1 as date))
The result will be an interval, which displays as a string:
SELECT age(current_date::date,'1981-01-17'::date);
-- 36 years 3 mons 22 days
If you just want the age in years, you can use extract:
SELECT extract('year' from age(current_date::date,'1981-01-17'::date));
Finally, if you want it correct to the nearest year, you can apply the old trick of adding half a year before extracting the years:
SELECT extract('year' from age(current_date::date,'1981-01-17'::date) + interval '.5 year');
It’s not as simple as some of the other DBMS products, but it’s much more flexible, if you can get your head around it.
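The half-year trick can be sketched in Python as follows. This is a simplification under stated assumptions: it only reproduces the whole years and months of age(), not the day part or PostgreSQL's month-end handling, and the function names are hypothetical.

```python
from datetime import date


def age_years_months(later, earlier):
    """Whole (years, months) between two dates, similar to the year/month part of age().

    Simplification: PostgreSQL's age() also reports days and handles month ends.
    """
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the current month is not complete yet
    return divmod(months, 12)


def age_nearest_year(later, earlier):
    """Like extract('year' from age(later, earlier) + interval '.5 year')."""
    years, months = age_years_months(later, earlier)
    return years + (1 if months >= 6 else 0)
```

For a birth date of 1981-01-17, this gives 36 years 3 months on 2017-05-06 (rounded: 36) and 36 years 6 months on 2017-08-01 (rounded: 37).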
Here are some references:
https://www.postgresql.org/docs/current/static/functions-datetime.html
http://www.sqlines.com/postgresql/how-to/datediff

SQL query to search by day/month/year/day&month/day&year etc

I have a PostgreSQL database with events. Each event has a datetime or an interval. Common data are stored in the events table and dates are stored in either events_dates (datetime field) or events_intervals (starts_date, ends_date both are date fields).
Sample datetime events
I was born on 1930-06-09
I got my driver's license on 1950-07-12
Christmas is on 1900-12-24 (1900 is reserved for yearly recurring events)
Sample interval events
I'll be on vacation from 2011-06-09 till 2011-07-23
Now I have a user that will want to look up these events. They will be able to fill out a form with from and to fields and in those fields they can enter full date, day, month, year, day and month, day and year, month and year in one or both fields.
Sample queries
From May 3 to 2012 December 21 will look for events between May 3 and December 21 whose max year is 2012
From day 3 to day 15 will look for events between the 3rd and 15th day of every month and year
From day 3 will look for events on the 3rd day of every month and year (same if from is empty and to is not)
From May 3 to June will look for events between May 3 and last day of June of every year
etc.
Any tips on how to write a maintainable query (it doesn't necessarily have to be fast)?
Some things that we thought of
write all possible from, to and day/month/year combinations - not maintainable
compare dates as strings e.g. input: ____-06-__ where _ is a wildcard - I wouldn't have to generate all possible combinations but this doesn't work for intervals
You can write maintainable queries that additionally are fast by using the pg/temporal extension:
https://github.com/jeff-davis/PostgreSQL-Temporal
create index on events using gist(period(start_date, end_date));
select *
from events
where period(start_date, end_date) #> :date;
select *
from events
where period(start_date, end_date) && period(:start, :end);
You can even use it to disallow overlaps as a table constraint:
alter table events
add constraint overlap_excl
exclude using gist(period(start_date, end_date) WITH &&);
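The containment and overlap tests behind those queries can be sketched in Python. This assumes half-open periods [start, end); that is my assumption about how period(start_date, end_date) behaves, so check the extension's documentation. The function names are hypothetical.

```python
from datetime import date


def contains(start, end, point):
    """True if point lies in the half-open period [start, end)."""
    return start <= point < end


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open periods overlap when each one starts before the other ends."""
    return a_start < b_end and b_start < a_end
```

Using the vacation from the question (2011-06-09 to 2011-07-23), a search period starting on 2011-07-20 overlaps it. One starting on 2011-07-23 only touches its end and does not overlap.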
Regarding "write all possible from, to and day/month/year combinations - not maintainable":
It's actually more maintainable than you might think, e.g.:
select *
from events
join generate_series(:start_date, :end_date, :interval) as datetime
on start_date <= datetime and datetime < end_date;
But it's much better to use the above-mentioned period type.
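The generate_series join can be sketched in Python to show which rows it produces. The function names are hypothetical, and the event end is treated as exclusive, matching the datetime < end_date condition in the join.

```python
from datetime import date, timedelta


def date_series(start, stop, step):
    """Like generate_series(start, stop, step): yields start, start + step, ... up to stop."""
    current = start
    while current <= stop:
        yield current
        current += step


def join_series(events, start, stop, step):
    """(name, point) for every generated point inside an event's [start, end)."""
    return [
        (name, point)
        for point in date_series(start, stop, step)
        for name, ev_start, ev_end in events
        if ev_start <= point < ev_end
    ]
```

With weekly points from 2011-06-01 to 2011-08-01, the vacation from the question matches the six points from 2011-06-15 through 2011-07-20.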