How to get an accurate number of results when selecting records from a period of time - sql

I have a problem very similar to this StackOverflow question in that I need a full seven days of records. My query looks something like this:
SELECT id, entry_timestamp
FROM entries
WHERE created_on >= TO_DATE('2023-01-25', 'YYYY-MM-DD') at time zone 'UTC' - interval '7 days'
ORDER BY entry_timestamp
In my table, entry_timestamp is a "timestamp without time zone" column and created_on is a "date" column.
(I use TO_DATE() instead of current_date because I need to be able to specify a particular date.)
When I run that query, I only get results with entry_timestamp values from the last six days.
If I change it to interval '8 days', I get ten days for some reason.
What am I missing that is causing such a broad miscalculation?

Related

Different query results while using DATE_TRUNC function in WHERE clause

I have a SQL query that returns one set of results when I use the condition
DATE_TRUNC('DAY', timestamp) BETWEEN date_trunc('DAY', NOW()) - interval '14' day AND date_trunc('DAY', NOW())
and a different set of results when I use the condition
timestamp BETWEEN date_trunc('DAY', NOW()) - interval '14' day AND date_trunc('DAY', NOW())
After cross-checking both results, I found that the first condition gives the correct result.
Can someone please explain the difference between the two conditions?
Thanks in advance
timestamp has a time component.
date_trunc('day', timestamp) removes the time component.
The difference is in the upper bound of the comparison. The first version matches any time on the current day, because each timestamp is truncated to midnight before it is compared.
The second version only matches rows from the current day whose timestamp is exactly midnight.
Incidentally, I would recommend:
where timestamp >= current_date - interval '14 day' and
timestamp < current_date + interval '1 day'
This works regardless of whether the timestamp column has a time component or not. It is also friendly to the optimizer and to index usage, because the column is compared directly instead of being wrapped in a function.
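To make the difference concrete, here is a minimal Python model of the three predicates (not SQL). It assumes NOW() and current_date both reduce to "today at midnight" in the session time zone; the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def trunc_day(ts: datetime) -> datetime:
    """Stand-in for date_trunc('day', ts): drop the time of day."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def truncated_between(ts: datetime, now: datetime, days: int = 14) -> bool:
    # DATE_TRUNC('DAY', ts) BETWEEN today - 14 days AND today
    today = trunc_day(now)
    return today - timedelta(days=days) <= trunc_day(ts) <= today


def raw_between(ts: datetime, now: datetime, days: int = 14) -> bool:
    # ts BETWEEN today - 14 days AND today  (upper bound is today at 00:00)
    today = trunc_day(now)
    return today - timedelta(days=days) <= ts <= today


def half_open(ts: datetime, now: datetime, days: int = 14) -> bool:
    # ts >= today - 14 days AND ts < today + 1 day
    today = trunc_day(now)
    return today - timedelta(days=days) <= ts < today + timedelta(days=1)


now = datetime(2023, 1, 25, 15, 0)
row = datetime(2023, 1, 25, 10, 30)  # a row from earlier today
print(truncated_between(row, now), raw_between(row, now), half_open(row, now))
# True False True
```

The half-open form gives the same answer as the truncated BETWEEN while leaving the column itself untouched.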

Presto TIMESTAMP get data from 2 days ago without inputting year month date?

My goal is to have the query grab data from 2 days ago. I don't want to have to keep inputting the date like this:
WHERE usage_start_date
BETWEEN TIMESTAMP '2020-09-09 00:00:00.000' and TIMESTAMP '2020-09-09 23:59:59.999'
but instead something like:
usage_start_date = current_date - interval '2' day
The above works for my Athena Presto SQL query, but for some reason it does not return all the data from those 24 hours; it only gives about half the day. Is there a way to write a statement like this one so that it returns ALL data from that day?
WHERE current_date - interval '2' day AND
BETWEEN TIMESTAMP '00:00:00.000' and TIMESTAMP '23:59:59.999'
without inputting the year, month, and day? It seems like TIMESTAMP needs the y/m/d, but what about using LIKE so that it picks up the hour, minute, and second without the y/m/d?
To get a timestamp for the start of the day that was two days ago you can do
DATE_TRUNC('day', NOW() - INTERVAL '2' DAY)
e.g.
WHERE usage_start_date >= DATE_TRUNC('day', NOW() - INTERVAL '2' DAY)
AND usage_start_date < DATE_TRUNC('day', NOW() - INTERVAL '1' DAY)
You can use the query below to achieve this by extracting the hour and the date from usage_start_date:
select * from table where hour(usage_start_date) between 0 and 23 and current_date - interval '2' day = date(usage_start_date)
I would suggest:
WHERE usage_start_date >= CURRENT_DATE - INTERVAL '2' DAY AND
usage_start_date < CURRENT_DATE - INTERVAL '1' DAY
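The half-open window in the answers above can be sketched in Python as follows. This is a model of the predicate, not Presto code; it assumes usage_start_date and CURRENT_DATE are in the same time zone, and the function names are hypothetical.

```python
from datetime import date, datetime, timedelta


def two_days_ago_window(today: date) -> tuple[datetime, datetime]:
    """[start, end) bounds matching
    usage_start_date >= CURRENT_DATE - INTERVAL '2' DAY AND
    usage_start_date <  CURRENT_DATE - INTERVAL '1' DAY"""
    start = datetime.combine(today - timedelta(days=2), datetime.min.time())
    end = datetime.combine(today - timedelta(days=1), datetime.min.time())
    return start, end


def in_window(ts: datetime, today: date) -> bool:
    start, end = two_days_ago_window(today)
    return start <= ts < end  # the end bound is exclusive


today = date(2020, 9, 11)
print(in_window(datetime(2020, 9, 9, 0, 0), today))        # True: start of the day
print(in_window(datetime(2020, 9, 9, 23, 59, 59), today))  # True: last second of the day
print(in_window(datetime(2020, 9, 10, 0, 0), today))       # False: next day excluded
```

Because the upper bound is exclusive, there is no need to spell out 23:59:59.999 or worry about the column's fractional-second precision.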

Difference between CURRENT_TIMESTAMP and CURRENT_DATE

I want to get the data from the last 28 days and only include complete days. For example, when I look at the data today at 10:00 AM, it should only include data from yesterday (the last completed day) back to 28 days before yesterday.
I am building a live dashboard with figures like this, so I don't want the numbers to change until the day is finished.
I would also like to understand the difference between CURRENT_DATE and CURRENT_TIMESTAMP.
For example, if I use CURRENT_TIMESTAMP in my code, will I get the data from today at 10:00 AM back to 10:00 AM 28 days ago? If not, how can I get the data so that the numbers change live every time I run the code (on average, the data in the database changes every 10 minutes)?
My simplified code:
select count(id) from customers
where created_at > CURRENT_DATE - interval '28 days'
Maybe my code is wrong. Can you please advise how to get the data in both ways:
include only complete days (does not include today until the day is finished)
include hours, from this morning back to the same time in the morning 28 days ago.
Assuming created_at is of type timestamptz.
include only complete days (does not include today until the day is finished)
Start with now() and use date_trunc():
SELECT count(*)
FROM customers
WHERE created_at < date_trunc('day', now())
AND created_at >= date_trunc('day', now() - interval '28 days');
Or work with CURRENT_DATE ...
WHERE created_at < CURRENT_DATE
AND created_at >= CURRENT_DATE - 28;
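A small Python model of the complete-days window (not SQL) shows why the dashboard numbers stay fixed during the day. It assumes `now` is already expressed in the session time zone, which is exactly the dependency discussed next; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def complete_days_window(now: datetime, days: int = 28) -> tuple[datetime, datetime]:
    """Bounds for created_at >= CURRENT_DATE - 28 AND created_at < CURRENT_DATE.
    Assumption: `now` is a naive datetime in the session time zone."""
    today = now.date()
    start = datetime.combine(today - timedelta(days=days), datetime.min.time())
    end = datetime.combine(today, datetime.min.time())
    return start, end


morning = complete_days_window(datetime(2023, 3, 1, 10, 0))
evening = complete_days_window(datetime(2023, 3, 1, 22, 30))
print(morning == evening)  # True: the window only moves at midnight
print(morning[0], morning[1])  # 2023-02-01 00:00:00 2023-03-01 00:00:00
```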
The result for both depends on the current time zone setting. The "date" functionally depends on your current time zone; the type timestamp with time zone (timestamptz) does not. But the expression date_trunc('day', now()) introduces the same dependency, as the "day" is defined by your current time zone. So you need to define precisely which "days" you mean. Basics:
Ignoring time zones altogether in Rails and PostgreSQL
You can subtract integer values from a date to subtract days:
How do I determine the last day of the previous month using PostgreSQL?
now() is a shorter equivalent of CURRENT_TIMESTAMP. See:
Difference between now() and current_timestamp
count(*) is equivalent to count(id) while id is defined NOT NULL, but a bit faster.
I have different results from query for COUNT('e.id') or COUNT(e.id)
include hours, from this morning back to the same time in the morning 28 days ago.
Simply:
WHERE created_at > now() - interval '28 days'
No dependency on the current time zone.
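For contrast, a Python sketch of the rolling window, where the lower bound moves with every run. It models timestamps as timezone-aware instants in UTC and treats '28 days' as 28 × 24 hours; that is an assumption which matches PostgreSQL only when no daylight-saving change falls inside the window, since PostgreSQL counts the days component of an interval in calendar days of the session time zone. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def rolling_window_start(now: datetime, days: int = 28) -> datetime:
    """Lower bound for created_at > now() - interval '28 days' (fixed 24-hour days)."""
    return now - timedelta(days=days)


now = datetime(2023, 3, 1, 10, 0, tzinfo=timezone.utc)
print(rolling_window_start(now))  # 2023-02-01 10:00:00+00:00

# Ten minutes later the window has moved by ten minutes, so live numbers update.
later = now + timedelta(minutes=10)
print(rolling_window_start(later) - rolling_window_start(now))  # 0:10:00

# The same instant written with another UTC offset gives the same bound.
shifted = now.astimezone(timezone(timedelta(hours=-5)))
print(rolling_window_start(shifted) == rolling_window_start(now))  # True
```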

How to run a query for every date for last 3 month

I have a table (pkg_data) in Redshift. I want to fetch some data for every date in the last 3 months.
Here is my query:
select * from pkg_data where scan_date < current_date;
How can I use current_date as a variable in the query itself and run this query for every date since April 1?
I have set up a cron job that runs every hour. Each hourly run should use a different current_date.
SELECT *
FROM pkg_data
WHERE scan_date > CURRENT_DATE - INTERVAL '3 months'
Be careful: Redshift works in UTC, so CURRENT_DATE can suffer from time zone effects and sometimes be a day ahead of or behind what you expect.
SELECT
CURRENT_DATE,
(CURRENT_DATE - INTERVAL '3 months')::date
Returns:
2018-06-21 2018-03-21
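The time zone effect can be illustrated in Python. The client offset below is an assumption (a fixed UTC-7, as in US Pacific daylight time), chosen only to show how a UTC-based CURRENT_DATE can run a day ahead of the local calendar.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a client sitting in a fixed UTC-7 zone.
LOCAL = timezone(timedelta(hours=-7))

local_now = datetime(2018, 6, 21, 20, 0, tzinfo=LOCAL)
utc_now = local_now.astimezone(timezone.utc)  # 2018-06-22 03:00 UTC

print(local_now.date())  # 2018-06-21: the date you expect locally
print(utc_now.date())    # 2018-06-22: what a UTC CURRENT_DATE would report
```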
Also be careful with strange lengths of months!
SELECT DATE '2018-05-31' - INTERVAL '3 months'
returns:
2018-02-28 00:00:00
Notice that the day was clamped to the last day of February (the 28th, since February has no 31st).
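The clamping rule in the result above can be reproduced with a few lines of Python. This is a sketch of calendar-month subtraction with end-of-month clamping, not Redshift's implementation; `subtract_months` is a hypothetical helper.

```python
import calendar
from datetime import date


def subtract_months(d: date, months: int) -> date:
    """Move d back by `months` calendar months, clamping the day
    to the last day of the target month."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    return date(year, month, min(d.day, last_day))


print(subtract_months(date(2018, 5, 31), 3))  # 2018-02-28
print(subtract_months(date(2018, 6, 21), 3))  # 2018-03-21
```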
By the way, you can use DATE '2018-05-31' or '2018-05-31'::DATE, and also INTERVAL '3 months' or '3 months'::INTERVAL to convert types.
Use dateadd() to get the date three months ago and GETDATE() to get the current date.
The code would look like this:
select * from pkg_data where scan_date >= dateadd(month, -3, GETDATE());
For the cron job, see How to execute scheduled SQL script on Amazon Redshift?

How do you find results that occurred in the past week?

I have a books table with a returned_date column. I'd like to see the results for all of the books with a returned date that occurred in the past week.
Any thoughts? I tried doing some date math, but Postgres wasn't happy with my attempt.
You want to use interval and current_date:
select * from books where returned_date > current_date - interval '7 days'
This would return data from the past week including today.
Here's more on working with dates in Postgres.
Assuming returned_date is data type date, this is simplest and fastest:
SELECT * FROM books WHERE returned_date > CURRENT_DATE - 7;
now()::date is the Postgres implementation of standard SQL CURRENT_DATE. Both do exactly the same in PostgreSQL.
CURRENT_DATE - 7 works because one can subtract / add integer values (= days) from / to a date. An unquoted number like 7 is treated as a numeric constant and initially cast to integer by default (only digits, plus optional leading sign). No explicit cast needed.
With data type timestamp or timestamptz you have to add / subtract an interval, as @Eric demonstrates. You can do the same with date, but the result is timestamp and you have to cast back to date or keep working with timestamp. Sticking to date is simplest and fastest for your purpose. The performance difference is tiny, but there is no reason not to take it. It is less error-prone, too.
The computation is independent of the actual data type of returned_date; the resulting type to the right of the operator will be coerced to match either way (or raise an error if no cast is registered).
For the "past week" ...
To include today, make it > current_date - 7 or >= current_date - 6. But that's typically a bad idea, as "today" is only a fraction of a day and can produce odd results.
>= current_date - 7 returns rows for the last 8 days (incl. today) instead of 7 and is wrong, strictly speaking.
To exclude today, make it:
WHERE returned_date >= current_date - 7
AND returned_date < current_date
Or:
WHERE returned_date BETWEEN current_date - 7
AND current_date - 1
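The day counts claimed above can be checked with a small Python model, treating returned_date as a plain date (the asker's assumed type). `days_matched` is a hypothetical helper that lists which recent dates each predicate accepts.

```python
from datetime import date, timedelta


def days_matched(today: date, predicate, lookback: int = 30) -> list[date]:
    """Dates from `today` back `lookback` days (inclusive) that satisfy predicate."""
    candidates = (today - timedelta(days=n) for n in range(lookback + 1))
    return sorted(d for d in candidates if predicate(d))


today = date(2023, 1, 25)
week = timedelta(days=7)

gt_minus_7 = days_matched(today, lambda d: d > today - week)               # includes today
ge_minus_7 = days_matched(today, lambda d: d >= today - week)              # one day too many
exclude_today = days_matched(today, lambda d: today - week <= d < today)   # last 7 full days

print(len(gt_minus_7), len(ge_minus_7), len(exclude_today))  # 7 8 7
```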
To get the last full calendar week ending with Sunday, excluding today:
WHERE returned_date BETWEEN date_trunc('week', now())::date - 7
AND date_trunc('week', now())::date - 1
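In PostgreSQL, date_trunc('week', ...) returns the Monday of the current ISO week, so the bounds above cover the previous Monday through Sunday. A Python sketch of the same arithmetic, assuming "today" is already the date in your session time zone (`last_full_week` is a hypothetical name):

```python
from datetime import date, timedelta


def last_full_week(today: date) -> tuple[date, date]:
    """Mirror date_trunc('week', now())::date - 7 .. - 1."""
    this_monday = today - timedelta(days=today.weekday())  # weekday(): Monday == 0
    return this_monday - timedelta(days=7), this_monday - timedelta(days=1)


start, end = last_full_week(date(2023, 1, 25))  # a Wednesday
print(start, end)  # 2023-01-16 2023-01-22
```

On a Sunday, the week ending today is still the current week, so today is excluded as intended.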
BETWEEN ... AND ... is ok for data type date (being a discrete type), but typically the wrong tool for timestamp / timestamptz. See:
How to add a day/night indicator to a timestamp column?
The exact definition of "day" and "week" always depends on your current timezone setting.
What math did you try?
This should work:
select * from books where returned_date > current_date - integer '7'
Taken from PostgreSQL Date/Time Functions and Operators