How many days does SQL count in a year interval?

Query:
SELECT
CAST ('2010-12-13' AS TIMESTAMP) - CAST ('2007-01-01' AS TIMESTAMP) <= INTERVAL '4 years',
CAST ('2010-12-13' AS TIMESTAMP) <= CAST ('2007-01-01' AS TIMESTAMP) + INTERVAL '4 years',
CAST ('2010-12-13' AS TIMESTAMP) - CAST ('2008-01-01' AS TIMESTAMP) <= INTERVAL '3 years',
CAST ('2010-12-13' AS TIMESTAMP) <= CAST ('2008-01-01' AS TIMESTAMP) + INTERVAL '3 years'
Result: false true true true
Why does the first column, CAST ('2010-12-13' AS TIMESTAMP) - CAST ('2007-01-01' AS TIMESTAMP) <= INTERVAL '4 years', return FALSE?

I think I figured it out. When compared with a day count, interval '1 year' counts as only 360 days (12 months of 30 days each).
Column 1: left = 1442 days, right = 360 * 4 = 1440 days, so it returns false.
Column 2: the right side adds 4 calendar years to 2007-01-01 (2007 + 4 = 2011), so right = '2011-01-01' and it returns true.
Column 3: left = 1077 days, right = 360 * 3 = 1080 days, so it returns true.
Column 4: likewise, 2008-01-01 + 3 years = '2011-01-01', so it returns true.
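The arithmetic above can be checked with a small Python model. It assumes PostgreSQL-style behaviour: comparing a day-based interval with a year interval treats one month as 30 days, while adding an interval to a timestamp uses calendar years. The helper names `years_as_days`, `compare_interval` and `compare_calendar` are hypothetical and only mirror the two kinds of expression in the query.

```python
from datetime import date

DAYS_PER_MONTH = 30  # assumed normalisation used when intervals are compared


def years_as_days(years: int) -> int:
    """'N years' expressed in days, as an interval comparison sees it."""
    return years * 12 * DAYS_PER_MONTH


def compare_interval(start: date, end: date, years: int) -> bool:
    """Mirrors: end - start <= INTERVAL 'N years' (real days vs 360-day years)."""
    return (end - start).days <= years_as_days(years)


def compare_calendar(start: date, end: date, years: int) -> bool:
    """Mirrors: end <= start + INTERVAL 'N years' (calendar arithmetic)."""
    # date.replace raises ValueError for Feb 29 in a non-leap target year;
    # the inputs below are January 1st, so that case does not arise here.
    return end <= start.replace(year=start.year + years)


end = date(2010, 12, 13)
for start, years in [(date(2007, 1, 1), 4), (date(2008, 1, 1), 3)]:
    print(start, (end - start).days, years_as_days(years),
          compare_interval(start, end, years), compare_calendar(start, end, years))
# 2007-01-01 1442 1440 False True
# 2008-01-01 1077 1080 True True
```

The two printed rows correspond to columns 1/2 and 3/4 of the original query.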

Related

Don't want to double count in Filtered Aggregation

Sample Data:
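shopper_id | last_purchase_timestamp | active_p30 | active_p60 | active_over_p90
-----------+-------------------------+------------+------------+----------------
1          | 2022-03-02 1:20:00      | TRUE       | TRUE       | TRUE
2          | 2022-03-01 1:30:00      | TRUE       | TRUE       | TRUE
3          | 2022-02-28 1:24:03      | TRUE       | TRUE       | TRUE
4          | 2022-02-02 21:22:26     | FALSE      | TRUE       | TRUE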
I want to count whether each shopper was active (i.e. made their last purchase) in the last 30 days (counting back from March 5th), the last 60 days, etc.
My goal is to find how many shoppers made their last purchase in the last 30 days, how many in the last 60 days, and so on. However, I do not want to double count.
What I've attempted:
count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '30' day)
AS total_active_p30,
count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day)
AS total_active_p60,
count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day)
AS total_active_p90
Results:
total_active_p30 | total_active_p60 | total_active_p90
-----------------+------------------+-----------------
3                | 4                | 4
However, this double counts. How can I prevent it? The counts should add up to 4.
My ideal output would be:
total_active_p30 | total_active_p60 | total_active_p90
-----------------+------------------+-----------------
3                | 1                | 0
Thanks in advance everyone! I'm using Trino!
Your filter conditions overlap: every row that satisfies the 30-day condition also satisfies the 60-day and 90-day conditions, so it is counted in several columns. Give each bucket an upper bound as well as a lower bound (written with DATE literals, since Trino does not support PostgreSQL's :: cast syntax):
count(*) filter (where last_purchase_timestamp >= (DATE '2022-03-05' - INTERVAL '30' day))
as total_active_p30,
count(*) filter (where last_purchase_timestamp >= (DATE '2022-03-05' - INTERVAL '60' day)
and last_purchase_timestamp < (DATE '2022-03-05' - INTERVAL '30' day))
as total_active_p60,
count(*) filter (where last_purchase_timestamp >= (DATE '2022-03-05' - INTERVAL '90' day)
and last_purchase_timestamp < (DATE '2022-03-05' - INTERVAL '60' day))
as total_active_p90
Add both upper and lower bounds to the filters so that they do not overlap. Something along these lines:
-- sample data
WITH dataset (last_purchase_timestamp) AS (
VALUES (timestamp '2022-03-02 1:20:00'),
(timestamp '2022-03-01 1:30:00'),
(timestamp '2022-02-28 1:24:03'),
(timestamp '2022-02-02 21:22:26')
)
-- query
select count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '30' day) total_active_p30,
count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '30' day) total_active_p60,
count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '60' day) total_active_p90
from dataset
Output:
total_active_p30 | total_active_p60 | total_active_p90
-----------------+------------------+-----------------
3                | 1                | 0
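The half-open buckets used in both answers (lower bound inclusive, upper bound exclusive, and no upper bound on the 30-day bucket) can be modelled in Python with the sample timestamps. The cutoff of 2022-03-05 and the 30/60/90-day windows come from the question; `cumulative_counts` and `bucket_counts` are hypothetical helpers. The sketch also shows that the disjoint counts are simply the differences between consecutive cumulative counts.

```python
from datetime import datetime, timedelta

CUTOFF = datetime(2022, 3, 5)  # midnight of DATE '2022-03-05'
WINDOWS = [30, 60, 90]         # window sizes in days, as in the question

purchases = [
    datetime(2022, 3, 2, 1, 20, 0),
    datetime(2022, 3, 1, 1, 30, 0),
    datetime(2022, 2, 28, 1, 24, 3),
    datetime(2022, 2, 2, 21, 22, 26),
]


def cumulative_counts(ts, windows):
    """Overlapping counts, like the original FILTER clauses (lower bound only)."""
    return [sum(t >= CUTOFF - timedelta(days=w) for t in ts) for w in windows]


def bucket_counts(ts, windows):
    """Disjoint counts: each bucket is [cutoff - w, cutoff - previous w)."""
    counts = []
    upper = None  # the first (30-day) bucket has no upper bound
    for w in windows:
        lower = CUTOFF - timedelta(days=w)
        counts.append(sum(t >= lower and (upper is None or t < upper) for t in ts))
        upper = lower
    return counts


cumulative = cumulative_counts(purchases, WINDOWS)
print(cumulative)                           # [3, 4, 4]
print(bucket_counts(purchases, WINDOWS))    # [3, 1, 0]
# Differencing the cumulative counts gives the same disjoint result.
print([c - p for c, p in zip(cumulative, [0] + cumulative[:-1])])  # [3, 1, 0]
```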

Having trouble with SQL Interval

I need to get the sum of data from a particular date going back one year, but I am having difficulty with the interval.
Select 0 - sum(l.icdetailquantity )
from inventoryline l
where l.icmasterid = 'WAD185967E' and l.icdetailtranstype = 5
and l.icdetaildate <= '01-Jan-2019' and l.icdetaildate > Now() - interval '1 year'
This works as expected, but I need to pass a date instead of using Now():
Select 0 - sum(l.icdetailquantity )
from inventoryline l
where l.icmasterid = 'WAD185967E' and l.icdetailtranstype = 5
and l.icdetaildate <= '01-Jan-2019' and l.icdetaildate > '01-Jan-2019' - interval '1 year'
The second SQL gives an error:
ERROR: invalid input syntax for type interval: "01-Jan-2019"
LINE 4: ...detaildate <= '01-Jan-2019' and l.icdetaildate > '01-Jan-20...
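You're trying to subtract a year from a string. This does not work: because the literal '01-Jan-2019' has no type, PostgreSQL tries to read it as an interval, hence the error. You have to cast your string to a date first, using: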
to_date(date_string,format)
Example that will work for you:
to_date('01-Jan-2019', 'DD-Mon-YYYY') - interval '1 year'
http://www.postgresqltutorial.com/postgresql-to_date/
Have you tried this?
Select 0 - sum(l.icdetailquantity )
from inventoryline l
where l.icmasterid = 'WAD185967E'
and l.icdetailtranstype = 5
and l.icdetaildate <= date '2019-01-01'
and l.icdetaildate > date '2019-01-01' - interval '1 year'
According to the documentation (https://www.postgresql.org/docs/current/functions-datetime.html), the "date" keyword seems to be necessary: it turns the string into a typed date literal, so the subtraction is done as date minus interval.
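Both answers boil down to "parse the string as a date, then subtract one year". A minimal Python sketch of that window, assuming the 'DD-Mon-YYYY' format from the question and an English month abbreviation (the default C locale); `one_year_window` is a hypothetical helper. For February 29 it falls back to February 28 of the previous year, which is also what PostgreSQL returns for date '2020-02-29' - interval '1 year'.

```python
from datetime import date, datetime


def one_year_window(date_string: str) -> tuple[date, date]:
    """Return (exclusive lower bound, inclusive upper bound), mirroring
    icdetaildate <= d AND icdetaildate > d - interval '1 year'."""
    # Parse first, like to_date('01-Jan-2019', 'DD-Mon-YYYY').
    d = datetime.strptime(date_string, "%d-%b-%Y").date()
    try:
        lower = d.replace(year=d.year - 1)
    except ValueError:
        # Feb 29 has no counterpart in the previous year; clamp to Feb 28.
        lower = d.replace(year=d.year - 1, day=28)
    return lower, d


lower, upper = one_year_window("01-Jan-2019")
print(lower, upper)  # 2018-01-01 2019-01-01
```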

invalid value "(202" for "yyyy" error in generate_series() with dates in PostgreSQL

I'm trying to generate dynamic dates based on the current date. I want to use generate_series() to populate dates between start and end dates (interval = 1 day).
If the current date is before 10/1, the start date is 10/1 of the previous year.
If the current date is on or after 10/1, the start date is 10/1 of the current year.
The end date is 9/30 three years after the start year. For example:
current date = 5/22/2019 -> start date = 10/1/2018, end date = 9/30/2021
current date = 11/1/2019 -> start date = 10/1/2019, end date = 9/30/2022
select generate_series(
to_date(cast(start_date as text), 'yyyy-mm-dd'),
to_date(concat(extract(year from to_date(cast(start_date as text), 'yyyy-mm-dd')+3),'-','09','-', 30), 'yyyy-mm-dd'),
'1 day'
)
from (
select case
when extract(month from current_date) <= 10 then concat(extract(year from current_date) -1,'-',10,'-', '01')
when extract(month from current_date) > 10 then concat(extract(year from current_date),'-',10,'-', '01')
end) as start_date
ERROR: invalid value "(202" for "yyyy"
DETAIL: Value must be an integer.
SQL state: 22007
It complains that the year isn't an integer. Which parts do I need to modify to run this query?
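The error comes from cast(start_date as text): start_date is the alias of the subquery, so it refers to the whole row, whose text form is wrapped in parentheses (e.g. "(2020-10-01)"), and to_date() cannot read "(202" as a year. Building the dates with make_date() avoids the string round trip. The CASE only picks the start date, because PostgreSQL 10 and later do not allow set-returning functions such as generate_series() inside CASE:
select generate_series(s.start_date
,make_date(extract(year from s.start_date) ::int + 3, 10, 1) - 1
,'1 day') ::date as dt
from (
select case
when date_trunc('month', current_date) ::date < make_date(extract(year from current_date) ::int, 10, 1)
then make_date((extract(year from current_date) - 1) ::int, 10, 1)
else make_date(extract(year from current_date) ::int, 10, 1)
end as start_date
) s;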
To test other dates, replace current_date with a fixed date, e.g. '11/1/2019'::date or '05/22/2019'::date.
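The start/end rule can be checked against the two examples from the question with a small Python model. `fiscal_window` and `daily_series` are hypothetical names; the month test `today.month < 10` is assumed to be equivalent to the answer's date_trunc('month', current_date) < make_date(year, 10, 1) comparison.

```python
from datetime import date, timedelta
from typing import Iterator


def fiscal_window(today: date) -> tuple[date, date]:
    # January to September: start on Oct 1 of the previous year.
    start_year = today.year - 1 if today.month < 10 else today.year
    start = date(start_year, 10, 1)
    # Like make_date(start_year + 3, 10, 1) - 1, i.e. Sep 30 three years on.
    end = date(start_year + 3, 10, 1) - timedelta(days=1)
    return start, end


def daily_series(start: date, end: date) -> Iterator[date]:
    """Inclusive day-by-day sequence, like generate_series(start, end, '1 day')::date."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


for today in (date(2019, 5, 22), date(2019, 11, 1)):
    start, end = fiscal_window(today)
    print(today, start, end, len(list(daily_series(start, end))))
```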

Postgres: Select a timeinterval that spans past midnight

I have the following table:
id | time
----+-------------
1 | 21:00:00+01
2 | 22:00:00+01
3 | 23:00:00+01
Column id is of type integer and time is of type time with time zone. I want to select all rows that fall within a specified interval, e.g.,
select *
from times
where time >= time '22:30' - interval '60 minutes' and time <= time '22:30' + interval '60 minutes';
However, if the interval extends past midnight, i.e. when I use 23:30 as the time argument, I get an empty result set.
Is there a way to tell Postgres to ignore that the interval spans past midnight?
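You can use this logic: compute both bounds once; if the window does not cross midnight (fromt <= tot), use an ordinary range test, otherwise accept times after fromt or before tot: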
select *
from times t cross join
(values ('22:30'::time - interval '60 minutes', '22:30'::time + interval '60 minutes')
) v(fromt, tot)
where (fromt <= tot and time >= fromt and time <= tot) or
(fromt > tot and (time >= fromt or time <= tot))
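The two-case test can be modelled in Python to see why the 23:30 window now returns a row. This sketch ignores the +01 offsets and treats the column as plain times of day (an assumption); `shift` and `in_window` are hypothetical helpers, with `shift` wrapping around midnight the way time + interval does.

```python
from datetime import date, datetime, time, timedelta


def shift(t: time, minutes: int) -> time:
    """Add minutes to a time of day, wrapping past midnight."""
    anchor = datetime.combine(date(2000, 1, 1), t)  # arbitrary day; only the time is kept
    return (anchor + timedelta(minutes=minutes)).time()


def in_window(value: time, centre: time, minutes: int) -> bool:
    """Mirror of the WHERE clause with fromt/tot bounds."""
    fromt, tot = shift(centre, -minutes), shift(centre, minutes)
    if fromt <= tot:                       # window stays inside one day
        return fromt <= value <= tot
    return value >= fromt or value <= tot  # window wraps past midnight


rows = {1: time(21, 0), 2: time(22, 0), 3: time(23, 0)}
print([i for i, t in rows.items() if in_window(t, time(22, 30), 60)])  # [2, 3]
print([i for i, t in rows.items() if in_window(t, time(23, 30), 60)])  # [3]
```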

Calculate Average Time Over 24 hour period

I'm working in Teradata and am trying to calculate the average time at which a job completes.
Data Values:
Job Name          | Start Date | End Date   | End Time
------------------+------------+------------+---------
D_BDW_CCIP_SRM_LD | 10/10/2012 | 10/11/2012 | 01:41:49
D_BDW_CCIP_SRM_LD | 10/9/2012  | 10/10/2012 | 00:19:56
D_BDW_CCIP_SRM_LD | 10/8/2012  | 10/8/2012  | 23:37:18
D_BDW_CCIP_SRM_LD | 10/5/2012  | 10/5/2012  | 23:39:47
D_BDW_CCIP_SRM_LD | 10/4/2012  | 10/4/2012  | 23:42:47
D_BDW_CCIP_SRM_LD | 10/3/2012  | 10/3/2012  | 23:41:54
The average comes back as 16:07 instead of 00:07. I need the calculation to recognize that a job which finishes on the next day has run past midnight.
In Excel I could do this by adding one day to the end time and then averaging and displaying as a time.
How do I do this in Teradata?
This is such an interesting question! Assuming your START_DATE and END_DATE are DATE values and END_TIME is a TIME value, here is a solution:
select cast(( avg( case
when start_date <> end_date
then extract(second from end_time)
+ extract(minute from end_time) * 60
+ extract(hour from end_time) * 3600
+ 86400
else extract(second from end_time)
+ extract(minute from end_time) * 60
+ extract(hour from end_time) * 3600
end) mod 86400) as decimal(10,4))
* INTERVAL '00:00:01.00' HOUR TO SECOND as avg_time
from your_table
As you suggested doing in Excel, the CASE expression "adds" one day (86,400 seconds) when the job ends on a later date than it started. The average number of seconds since midnight is then taken MOD 86400 and multiplied by a one-second interval to turn it back into a time of day.
To be fair, I received help from the Teradata Forum formatting the result, but I like this so much I'll be using it myself.
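The effect of the one-day adjustment can be reproduced in Python with the six rows from the question. The sketch assumes a job never runs into a second extra day (the same assumption the CASE expression makes); `naive_average` and `wrapped_average` are hypothetical helpers, and fractional seconds are truncated for display.

```python
from datetime import date, time

SECONDS_PER_DAY = 86_400

rows = [  # (start_date, end_date, end_time) from the question
    (date(2012, 10, 10), date(2012, 10, 11), time(1, 41, 49)),
    (date(2012, 10, 9), date(2012, 10, 10), time(0, 19, 56)),
    (date(2012, 10, 8), date(2012, 10, 8), time(23, 37, 18)),
    (date(2012, 10, 5), date(2012, 10, 5), time(23, 39, 47)),
    (date(2012, 10, 4), date(2012, 10, 4), time(23, 42, 47)),
    (date(2012, 10, 3), date(2012, 10, 3), time(23, 41, 54)),
]


def seconds_since_midnight(t: time) -> int:
    return t.hour * 3600 + t.minute * 60 + t.second


def as_hms(seconds: float) -> str:
    s = int(seconds)  # drop fractional seconds
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


def naive_average(rows) -> str:
    """Plain average of the end times: the 16:07 result from the question."""
    return as_hms(sum(seconds_since_midnight(e) for _, _, e in rows) / len(rows))


def wrapped_average(rows) -> str:
    """Add a day when the job ended on a later date, average, then MOD one day."""
    total = 0
    for start, end, end_time in rows:
        secs = seconds_since_midnight(end_time)
        if start != end:
            secs += SECONDS_PER_DAY
        total += secs
    return as_hms((total / len(rows)) % SECONDS_PER_DAY)


print(naive_average(rows))    # 16:07:15
print(wrapped_average(rows))  # 00:07:15
```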
This seems to do the trick, but I'd be interested in seeing if there is another way.
SELECT job_name,
case when avg_end_time_in_minutes > 60*24 then avg_end_time_in_minutes - 60*24
else avg_end_time_in_minutes end as avg_adjusted,
case when max_end_time_in_minutes > 60*24 then max_end_time_in_minutes - 60*24
else max_end_time_in_minutes end as max_adjusted,
CAST((CAST(avg_adjusted / 60 AS INTEGER) (FORMAT '9(2)')) AS CHAR(2))||':'||
CAST((CAST((avg_adjusted / 60 MOD 1)*60 AS INTEGER) (FORMAT '9(2)')) AS CHAR(2))
avg_adjusted_time,
CAST((CAST(max_adjusted / 60 AS INTEGER) (FORMAT '9(2)')) AS CHAR(2))||':'||
CAST((CAST((max_adjusted / 60 MOD 1)*60 AS INTEGER) (FORMAT '9(2)')) AS CHAR(2))
max_adjusted_time
FROM (
SELECT job_name,
AVG(end_time_in_minutes) avg_end_time_in_minutes,
MAX(CAST(end_time_in_minutes AS DECIMAL(8,2))) max_end_time_in_minutes
FROM (
SELECT job_name,
CAST(substr(end_time, 1, 2) AS INTEGER)*60
+ CAST(substr(end_time, 4, 2) AS INTEGER)
+ cast(end_date - start_date as integer)*60*24 AS end_time_in_minutes
FROM dabank_prod_ops_tb.bdw_tables_load_tracker_view a
WHERE a.status = 'COMPLETED'
AND a.start_date BETWEEN CURRENT_DATE - 31 AND CURRENT_DATE -1
AND a.end_time IS NOT NULL
) a
GROUP BY 1
) b
First, figure out the number of seconds that the end time is from midnight on the start date. We can then use that to calculate the average number of seconds taken, and then add that to midnight to find the average end time.
select
avg(extract(second from end_time) + 60 *
(extract(minute from end_time) + 60 *
(extract(hour from end_time) + 24 *
(end_date - start_date)))) as avg_duration_in_seconds,
cast(avg_duration_in_seconds / 60 / 60 as integer) as avg_hours,
mod(cast(avg_duration_in_seconds / 60 as integer), 60) as avg_minutes,
mod(cast(avg_duration_in_seconds as integer), 60) as avg_seconds,
cast('00:00:00' as time) +
cast(avg_hours as interval hour) +
cast(avg_minutes as interval minute) +
cast(avg_seconds as interval second) as avg_end_time
from my_table
Be aware though that if the average ends up over 24 hours, avg_end_time will be something like 00:01:15 rather than 24:01:15.
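The decomposition into hours, minutes and seconds, and the wrap-around caveat, can be seen in a Python model using the question's data. Each sample row is reduced to (end_date - start_date in days, end_time); `split_hms` and `as_time_of_day` are hypothetical helpers, and the assumption is that CAST(... AS INTEGER) truncates the same way Python's `int()` does for these positive values.

```python
from datetime import date, datetime, time, timedelta

SAMPLE = [  # (end_date - start_date in days, end_time) from the question's data
    (1, time(1, 41, 49)), (1, time(0, 19, 56)), (0, time(23, 37, 18)),
    (0, time(23, 39, 47)), (0, time(23, 42, 47)), (0, time(23, 41, 54)),
]


def avg_duration_in_seconds(rows) -> float:
    """Average seconds from midnight of the start date to the end time."""
    total = sum(
        t.second + 60 * (t.minute + 60 * (t.hour + 24 * days)) for days, t in rows
    )
    return total / len(rows)


def split_hms(seconds: float) -> tuple[int, int, int]:
    """avg_hours, avg_minutes, avg_seconds as computed in the query."""
    return int(seconds / 60 / 60), int(seconds / 60) % 60, int(seconds) % 60


def as_time_of_day(hours: int, minutes: int, seconds: int) -> time:
    """Like '00:00:00' + intervals: anything past 24 hours wraps around."""
    midnight = datetime.combine(date(2000, 1, 1), time(0, 0))
    return (midnight + timedelta(hours=hours, minutes=minutes, seconds=seconds)).time()


h, m, s = split_hms(avg_duration_in_seconds(SAMPLE))
print(h, m, s)                  # 24 7 15
print(as_time_of_day(h, m, s))  # 00:07:15
```

Here the average duration is just over 24 hours, so the time of day shows 00:07:15, which is the behaviour the caveat describes.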