SQL: why does dateA - dateB <= '3 years' give a different result than dateA <= dateB + '3 years'?

I was doing a MODE.com SQL practice question about date format.
The practice question is: Write a query that counts the number of companies acquired within 3 years, 5 years, and 10 years of being founded (in 3 separate columns). Include a column for total companies acquired as well. Group by category and limit to only rows with a founding date.
It uses two tables:
tutorial.crunchbase_companies_clean_date table, which includes information about all the companies, like company name, founded year, etc.
tutorial.crunchbase_acquisitions_clean_date table, which includes the information about all the acquired companies, like acquired company name, acquired date, etc.
My code is:
SELECT companies.category_code,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '3 years' THEN 1 ELSE NULL END) AS less_than_3_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '5 years' THEN 1 ELSE NULL END) AS between_3_to_5_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '10 years' THEN 1 ELSE NULL END) AS within_10_years,
COUNT(1) AS total
FROM tutorial.crunchbase_companies_clean_date companies
JOIN tutorial.crunchbase_acquisitions_clean_date acq
ON companies.permalink = acq.company_permalink
WHERE companies.founded_at_clean IS NOT NULL
GROUP BY 1
ORDER BY total DESC
The result is:
My result
The answer query is:
SELECT companies.category_code,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '3 years'
THEN 1 ELSE NULL END) AS acquired_3_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '5 years'
THEN 1 ELSE NULL END) AS acquired_5_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '10 years'
THEN 1 ELSE NULL END) AS acquired_10_yrs,
COUNT(1) AS total
FROM tutorial.crunchbase_companies_clean_date companies
JOIN tutorial.crunchbase_acquisitions_clean_date acquisitions
ON acquisitions.company_permalink = companies.permalink
WHERE founded_at_clean IS NOT NULL
GROUP BY 1
ORDER BY 5 DESC
The result is:
The answer result
You can see in the screenshots that the results are very similar, but some numbers are different.
The only difference I can see between my query and the answer is in the COUNT statements, but I don't really see the difference, for example, between: acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '3 years' and acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '3 years'
I tried adding INTERVAL in my SELECT statement:
SELECT companies.category_code,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= INTERVAL '3 years' THEN 1 ELSE NULL END) AS less_than_3_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= INTERVAL '5 years' THEN 1 ELSE NULL END) AS between_3_to_5_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= INTERVAL '10 years' THEN 1 ELSE NULL END) AS within_10_years,
COUNT(1) AS total
and removing INTERVAL from the answer query:
SELECT companies.category_code,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + '3 years'
THEN 1 ELSE NULL END) AS acquired_3_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + '5 years'
THEN 1 ELSE NULL END) AS acquired_5_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + '10 years'
THEN 1 ELSE NULL END) AS acquired_10_yrs,
COUNT(1) AS total
But the results are the same.
I also selected just the difference between the acquired date and the founded date, to see whether the value can be compared with an INTERVAL. The result is expressed in days, which looks promising to me.
The result
I have tried to include all the relevant information. I hope somebody can help. Thank you in advance!

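My suggestion is to add or subtract the INTERVAL to or from one date/time and then compare the result with the other date/time. Don't subtract the two date/times and then compare the difference to a string literal. Subtracting two timestamps yields an interval expressed in days, and the database has to convert '3 YEARS' to a fixed length before it can compare the two, regardless of the actual number of days between someDateTime and someDateTime +/- '3 YEARS'. PostgreSQL, for example, treats a month as 30 days when it compares intervals, so '3 years' compares as 36 * 30 = 1080 days. The actual number of days in a year is 365 or 366, depending on whether a leap day is crossed.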
Here's a simple example of comparing with a specific interval, which also requires we know whether and how many leap years were crossed.
Fiddle
The test case:
WITH dates AS (
SELECT '2021-01-01'::date AS xdate
)
SELECT xdate - (xdate - INTERVAL '1' YEAR) AS diff
, xdate - (xdate - INTERVAL '1' YEAR) = '1 YEAR' AS b1
, xdate - (xdate - INTERVAL '1' YEAR) = '365 DAYS' AS b2
, xdate - (xdate - INTERVAL '1' YEAR) = '366 DAYS' AS b3
FROM dates
;
-- AND --
WITH dates AS (
SELECT '2021-01-01'::date AS xdate
)
SELECT xdate - (xdate - INTERVAL '1' YEAR) AS diff
, xdate - (xdate - INTERVAL '1' YEAR) = INTERVAL '1' YEAR AS b1
, xdate - (xdate - INTERVAL '1' YEAR) = INTERVAL '365 DAYS' AS b2
, xdate - (xdate - INTERVAL '1' YEAR) = INTERVAL '366 DAYS' AS b3
FROM dates
;
Result:
diff      b1  b2  b3
366 days  f   f   t
Fiddle
WITH dates AS (
SELECT '2021-01-01'::date AS xdate
)
, diff AS (
SELECT xdate - (xdate - INTERVAL '1' YEAR) AS diff
FROM dates
)
SELECT diff
, CASE WHEN diff = (366*24*60*60 * INTERVAL '1' SECOND)
THEN 1
END AS compare1
, 366*24*60*60 AS seconds
, CASE WHEN diff = (366*24*60*60 * INTERVAL '1' SECOND)
THEN 1
END AS compare2
, CASE WHEN diff = '31622400 SECONDS'
THEN 1
END AS compare3
FROM diff
;
The result:
diff      compare1  seconds   compare2  compare3
366 days  1         31622400  1         1
Original response:
The fiddle for PostgreSQL
The behavior shown here (below) is similar to the posted behavior.
The problem is the value generated isn't necessarily what you think.
Here's a test case in postgresql which might be representative of your issue.
Maybe this is related to leap year, where the number of days in a year isn't constant.
So it's probably safer to compare the dates rather than assume some number of days, which is probably the assumption <= '3 years' makes.
The test SQL:
WITH test (acquired_at_cleaned, founded_at_clean, n) AS (
SELECT current_date, current_date - INTERVAL '4' YEAR, 4 UNION
SELECT current_date, current_date - INTERVAL '3' YEAR, 3 UNION
SELECT current_date, current_date - INTERVAL '2' YEAR, 2 UNION
SELECT current_date, current_date - INTERVAL '1' YEAR, 1
)
, cases AS (
SELECT test.*
, CASE WHEN acquired_at_cleaned <= founded_at_clean::timestamp + INTERVAL '3' year
THEN 1 ELSE NULL
END AS acquired_3_yrs_case1
, CASE WHEN acquired_at_cleaned - founded_at_clean::timestamp <= '3 year'
THEN 1 ELSE NULL
END AS acquired_3_yrs_case2
, acquired_at_cleaned - founded_at_clean::timestamp AS x1
, acquired_at_cleaned - (n * INTERVAL '1' YEAR) AS x2
FROM test
)
SELECT acquired_at_cleaned AS acquired
, founded_at_clean AS founded
, n
, acquired_3_yrs_case1 AS case1
, acquired_3_yrs_case2 AS case2
, x1, x2
FROM cases
ORDER BY founded_at_clean
;
The result:
acquired    founded              n  case1  case2  x1         x2
2021-12-25  2017-12-25 00:00:00  4  null   null   1461 days  2017-12-26 00:00:00
2021-12-25  2018-12-25 00:00:00  3  1      null   1096 days  2018-12-26 00:00:00
2021-12-25  2019-12-25 00:00:00  2  1      1      731 days   2019-12-26 00:00:00
2021-12-25  2020-12-25 00:00:00  1  1      1      365 days   2020-12-26 00:00:00
Interesting result.
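The same effect can be reproduced outside the database. The following is a minimal Python sketch, not the fiddle's code: the helper names are hypothetical, and the default of 360 days per year is an assumption chosen to mirror how PostgreSQL compares intervals (a month counted as 30 days). It uses the n = 3 row from the table above.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Calendar-aware addition: same month and day, `years` later."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap target year; use Feb 28.
        return d.replace(year=d.year + years, day=28)


def within_years_calendar(founded: date, acquired: date, years: int) -> bool:
    """Mirrors: acquired <= founded + INTERVAL 'N years'."""
    return acquired <= add_years(founded, years)


def within_years_fixed(founded: date, acquired: date, years: int,
                       days_per_year: int = 360) -> bool:
    """Mirrors: acquired - founded <= 'N years', with a fixed-length year.

    days_per_year=360 is an assumption (12 months of 30 days each).
    """
    return (acquired - founded).days <= years * days_per_year


founded = date(2018, 12, 25)
acquired = date(2021, 12, 25)

print((acquired - founded).days)                    # 1096
print(within_years_calendar(founded, acquired, 3))  # True
print(within_years_fixed(founded, acquired, 3))     # False
```

Exactly three calendar years span 1096 days here because 29 February 2020 falls inside the range, so any fixed-length notion of a year (360 or 365 days) rejects a row that the calendar-aware comparison accepts.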

Related

Adding custom column to calendar table containing the year number of the following year on and after 2nd Sunday of December of each year

I have created the following calendar table:
WITH dates AS (
SELECT EXPLODE(SEQUENCE(TO_DATE('1970-01-01'), TO_DATE('2100-12-31'), INTERVAL 1 DAY)) AS calendar_date
),
calendar_table AS (
SELECT
YEAR(calendar_date) * 10000 + MONTH(calendar_date) * 100 + DAY(calendar_date) AS date_integer,
calendar_date,
YEAR(calendar_date) AS year_of_date,
QUARTER(calendar_date) AS quarter_of_year,
MONTH(calendar_date) AS month_of_year,
DAY(calendar_date) AS day_of_month,
WEEKDAY(calendar_date) + 1 AS day_of_week_start_monday,
DAYOFWEEK(calendar_date) AS day_of_week_start_sunday,
CASE
WHEN DAY(calendar_date) >= 1 AND DAY(calendar_date) <= 7 THEN 1
WHEN DAY(calendar_date) >= 8 AND DAY(calendar_date) <= 14 THEN 2
WHEN DAY(calendar_date) >= 15 AND DAY(calendar_date) <= 21 THEN 3
WHEN DAY(calendar_date) >= 22 AND DAY(calendar_date) <= 28 THEN 4
ELSE 5
END AS day_of_week_ordinal,
CASE
WHEN WEEKDAY(calendar_date) < 5 THEN TRUE
ELSE FALSE
END AS is_week_day,
CASE
WHEN WEEKDAY(calendar_date) > 4 THEN TRUE
ELSE FALSE
END AS is_weekend,
CASE
WHEN calendar_date = DATE_TRUNC('month', calendar_date)::DATE THEN TRUE
ELSE FALSE
END AS is_first_day_of_month,
CASE
WHEN calendar_date = LAST_DAY(calendar_date) THEN TRUE
ELSE FALSE
END AS is_last_day_of_month,
DAYOFYEAR(calendar_date) AS day_of_year,
WEEKOFYEAR(calendar_date) AS iso_week_of_year,
EXTRACT(YEAROFWEEK FROM calendar_date) AS iso_year_of_date
FROM
dates
)
I am missing a custom calendar column that would abide by the following rule:
From the second Sunday (inclusive) in December of each year, the column should contain a concatenation of 'X' and the year number of the following year.
Example:
calendar_date  custom_column
2022-12-10     X2022
2022-12-11     X2023
2022-12-12     X2023
...            ...
2023-12-09     X2023
2023-12-10     X2024
2023-12-11     X2024
So far, I've been able to identify the second Sunday of December in each year by combining the logic behind the columns month_of_year, day_of_week_ordinal and day_of_week_start_monday in my calendar table, but I fail to grasp any implementation (I'm sure I'm missing something simple here).
I can calculate a flag for the second Sunday in December of each year by utilising the following logic:
CASE
WHEN
month_of_year = 12
AND day_of_week_ordinal = 2
AND day_of_week_start_monday = 7 THEN TRUE
ELSE FALSE
END AS second_sunday_in_month
But I fail to see how I can transfer this logic to what I want as the end result.
Edit: I have added a PostgreSQL fiddle as an interactive example.
This might help you:
SELECT
*,
CASE
WHEN calendar_date between (select calendar_date from calendar_table
where month_of_year = 12 -- December
AND day_of_week_start_monday = 7 -- Sunday
AND day_of_week_ordinal = 2 ) and (select date_trunc('year',calendar_date + interval '1 year') - interval '1 day')
THEN 'X' || date_part('year', calendar_date) +1
ELSE 'X' || date_part('year', calendar_date)
END AS is_second_sunday_of_december
FROM calendar_table;
https://dbfiddle.uk/WynGf5w_
The problem is that it only works when the table covers a single year, so more tweaking might be needed!
Update:
There you go:
CASE
WHEN
calendar_date between (select s.calendar_date from calendar_table s
where s.year_of_date = c.year_of_date -- same year as the current row
AND s.month_of_year = 12 -- December
AND s.day_of_week_start_monday = 7 -- Sunday
AND s.day_of_week_ordinal = 2 ) and (select date_trunc('year',calendar_date + interval '1 year') - interval '1 day')
THEN 'X' || date_part('year', calendar_date) + 1
else 'X' || date_part('year', calendar_date)
END AS is_second_sunday_of_december
from calendar_table c
fiddle
I hope the link works this time. If not, note that there is an underscore at the end of the URL.
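For checking the expected values independently of the SQL, the rule from the question can be sketched in Python. This is only an illustration under the stated rule; the function names are hypothetical and do not come from the question or the fiddle.

```python
from datetime import date, timedelta


def second_sunday_of_december(year: int) -> date:
    """Return the second Sunday in December of `year`."""
    dec_first = date(year, 12, 1)
    # date.weekday(): Monday=0 ... Sunday=6
    days_to_first_sunday = (6 - dec_first.weekday()) % 7
    return dec_first + timedelta(days=days_to_first_sunday + 7)


def custom_column(d: date) -> str:
    """'X' + next year from the second Sunday of December on, else 'X' + year."""
    if d >= second_sunday_of_december(d.year):
        return f"X{d.year + 1}"
    return f"X{d.year}"


for d in (date(2022, 12, 10), date(2022, 12, 11), date(2023, 12, 9), date(2023, 12, 10)):
    print(d, custom_column(d))
```

Because the cutoff is computed per year from the date itself, the same function works for every row of a multi-year calendar, which is the property the SQL needs as well.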

Percentage Difference Using CASE WHEN clause

The table I am working with is called 'transactions'. The columns are id (customer id), amount (amount spent by customer), timestamp (time of purchase).
I am trying to query:
yesterday's revenue: sum of amount.
percent difference from the revenue 8 days ago to yesterday's revenue.
MTD (month-to-date revenue).
percent difference from last month's MTD to this month's MTD.
SAMPLE DATA
id  amount  timestamp
1   50      2021-12-01
2   60      2021-12-02
3   70      2021-11-05
4   80      2022-01-26
5   90      2022-01-25
6   20      2022-01-26
7   80      2022-01-19
EXPECTED OUTPUT
yesterday_revenue  pct_change_week_ago  mtd  pct_change_month_prior
100                0.25                 270  0.50
This is my code. The percent change columns are both incorrect. Please help.
select
-- yesterday
sum(case when timestamp::date = current_date - 1 then amount else null end) yesterday_revenue,
-- yesterday v. last week
(sum(case when timestamp::date > current_date - 1 then amount else null end) - sum(case when timestamp::date = current_date - 8 then amount else null end))
/ sum(case when timestamp::date = current_date - 8 then amount else null end) pct_change_week_ago,
-- mtd
sum(case when date_trunc('month',timestamp) = date_trunc('month',CURRENT_DATE -1) then amount else null end) mtd,
-- mtd v. month prior
(sum(case when date_trunc('month',timestamp) = date_trunc('month',CURRENT_DATE -1) then amount else null end) - sum(case when date_trunc('month',timestamp) = date_trunc('month',CURRENT_DATE -1) - interval '1 month'
and date_part('day',timestamp ) <= date_part('day', CURRENT_DATE -1) then amount else null end))
/ sum(case when date_trunc('month',timestamp) = date_trunc('month',CURRENT_DATE -1) - interval '1 month'
and date_part('day',timestamp ) <= date_part('day', CURRENT_DATE -1) then amount else null end) pct_change_month_prior
from transactions
Some things to consider:
"yesterday vs last week" currently uses timestamp::date > current_date - 1 at the start. This will include transactions from today only, not yesterday (it says "greater than yesterday"). I think it should be timestamp::date = current_date - 1
I could be wrong here, but I think sum(case when date_trunc('month',timestamp) = date_trunc('month',CURRENT_DATE -1) will capture transactions on the current date as well if the current date is in the same month as yesterday. You may not want that.
As far as I can tell, 'pct_change_month_prior' should be 1.45, not 0.5. You have 110 in December and 270 in January. 270 - 110 = 160 and 160 / 110 = 1.45. Your existing query already returns that result. FWIW, you can also use new/old-1 to get the same result in a slightly simpler way.
OK, so really, it's about your maths in the SELECT part of your statement.
Amount changed is: new - old
As a multiplicative amount, it is (new - old) / old or new / old - 1
As a percentage, you need to multiply by 100... 100 * (new / old - 1) but I understand you aren't worried about this.
Further to this, let's make sure your new and old are correct.
Yesterday's sum:
sum(case when timestamp::date = CURRENT_DATE - 1 then amount else null end)
8 days ago sum:
sum(case when timestamp::date = CURRENT_DATE - 8 then amount else null end)
1 month to yesterday sum:
sum(case when timestamp::date > CURRENT_DATE - 1 - INTERVAL '1 month' AND timestamp::date <= CURRENT_DATE - 1 then amount else null end)
1 month to 1 month and 1 day ago sum:
sum(case when timestamp::date > CURRENT_DATE - 1 - INTERVAL '2 month' AND timestamp::date <= CURRENT_DATE - 1 - INTERVAL '1 month' then amount else null end)
Start of month to yesterday sum (note >= so that the first day of the month is included):
sum(case when timestamp::date >= DATE_TRUNC('month', CURRENT_DATE - 1) AND timestamp::date <= CURRENT_DATE - 1 then amount else null end)
Start of the previous month to one month before yesterday sum:
sum(case when timestamp::date >= DATE_TRUNC('month', CURRENT_DATE - 1 - INTERVAL '1 month') AND timestamp::date <= CURRENT_DATE - 1 - INTERVAL '1 month' then amount else null end)
It's important that you don't change = to >, or crop to just the date part, or else you will include more than you really want.
Effectively, cropping to the month part would almost always sum all transactions for two months.
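To check the arithmetic against the sample data, here is a small Python sketch. It assumes the current date is 2022-01-27 (which makes "yesterday" 2022-01-26, as the expected output implies); the helper names are hypothetical and the windows follow the month-to-date definitions discussed above.

```python
from datetime import date, timedelta

# (id, amount, day) rows copied from the question's sample data.
TRANSACTIONS = [
    (1, 50, date(2021, 12, 1)),
    (2, 60, date(2021, 12, 2)),
    (3, 70, date(2021, 11, 5)),
    (4, 80, date(2022, 1, 26)),
    (5, 90, date(2022, 1, 25)),
    (6, 20, date(2022, 1, 26)),
    (7, 80, date(2022, 1, 19)),
]


def revenue(rows, start, end):
    """Sum the amounts whose day falls in the closed range [start, end]."""
    return sum(amount for _, amount, day in rows if start <= day <= end)


def pct_change(new, old):
    """Relative change as a fraction; raises ZeroDivisionError when old is 0."""
    return new / old - 1


def report(rows, today):
    yesterday = today - timedelta(days=1)
    week_before = today - timedelta(days=8)
    month_start = yesterday.replace(day=1)
    prev_month_end = month_start - timedelta(days=1)
    prev_month_start = prev_month_end.replace(day=1)
    # Same day of month as yesterday, clamped to the previous month's length.
    prev_cutoff = prev_month_start.replace(
        day=min(yesterday.day, prev_month_end.day))

    yesterday_revenue = revenue(rows, yesterday, yesterday)
    mtd = revenue(rows, month_start, yesterday)
    return {
        "yesterday_revenue": yesterday_revenue,
        "pct_change_week_ago": pct_change(
            yesterday_revenue, revenue(rows, week_before, week_before)),
        "mtd": mtd,
        "pct_change_month_prior": pct_change(
            mtd, revenue(rows, prev_month_start, prev_cutoff)),
    }


result = report(TRANSACTIONS, today=date(2022, 1, 27))
print(result)
```

With this data the month-over-month change comes out at about 1.45 (270 against 110), which agrees with the first answer rather than with the 0.50 in the expected output.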

Storing Variable in BigQuery SQL UDF

I am trying to create an SQL UDF in bigQuery to calculate week in month. I got the result that I'm expecting but my function looks super messy.
create or replace function internal.week_in_month(my_date TIMESTAMP)
returns FLOAT64 as
(
case when
(case when EXTRACT(DAYOFWEEK FROM date_add(date(my_date), INTERVAL -(EXTRACT(DAY FROM my_date))+1 day)) = 1 then 7
else EXTRACT(DAYOFWEEK FROM date_add(date(my_date), INTERVAL -(EXTRACT(DAY FROM my_date))+1 day)) -1 end) > 1 then -- check first day of month to decide if it's a complete week (starts on Monday)
case when EXTRACT(DAY FROM my_date) <= 7 then -- for incomplete week
case when
(case when EXTRACT(DAYOFWEEK FROM my_date) = 1 then 7 else EXTRACT(DAYOFWEEK FROM my_date)-1 end) - EXTRACT(DAY FROM my_date) =
(case when EXTRACT(DAYOFWEEK FROM date_add(date(my_date), INTERVAL -(EXTRACT(DAY FROM my_date))+1 day)) = 1 then 7
else EXTRACT(DAYOFWEEK FROM date_add(date(my_date), INTERVAL -(EXTRACT(DAY FROM my_date))+1 day)) -1 end) -1 then 1 -- incomplete week 1
else FLOOR(( EXTRACT(DAY FROM my_date) + (case when EXTRACT(DAYOFWEEK FROM date_add(date(my_date), INTERVAL -(EXTRACT(DAY FROM my_date))+1 day)) = 1 then 7
else EXTRACT(DAYOFWEEK FROM date_add(date(my_date), INTERVAL -(EXTRACT(DAY FROM my_date))+1 day)) -1 end) -2 )/7)+1 end -- calculate week based on date
else FLOOR(( EXTRACT(DAY FROM my_date) + (case when EXTRACT(DAYOFWEEK FROM date_add(date(my_date), INTERVAL -(EXTRACT(DAY FROM my_date))+1 day)) = 1 then 7
else EXTRACT(DAYOFWEEK FROM date_add(date(my_date), INTERVAL -(EXTRACT(DAY FROM my_date))+1 day)) -1 end) -2 )/7)+1 end -- calculate week based on date
else FLOOR((EXTRACT(DAY FROM my_date)-1)/7)+1 -- for complete week
end
)
Is there a way to store a variable in BigQuery so that my code is less confusing to read? So far I've been reading the BigQuery documentation and haven't found a way to store a variable inside a function (for a SQL UDF).
Any help would be appreciated, thanks!
Agree - it looks ugly!
So, I see you are reusing below expression eight(8) times
EXTRACT(DAYOFWEEK FROM date_add(date(my_date), INTERVAL -(EXTRACT(DAY FROM my_date))+1 day))
Consider the approach below: it refactors your initial code so that the expression is evaluated only once, under the alias/field ABC, and then reused in a "variable" fashion
create or replace function internal.week_in_month(my_date TIMESTAMP)
returns FLOAT64 as
((
select case
when (case when ABC = 1 then 7 else ABC -1 end) > 1 then -- check first day of month to decide if it's a complete week (starts on Monday)
case when EXTRACT(DAY FROM my_date) <= 7 then -- for incomplete week
case when
(case when EXTRACT(DAYOFWEEK FROM my_date) = 1 then 7 else EXTRACT(DAYOFWEEK FROM my_date)-1 end) - EXTRACT(DAY FROM my_date) =
(case when ABC = 1 then 7
else ABC -1 end) -1 then 1 -- incomplete week 1
else FLOOR(( EXTRACT(DAY FROM my_date) + (case when ABC = 1 then 7
else ABC -1 end) -2 )/7)+1 end -- calculate week based on date
else FLOOR(( EXTRACT(DAY FROM my_date) + (case when ABC = 1 then 7
else ABC -1 end) -2 )/7)+1 end -- calculate week based on date
else FLOOR((EXTRACT(DAY FROM my_date)-1)/7)+1 -- for complete week
end
from unnest([struct(EXTRACT(DAYOFWEEK FROM date_add(date(my_date), INTERVAL -(EXTRACT(DAY FROM my_date))+1 day)) as ABC)])
));
We can go further and apply same approach to EXTRACT(DAY FROM my_date) which is used six(6) times
create or replace function internal.week_in_month(my_date TIMESTAMP)
returns FLOAT64 as
((
select case
when (case when ABC = 1 then 7 else ABC -1 end) > 1 then -- check first day of month to decide if it's a complete week (starts on Monday)
case when XYZ <= 7 then -- for incomplete week
case when
(case when EXTRACT(DAYOFWEEK FROM my_date) = 1 then 7 else EXTRACT(DAYOFWEEK FROM my_date)-1 end) - XYZ =
(case when ABC = 1 then 7
else ABC -1 end) -1 then 1 -- incomplete week 1
else FLOOR(( XYZ + (case when ABC = 1 then 7
else ABC -1 end) -2 )/7)+1 end -- calculate week based on date
else FLOOR(( XYZ + (case when ABC = 1 then 7
else ABC -1 end) -2 )/7)+1 end -- calculate week based on date
else FLOOR((XYZ-1)/7)+1 -- for complete week
end
from unnest([struct(EXTRACT(DAY FROM my_date) as XYZ)]),
unnest([struct(EXTRACT(DAYOFWEEK FROM date_add(date(my_date), INTERVAL -(XYZ)+1 day)) as ABC)])
));
In the example above I was lazy and used just XYZ and ABC, in the hope that you will choose appropriate names that make the final code readable enough to be happy with
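As a side check on the logic itself, the branches appear to collapse into a single formula once the first day of the month is known as a Monday-based weekday. The Python sketch below is an assumption-laden port, not BigQuery code: it returns integers rather than FLOAT64, the function names are hypothetical, and it simply compares the branchy version with the one-line version over a range of dates.

```python
from datetime import date, timedelta


def bq_dayofweek(d: date) -> int:
    """Emulate BigQuery's DAYOFWEEK numbering: Sunday=1 ... Saturday=7."""
    return d.isoweekday() % 7 + 1


def monday_based(dow: int) -> int:
    """Map Sunday=1..Saturday=7 to Monday=1..Sunday=7, as the UDF does."""
    return 7 if dow == 1 else dow - 1


def week_in_month_original(d: date) -> int:
    """Branch-for-branch port of the UDF (integer result)."""
    day = d.day
    first_dow = monday_based(bq_dayofweek(d.replace(day=1)))  # "ABC" branch
    if first_dow > 1:  # month does not start on a Monday
        if day <= 7:
            if monday_based(bq_dayofweek(d)) - day == first_dow - 1:
                return 1  # incomplete week 1
            return (day + first_dow - 2) // 7 + 1
        return (day + first_dow - 2) // 7 + 1
    return (day - 1) // 7 + 1  # month starts on a Monday


def week_in_month_simplified(d: date) -> int:
    """Single formula covering every branch above."""
    first_dow = d.replace(day=1).isoweekday()  # Monday=1 ... Sunday=7
    return (d.day + first_dow - 2) // 7 + 1


start = date(2020, 1, 1)
mismatches = [
    start + timedelta(days=i)
    for i in range(1500)
    if week_in_month_original(start + timedelta(days=i))
    != week_in_month_simplified(start + timedelta(days=i))
]
print(len(mismatches))  # 0
```

If the two versions agree for your data as well, the UDF body could shrink to one FLOOR expression, which removes most of the need for "variables" in the first place.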

Select data grouped by time over midnight

I have a table like:
ID TIMEVALUE
----- -------------
1 06.07.15 06:43:01,000000000
2 06.07.15 12:17:01,000000000
3 06.07.15 18:21:01,000000000
4 06.07.15 23:56:01,000000000
5 07.07.15 04:11:01,000000000
6 07.07.15 10:47:01,000000000
7 07.07.15 12:32:01,000000000
8 07.07.15 14:47:01,000000000
and I want to group this data by special times.
My current query looks like this:
SELECT TO_CHAR(TIMEVALUE, 'YYYY\MM\DD'), COUNT(ID),
SUM(CASE WHEN TO_CHAR(TIMEVALUE, 'HH24MI') <=700 THEN 1 ELSE 0 END) as morning,
SUM(CASE WHEN TO_CHAR(TIMEVALUE, 'HH24MI') >700 AND TO_CHAR(TIMEVALUE, 'HH24MI') <1400 THEN 1 ELSE 0 END) as daytime,
SUM(CASE WHEN TO_CHAR(TIMEVALUE, 'HH24MI') >=1400 THEN 1 ELSE 0 END) as evening FROM Table
WHERE TIMEVALUE >= to_timestamp('05.07.2015','DD.MM.YYYY')
GROUP BY TO_CHAR(TIMEVALUE, 'YYYY\MM\DD')
and I am getting this output
day overall morning daytime evening
----- ---------
2015\07\05 454 0 0 454
2015\07\06 599 113 250 236
2015\07\07 404 139 265 0
so that is fine grouping on the same day (0-7 o'clock, 7-14 o'clock and 14-24 o'clock)
But my question now is:
How can I group over midnight?
For example count from 6-14 , 14-23 and 23-6 o'clock on next day.
I hope you understand my question. You are also welcome to improve my query above if there is a better solution.
EDIT: It is tested now: SQL Fiddle
The key is simply to adjust the group by so that anything before 6am gets grouped with the previous day. After that, the counts are pretty straight-forward.
SELECT TO_CHAR(CASE WHEN EXTRACT(HOUR FROM timevalue) < 6
THEN timevalue - 1
ELSE timevalue
END, 'YYYY\MM\DD') AS day,
COUNT(*) AS overall,
SUM(CASE WHEN EXTRACT(HOUR FROM timevalue) >= 6 AND EXTRACT(HOUR FROM timevalue) < 14
THEN 1 ELSE 0 END) AS morning,
SUM(CASE WHEN EXTRACT(HOUR FROM timevalue) >= 14 AND EXTRACT(HOUR FROM timevalue) < 23
THEN 1 ELSE 0 END) AS daytime,
SUM(CASE WHEN EXTRACT(HOUR FROM timevalue) < 6 OR EXTRACT(HOUR FROM timevalue) >= 23
THEN 1 ELSE 0 END) AS evening
FROM my_table
WHERE timevalue >= TO_TIMESTAMP('05.07.2015','DD.MM.YYYY')
GROUP BY TO_CHAR(CASE WHEN EXTRACT(HOUR FROM timevalue) < 6
THEN timevalue - 1
ELSE timevalue
END, 'YYYY\MM\DD');
First subtract 1 day from timevalue for times earlier than '06:00', and then:
SQLFiddle demo
select TO_CHAR(day, 'YYYY\MM\DD') day, COUNT(ID) cnt,
SUM(case when tvh >= '23' or tvh < '06' THEN 1 ELSE 0 END) as midnight,
SUM(case when '06' <= tvh and tvh < '14' THEN 1 ELSE 0 END) as daytime,
SUM(case when '14' <= tvh and tvh < '23' THEN 1 ELSE 0 END) as evening
FROM (
select id, to_char(TIMEVALUE, 'HH24') tvh,
trunc(case when (to_char(timevalue, 'hh24') < '06')
then timevalue - interval '1' day
else timevalue end) day
from t1
)
GROUP BY day
Maybe you can do it like this (with some reformatting or PIVOT):
WITH spans AS
(SELECT TIMESTAMP '2015-01-01 00:00:00' + LEVEL * INTERVAL '1' HOUR AS start_time
FROM dual
CONNECT BY TIMESTAMP '2015-01-01 00:00:00' + LEVEL * INTERVAL '1' HOUR < LOCALTIMESTAMP),
t AS
(SELECT start_time, lead(start_time, 1) OVER (ORDER BY start_time) AS end_time, ROWNUM AS N
FROM spans
WHERE EXTRACT(HOUR FROM start_time) IN (6,14,23))
SELECT N, start_time, end_time, COUNT(*) AS ID_COUNT,
DECODE(EXTRACT(HOUR FROM start_time), 6,'morning', 14,'daytime', 23,'evening') AS daytime
FROM t
JOIN YOUR_TABLE ON TIMEVALUE >= start_time AND TIMEVALUE < end_time
GROUP BY N, start_time, end_time;
Of course, the initial time value ('2015-01-01 00:00:00' in my example) has to be earlier than the earliest date in your table.
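The grouping idea shared by these answers (move anything before 06:00 to the previous day, then classify by hour) can be sketched in Python against the sample rows from the question. The function name is hypothetical, and the bucket labels follow the first answer (23:00 to 06:00 is counted as "evening").

```python
from collections import Counter
from datetime import datetime, timedelta


def shift_day_and_bucket(ts: datetime):
    """Return (shift day, bucket) for 06-14, 14-23 and 23-06 o'clock windows."""
    hour = ts.hour
    # Times before 06:00 belong to the shift that started on the previous day.
    shift_day = (ts - timedelta(days=1)).date() if hour < 6 else ts.date()
    if 6 <= hour < 14:
        bucket = "morning"
    elif 14 <= hour < 23:
        bucket = "daytime"
    else:
        bucket = "evening"
    return shift_day, bucket


# TIMEVALUE column of the sample table in the question.
rows = [
    datetime(2015, 7, 6, 6, 43, 1),
    datetime(2015, 7, 6, 12, 17, 1),
    datetime(2015, 7, 6, 18, 21, 1),
    datetime(2015, 7, 6, 23, 56, 1),
    datetime(2015, 7, 7, 4, 11, 1),
    datetime(2015, 7, 7, 10, 47, 1),
    datetime(2015, 7, 7, 12, 32, 1),
    datetime(2015, 7, 7, 14, 47, 1),
]

counts = Counter(shift_day_and_bucket(ts) for ts in rows)
for (day, bucket), n in sorted(counts.items()):
    print(day, bucket, n)
```

Row 5 (07.07.15 04:11) lands on 2015-07-06 in the "evening" bucket, which is the over-midnight behaviour the question asks for.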

Find the last weekday for a given month in PostgreSQL

Usage: If month end falls on a Saturday or a Sunday, return the previous Friday, else use month end
Examples:
3/31/2013 falls on a Sunday, so return 3/29/2013
11/30/2013 falls on a Saturday, so return 11/29/2013
How to write this in PostgreSQL SQL?
What I have so far is this (returns only Month Ends, but month ends don't exist when they fall on a Saturday or Sunday):
SELECT as_of_dt, sum(bank_shr_bal) as bank_shr_bal
FROM hm_101.vw_gl_bal
WHERE as_of_dt = (date_trunc('MONTH', as_of_dt) + INTERVAL '1 MONTH - 1 day')::date
GROUP BY 1
Thanks
A calendar table really simplifies the SQL for queries like these. (The table "weekdays" is actually a view based on the calendar table. The structure of it should be obvious.)
select max(cal_date)
from weekdays
where cal_date < '2013-05-01'
or
select max(cal_date)
from weekdays
where cal_date between '2013-04-01' and '2013-04-30'
with s as (
select *, (date_trunc('MONTH', as_of_dt) + INTERVAL '1 MONTH - 1 day')::date last_day
from
hm_101.vw_gl_bal
)
SELECT
as_of_dt,
gl_acct_nbr,
cc_nbr,
sum(bank_shr_bal) as bank_shr_bal
FROM s
WHERE as_of_dt = (
last_day
-
(extract(dow from last_day) = 6)::int -- Saturday: go back 1 day
-
2 * (extract(dow from last_day) = 0)::int -- Sunday: go back 2 days
)
GROUP BY 1,2,3
One solution is to use a CTE, find the last day by month in the data and the actual last day for each month
WITH s1
as
(
SELECT
date_part('YEAR', as_of_dt) AOD_Year
,date_part('MONTH', as_of_dt) AOD_Month
,(date_trunc('MONTH', as_of_dt) + INTERVAL '1 MONTH - 1 day')::date AOD_MonthEnd
,max(as_of_dt) AOD_LastFound
FROM hm_101.vw_gl_bal
where (date_trunc('MONTH', as_of_dt) + INTERVAL '1 MONTH - 1 day')::date = '2013-03-31'
group by 1, 2, 3
)
SELECT
s1.AOD_MonthEnd
,s1.AOD_LastFound
,sum(v.bank_shr_bal) as bank_shr_bal
FROM hm_101.vw_gl_bal v
INNER JOIN s1
on v.as_of_dt = s1.AOD_LastFound
WHERE v.as_of_dt = '2013-03-29'
GROUP BY 1, 2
What you want to do is remove between 0 and 2 days from the last day of the month (which you have).
By extracting the Day Of Week (DOW) and checking whether it's 0 (Sunday) or 6 (Saturday), we know how many days to remove.
You can do it like this:
... - INTERVAL '1 day' * CASE date_part('DOW', last_day_of_month)
WHEN 0 THEN 2 -- Sunday, remove 2 days.
WHEN 6 THEN 1 -- Saturday, remove 1 day.
ELSE 0 -- Don't remove any days.
END
For the sake of readability I didn't include the complete last_day_of_month-calculation in there.
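The rule is easy to verify against the examples in the question with a short Python sketch. The function name is hypothetical; note that Python's weekday() numbers Monday as 0 and Sunday as 6, unlike PostgreSQL's DOW, where Sunday is 0 and Saturday is 6.

```python
import calendar
from datetime import date, timedelta


def last_weekday_of_month(year: int, month: int) -> date:
    """Month end, moved back to Friday when it falls on a weekend."""
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    # weekday(): Monday=0 ... Friday=4, Saturday=5, Sunday=6
    days_back = max(0, last_day.weekday() - 4)  # Sat -> 1, Sun -> 2, else 0
    return last_day - timedelta(days=days_back)


print(last_weekday_of_month(2013, 3))   # 2013-03-29
print(last_weekday_of_month(2013, 11))  # 2013-11-29
print(last_weekday_of_month(2013, 4))   # 2013-04-30
```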
You can actually do this without CTEs or stored procedures.
select
case
when extract(dow from last_day_of_month) = 0
then last_day_of_month - 2
when extract(dow from last_day_of_month) = 6
then last_day_of_month - 1
else
last_day_of_month
end as last_weekday_of_month
from(
SELECT (date_trunc('MONTH', as_of_dt)
+ INTERVAL '1 MONTH - 1 day')::date as last_day_of_month
from hm_101.vw_gl_bal
)subquery;
select
case
when extract(dow from first_day_of_month) = 0 then first_day_of_month
when extract(dow from first_day_of_month) = 1 then first_day_of_month - 1
when extract(dow from first_day_of_month) = 2 then first_day_of_month - 2
when extract(dow from first_day_of_month) = 3 then first_day_of_month - 3
when extract(dow from first_day_of_month) = 4 then first_day_of_month - 4
when extract(dow from first_day_of_month) = 5 then first_day_of_month - 5
when extract(dow from first_day_of_month) = 6 then first_day_of_month - 6
end as first_weekday_of_month,
case
when extract(dow from last_day_of_month) = 6 then last_day_of_month
when extract(dow from last_day_of_month) = 5 then last_day_of_month - 6
when extract(dow from last_day_of_month) = 4 then last_day_of_month - 5
when extract(dow from last_day_of_month) = 3 then last_day_of_month - 4
when extract(dow from last_day_of_month) = 2 then last_day_of_month - 3
when extract(dow from last_day_of_month) = 1 then last_day_of_month - 2
when extract(dow from last_day_of_month) = 0 then last_day_of_month - 1
end as last_weekday_of_month
from(
SELECT
(date_trunc('month', current_date) -'7day'::interval)::date first_day_of_month,
(date_trunc('month', current_date) -'1day'::interval)::date as last_day_of_month
)subquery;