Count of rides in each week over the last 12 weeks - sql

Assume that we have the following tables, with columns as indicated:
Rides
ride_id
start_time
end_time
passenger_id
driver_id
ride_region
is_completed (Y/N)
Drivers
driver_id
onboarding_time
home_region
Write a query that we could use to create a plot of the total count of rides completed in our San Francisco region, for each week over the last 12 weeks.
I have used datepart to get the count for every week. But I am not sure how to include the clause which outputs last 12 weeks from TODAY. My code will give a count for week 1 to 12 from the earliest start time.
Please check my code and correct me.
SELECT datepart(week, START_TIME), COUNT(RIDE_ID)
FROM RIDES
WHERE is_completed = 'Y' AND ride_region ='San Francisco' AND
datepart(week, START_TIME) <= 12
group by datepart(week, START_TIME);
I expect count output for last 12 weeks based on week.

Instead of:
AND datepart(week, START_TIME) <= 12
use this
AND START_TIME > current_date - interval '84 day'
because you want all the rows from the last 12 weeks = 84 days
and group by datepart(week, START_TIME)

If you want the last 12 weeks from current_date
SELECT datepart(week, START_TIME), COUNT(RIDE_ID)
FROM RIDES
WHERE is_completed = 'Y'
AND ride_region ='San Francisco'
AND START_TIME >= date_trunc('week', current_date) - interval '12 week'
AND START_TIME < date_trunc('week', current_date)
group by datepart(week, START_TIME);

This is a bit complicated, because you probably don't want partial weeks. So, subtracting 12 weeks (or 84 days) may not be sufficient.
I would recommend logic more like this:
where start_time >= date_trunc('week', curdate()) - interval '12 week' and
start_time < date_trunc('week', curdate())
This gives the last 12 full weeks of data, based on calendar weeks.
You have already decided to use the canonical definition of week, so this makes sense. Alternatively, you could have the definition of week starting on any day of the week.
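To check the boundaries outside the database, here is a minimal Python sketch of the same window, assuming Monday-based weeks as date_trunc('week', ...) uses. The function name last_full_weeks is only illustrative and does not come from the question.

```python
from datetime import date, timedelta

def last_full_weeks(today, weeks=12):
    """Return the half-open range [start, end) of the last `weeks` full weeks."""
    # Monday of the current (partial) week; date.weekday() is 0 for Monday.
    this_monday = today - timedelta(days=today.weekday())
    # Step back whole weeks so the range never contains a partial week.
    start = this_monday - timedelta(weeks=weeks)
    return start, this_monday

# Thursday 2024-03-14: the window ends at Monday 2024-03-11 (exclusive).
start, end = last_full_weeks(date(2024, 3, 14))
print(start, end)  # 2023-12-18 2024-03-11
```

Both ends fall on a Monday, so every group in the result covers seven days.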

Select datediff(week, start_time, getdate()) AS weeks_ago,
Count(ride_id) AS total_rides
From rides
Where ride_region = 'San Francisco'
And is_completed = 'Y'
And datediff(week, start_time, getdate()) BETWEEN 1 and 12
GROUP BY datediff(week, start_time, getdate())

Related

How can I aggregate time series data in postgres from a specific timestamp & fixed intervals (e.g. 1 hour , 1 day, 7 day ) without using date_trunc()?

I have a postgres table "Generation" with half-hourly timestamps spanning 2009 - present with energy data:
I need to aggregate (average) the data across different intervals from specific timepoints, for example data from 2021-01-07T00:00:00.000Z for one year at 7 day intervals, or 3 months at 1 day interval or 7 days at 1h interval etc. date_trunc() partly solves this, but rounds the weeks to the nearest monday e.g.
SELECT date_trunc('week', "DATETIME") AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2021-01-07T00:00:00.000Z' AND "DATETIME" <= '2022-01-06T23:59:59.999Z'
GROUP BY week
ORDER BY week ASC
;
returns the first time series interval as 2021-01-04 with an incorrect count:
week count gas coal
"2021-01-04 00:00:00" 192 18291.34375 2321.4427083333335
"2021-01-11 00:00:00" 336 14477.407738095239 2027.547619047619
"2021-01-18 00:00:00" 336 13947.044642857143 1152.047619047619
EDIT: The following returns the correct weekly intervals by checking the start date relative to the start of its week (Monday) and adjusting the results accordingly:
WITH vars1 AS (
SELECT '2021-01-07T00:00:00.000Z'::timestamp as start_time,
'2021-01-28T00:00:00.000Z'::timestamp as end_time
),
vars2 AS (
SELECT
((select start_time from vars1)::date - (date_trunc('week', (select start_time from vars1)::timestamp))::date) as diff
)
SELECT date_trunc('week', "DATETIME" - ((select diff from vars2) || ' day')::interval)::date + ((select diff from vars2) || ' day')::interval AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= (select start_time from vars1) AND "DATETIME" < (select end_time from vars1)
GROUP BY week
ORDER BY week ASC
returns..
week count gas coal
"2021-01-07 00:00:00" 336 17242.752976190477 2293.8541666666665
"2021-01-14 00:00:00" 336 13481.497023809523 1483.0565476190477
"2021-01-21 00:00:00" 336 15278.854166666666 1592.7916666666667
And then for any daily or hourly (swap out day with hour) intervals you can use the following:
SELECT date_trunc('day', "DATETIME") AS day,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= '2022-01-07T00:00:00.000Z' AND "DATETIME" < '2022-01-10T23:59:59.999Z'
GROUP BY day
ORDER BY day ASC
;
In order to select complete weeks, you should change the WHERE clause to something like:
WHERE "DATETIME" >= date_trunc('week','2021-01-07T00:00:00.000Z'::timestamp)
AND "DATETIME" < (date_trunc('week','2022-01-06T23:59:59.999Z'::timestamp) + interval '7' day)::date
This will effectively get the records from January 4, 2021 up to and including January 9, 2022.
Note: I changed <= to < to stop the end-date being included!
EDIT:
when you want your weeks to start on January 7, you can always group by:
(date_part('day',(d-'2021-01-07'))::int-(date_part('day',(d-'2021-01-07'))::int % 7))/7
(where d is the column containing the datetime-value.)
see: dbfiddle
EDIT:
This will get the list from a given date, and a specified interval.
see: dbfiddle
WITH vars AS (
SELECT
'2021-01-07T00:00:00.000Z'::timestamp AS qstart,
'2022-01-06T23:59:59.999Z'::timestamp AS qend,
7 as qint,
INTERVAL '1 DAY' as qinterval
)
SELECT
(select date(qstart) FROM vars) + (SELECT qinterval from vars) * ((date_part('day',("DATETIME"-(select date(qstart) FROM vars)))::int-(date_part('day',("DATETIME"-(select date(qstart) FROM vars)))::int % (SELECT qint FROM vars)))::int) AS week,
count(*),
AVG("GAS") AS gas,
AVG("COAL") AS coal
FROM "Generation"
WHERE "DATETIME" >= (SELECT qstart FROM vars) AND "DATETIME" <= (SELECT qend FROM vars)
GROUP BY week
ORDER BY week
;
I added the WITH vars to do the variable stuff on top and no need to mess with the rest of the query. (Idea borrowed here)
I only tested with qint=7, qinterval='1 DAY' and qint=14, qinterval='1 DAY' (but other values should work too...)
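The idea behind the query above (count whole intervals from a chosen start, then add them back to that start) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a replacement for the SQL; bucket_start, anchor and width are hypothetical names.

```python
from datetime import datetime, timedelta

def bucket_start(ts, anchor, width):
    """Start of the fixed-width bucket, counted from `anchor`, that contains `ts`."""
    # Floor-dividing two timedeltas gives the whole number of buckets elapsed.
    n = (ts - anchor) // width
    return anchor + n * width

anchor = datetime(2021, 1, 7)
# 7-day buckets starting on a Thursday rather than on Monday.
print(bucket_start(datetime(2021, 1, 13, 23, 30), anchor, timedelta(days=7)))
# 1-hour buckets use the same function with a different width.
print(bucket_start(datetime(2021, 1, 7, 5, 30), anchor, timedelta(hours=1)))
```

Because the width is a parameter, the same function covers the 1 hour, 1 day and 7 day cases from the question.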
Using the EXTRACT function, you can calculate the difference in days, weeks, and hours between your timestamp ts and the start_date as follows:
Difference in days
extract (day from ts - start_date)
Difference in weeks
This is the difference in days divided by 7 and truncated:
trunc(extract (day from ts - start_date)/7)
Difference in hours
This is the difference in days times 24, plus the hours component of the difference:
extract (day from ts - start_date)*24 + extract (hour from ts - start_date)
The difference can be used in GROUP BY directly. E.g. for week grouping the first group is difference 0, i.e. same week, the next group with difference 1, the next week, etc.
Example
I'm using a CTE for the start date to avoid multiple copies of the parameter
with start_time as
(select DATE'2021-01-07' as start_ts),
prep as (
select
ts,
extract (day from ts - (select start_ts from start_time)) day_diff,
trunc(extract (day from ts - (select start_ts from start_time))/7) week_diff,
extract (day from ts - (select start_ts from start_time)) *24 + extract (hour from ts - (select start_ts from start_time)) hour_diff,
value
from test_table
where ts >= (select start_ts from start_time)
)
select week_diff, avg(value)
from prep
group by week_diff order by 1
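As a cross-check of the grouping logic, the following Python sketch computes the same week_diff key and averages per group. It assumes the rows arrive as (ts, value) pairs; weekly_averages is a made-up helper name.

```python
from collections import defaultdict
from datetime import datetime

def weekly_averages(rows, start):
    """Average `value` per 7-day group counted from `start`."""
    groups = defaultdict(list)
    for ts, value in rows:
        if ts < start:
            continue  # mirrors the WHERE ts >= start_ts filter
        # Whole days since start, integer-divided by 7 (equals trunc(day_diff / 7) here).
        week_diff = (ts - start).days // 7
        groups[week_diff].append(value)
    return {week: sum(vals) / len(vals) for week, vals in sorted(groups.items())}

rows = [
    (datetime(2021, 1, 6), 99.0),           # before the start, ignored
    (datetime(2021, 1, 7), 10.0),           # week 0
    (datetime(2021, 1, 13, 23, 30), 20.0),  # still week 0
    (datetime(2021, 1, 14), 30.0),          # week 1
]
print(weekly_averages(rows, datetime(2021, 1, 7)))  # {0: 15.0, 1: 30.0}
```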

MSSQL query for all records between two date range of current day

I might not be asking this right, but basically I need a query that, when run, returns all records entered from the 1st through the 15th of the current month. Once the 15th passes, it should return only the records from the 16th through the end of the current month.
I've tried to build something like this, but it's for bigquery and not sql, and I can't seem to find something similar for mssql 2016.
select sample_id
from dbo.table
WHERE date_entered BETWEEN DATE_ADD(CURRENT_DATE(), -15, 'DAY') AND CURRENT_DATE()
or
WHERE date_entered BETWEEN CAST(eomonth(GETDATE()) AS datetime) AND CURRENT_DATE()
Regardless of today's date, I need the 1st till today, until the 15th. Then the 16th till today, until the end of the month. Sorry I'm new to SQL.
UPDATE: I was able to solve this issue with the example provided by @GordonLinoff. Thank you Gordon!
SELECT rowguid, ModifiedDate
FROM [AdventureWorks2017].[Person].[Person]
WHERE Year(ModifiedDate) =Year(getdate()) and month(ModifiedDate) =month(getdate()) and
((day(getdate()) <= 15 and day(ModifiedDate) <=15)
Or
(day(getdate()) >= 16 and day(ModifiedDate) >=16))
The description of your logic is a bit hard to follow, but you seem to want something like this:
where date_entered >= datefromparts(year(getdate()), month(getdate()), 1) and -- this month
(day(getdate()) <= 15 or
day(getdate()) > 15 and day(date_entered) > 15
)
This was MySQL:
SELECT *
FROM dbo.table
WHERE date BETWEEN CASE WHEN DAY(CURRENT_DATE) <= 15
THEN DATE_FORMAT(CURRENT_DATE, '%Y-%m-01')
ELSE DATE_FORMAT(CURRENT_DATE, '%Y-%m-16')
END
AND CASE WHEN DAY(CURRENT_DATE) <= 15
THEN DATE_FORMAT(CURRENT_DATE, '%Y-%m-15')
ELSE LAST_DAY(CURRENT_DATE)
END
Big Query:
SELECT *
FROM table
WHERE date BETWEEN CASE WHEN EXTRACT(DAY FROM CURRENT_DATE) <= 15
THEN DATE_TRUNC(CURRENT_DATE, MONTH)
ELSE DATE_ADD(DATE_TRUNC(CURRENT_DATE, MONTH), INTERVAL 15 DAY)
END
AND CASE WHEN EXTRACT(DAY FROM CURRENT_DATE) <= 15
THEN DATE_ADD(DATE_TRUNC(CURRENT_DATE, MONTH), INTERVAL 14 DAY)
ELSE LAST_DAY(CURRENT_DATE)
END
This should give
date between 1 and 15
or date between 16 and last_of_the_month
"I've tried to build something like this but its for bigquery"
Whichever example you used, it does not work in BigQuery either!
Below is a working example for BigQuery Standard SQL that uses some "tricks" to avoid redundant code fragments:
#standardSQL
SELECT sample_id
FROM `project.dataset.table`,
UNNEST([STRUCT(
EXTRACT(DAY FROM date_entered) AS day,
DATE_TRUNC(date_entered, MONTH) AS month
)])
WHERE DATE_TRUNC(CURRENT_DATE(), MONTH) = month
AND IF(
EXTRACT(DAY FROM CURRENT_DATE()) < 16,
day BETWEEN 1 AND 15,
day BETWEEN 16 AND 99
)
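The half-month rule discussed in this thread (1st to 15th while today is on or before the 15th, otherwise 16th to month end) is easy to verify in isolation. Below is a rough Python sketch of the boundaries only; half_month_window is a hypothetical name and the returned range is inclusive on both ends.

```python
import calendar
from datetime import date

def half_month_window(today):
    """Inclusive (first_day, last_day) of the half of the month containing `today`."""
    if today.day <= 15:
        return today.replace(day=1), today.replace(day=15)
    # monthrange returns (weekday of the 1st, number of days in the month).
    last_day = calendar.monthrange(today.year, today.month)[1]
    return today.replace(day=16), today.replace(day=last_day)

print(half_month_window(date(2024, 2, 10)))  # first half of February 2024
print(half_month_window(date(2024, 2, 20)))  # second half, ends on the 29th
```

Taking the month length from the calendar avoids hard-coding 28, 30 or 31 days.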

How to Add Filter to Certain Years Or Between Certain Dates

I am writing a query that counts trips longer than 20 minutes, but only for the years 2015, 2016, and 2017. The code works fine, showing the years and the number of trips per year, but the results show all years and not just these three. The issue is that the column start_time is a timestamp. I can do timestamp_add and then between (as shown below, disregard the number of days as they are just placeholders) but it just seems sloppy.
SELECT extract(year from start_time), count(*)
FROM `datapb.shopping.trips`
where duration_minutes > 20 and
start_time between timestamp_add(current_timestamp(), interval -1000 DAY) AND timestamp_add(current_timestamp(), interval -350 DAY)
group by EXTRACT(YEAR from start_time)
Any suggestions would be fantastic, thanks!
Why not just use timestamp constants?
select extract(year from start_time) as yyyy, count(*)
from `datapb.shopping.trips`
where duration_minutes > 20 and
start_time >= timestamp('2015-01-01') and
start_time < timestamp('2018-01-01')
group by yyyy
order by yyyy;
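The reason for the >= / < pair is that a half-open range keeps every moment of 2017 and nothing from 2018, regardless of timestamp precision. Here is a small Python sketch of the same filter, assuming trips are (start_time, duration_minutes) pairs; long_trips_per_year is an illustrative name only.

```python
from collections import Counter
from datetime import datetime

def long_trips_per_year(trips, first_year=2015, last_year=2017, min_minutes=20):
    """Count trips longer than `min_minutes` per year within [first_year, last_year]."""
    start = datetime(first_year, 1, 1)
    end = datetime(last_year + 1, 1, 1)  # exclusive upper bound
    counts = Counter(
        t.year for t, minutes in trips if minutes > min_minutes and start <= t < end
    )
    return dict(sorted(counts.items()))

trips = [
    (datetime(2014, 12, 31, 23, 0), 45),              # too early
    (datetime(2015, 1, 1), 25),                       # first instant of the range
    (datetime(2017, 12, 31, 23, 59, 59, 999999), 30), # last instant of the range
    (datetime(2018, 1, 1), 60),                       # excluded by the exclusive bound
    (datetime(2016, 6, 1), 10),                       # too short
]
print(long_trips_per_year(trips))  # {2015: 1, 2017: 1}
```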

Get last week's data from a table with a creation date

I am using the following query to retrieve last week's data, but I am getting this error:
Postgres ERROR: syntax error at or near "CAST" Position: 127
I don't know where the error is:
SELECT count(*), extract(day from createdon) AS period
FROM orders
WHERE servicename =:serviceName AND createdon BETWEEN
CAST(NOW() AS CAST(DATE-EXTRACT(DOW FROM NOW()) AS INTEGER-7)) AND
CAST(NOW() AS CAST(DATE-EXTRACT(DOW from NOW()) AS INTEGER))
GROUP BY extract(day from createdon)
ORDER BY extract(day from createdon);
You are overcomplicating things. To get last week's data, just get everything after the "start of this week" minus 7 days:
The "start of this week" can be evaluated using date_trunc('week', current_date).
If you subtract 7 days you get the start of the previous week: date_trunc('week', current_date) - interval '7' day. If you subtract 1 day, you get the end of the previous week.
date_trunc always uses Monday as the start of the week, so if your week starts on Sunday, just subtract one more, e.g. date_trunc('week', current_date)::date - 8 will be the Sunday of the previous week
Putting that all together you get:
SELECT count(*), extract(day from createdon) AS period
FROM orders
WHERE servicename =:serviceName
AND createdon
between date_trunc('week', current_date)::date - 7
and date_trunc('week', current_date)::date - 1
GROUP BY extract(day from createdon)
ORDER BY extract(day from createdon);
If your columns are timestamp columns you can simply cast createdon to a date to get rid of the time part:
AND createdon::date
between date_trunc('week', current_date)::date - 7
and date_trunc('week', current_date)::date - 1
Note that a regular index on createdon will not be used for that condition, you would need to create an index on createdon::date if you need the performance.
If you can't (or don't want to) create such an index, you need to use something different than between
AND createdon >= date_trunc('week', current_date)::date - 7
AND createdon < date_trunc('week', current_date)::date
(Note the use of < instead of <=, which is what between uses)
Another option is to convert the date information to a combination of week and year:
AND to_char(createdon, 'iyyy-iw') = to_char(date_trunc('week', current_date)::date - 7, 'iyyy-iw')
Note, that I used the ISO week definition for the above. If you are using a different week numbering system, you need a different format mask for the to_char() function.
If you work with the North American week system (whose weeks start on Sunday), your original approach was good enough, just use the correct syntax of CAST(<expr> AS <type>):
SELECT COUNT(*),
EXTRACT(DAY FROM createdon) period
FROM orders
WHERE servicename = 'Cell Tower Monitoring'
AND createdon BETWEEN CURRENT_DATE - CAST(EXTRACT(DOW FROM CURRENT_DATE) AS INTEGER) - 7
AND CURRENT_DATE - CAST(EXTRACT(DOW FROM CURRENT_DATE) AS INTEGER) - 1
GROUP BY EXTRACT(DAY FROM createdon)
ORDER BY EXTRACT(DAY FROM createdon);
Note: this assumes that createdon is a DATE column. If it's a TIMESTAMP (or TIMESTAMP WITH TIME ZONE), you need a slightly different version:
SELECT COUNT(*),
EXTRACT(DAY FROM createdon) period
FROM orders
WHERE servicename = 'Cell Tower Monitoring'
AND createdon >= CURRENT_DATE - INTERVAL '1 day' * (EXTRACT(DOW FROM CURRENT_DATE) + 7)
AND createdon < CURRENT_DATE - INTERVAL '1 day' * EXTRACT(DOW FROM CURRENT_DATE)
GROUP BY EXTRACT(DAY FROM createdon)
ORDER BY EXTRACT(DAY FROM createdon);
If you want to use the ISO week system (whose weeks start on Monday), then just use ISODOW instead of DOW. Or, you could use the date_trunc('week', ...) function, like in @a_horse_with_no_name's answer.
If you want to use another week system (e.g. one which starts on Saturday), you'll need some extra logic inside CASE expressions, as subtracting 1 from DOW will not give the expected results at the start of that kind of week (e.g. on Saturday it would give the week 2 weeks before).
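For weeks that start on an arbitrary day, one way to avoid special cases is to take the day offset modulo 7. The Python sketch below illustrates that arithmetic under the assumption that week_start follows Python's date.weekday() numbering (0 = Monday ... 6 = Sunday), which differs from PostgreSQL's DOW; previous_week is a hypothetical helper.

```python
from datetime import date, timedelta

def previous_week(today, week_start=0):
    """Half-open [start, end) of the last full week before the one containing `today`."""
    # Days elapsed since the most recent week_start day; the modulo keeps it in 0..6.
    days_into_week = (today.weekday() - week_start) % 7
    this_week_start = today - timedelta(days=days_into_week)
    return this_week_start - timedelta(days=7), this_week_start

thursday = date(2024, 3, 14)
print(previous_week(thursday))                # Monday-based weeks
print(previous_week(thursday, week_start=6))  # Sunday-based weeks
print(previous_week(date(2024, 3, 16), 5))    # Saturday-based, evaluated on a Saturday
```

On the first day of a week the offset is 0, so the function still returns the week immediately before rather than skipping back two weeks.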

How many records created for each day of the week this year?

I have about 50k rows in a Postgres database that are users and when they signed up.
I am trying to understand how many users sign up for each day of the week since the start of the year, e.g.:
1238 on Monday
3487 on Tuesday
1237 on Wednesday
Example date entry: '2014-10-31 17:17:30.138579'
A plain aggregate query after extracting the weekday. You could use to_char() to get the (English by default) weekday:
SELECT to_char(created_at, 'Day'), count(*) AS ct
FROM tbl
WHERE created_at >= date_trunc('year', now())
GROUP BY 1;
If performance is important, EXTRACT() is slightly faster:
SELECT EXTRACT(ISODOW FROM created_at), count(*) AS ct
FROM tbl
WHERE created_at >= date_trunc('year', now())
GROUP BY 1;
1 .. Monday, ... , 7 .. Sunday.
You can use EXTRACT(DOW FROM timestamp) to determine the day of the week. 0 is Sunday. 6 is Saturday.
Example:
SELECT EXTRACT(DOW FROM TIMESTAMP '2015-06-22 20:38:40');
Result is 1 (Monday)
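The two numbering schemes mentioned in these answers (ISODOW: 1 = Monday ... 7 = Sunday, DOW: 0 = Sunday ... 6 = Saturday) can be compared with a short Python sketch. It assumes the sign-up timestamps are already loaded as datetime objects; signups_per_weekday is an illustrative name.

```python
from collections import Counter
from datetime import datetime

def signups_per_weekday(timestamps, since):
    """Count timestamps at or after `since` per ISO weekday (1 = Monday ... 7 = Sunday)."""
    counts = Counter(ts.isoweekday() for ts in timestamps if ts >= since)
    # Report all seven days, including days with no sign-ups.
    return {day: counts.get(day, 0) for day in range(1, 8)}

def dow(ts):
    """PostgreSQL-style DOW: 0 = Sunday ... 6 = Saturday."""
    return ts.isoweekday() % 7

stamps = [
    datetime(2014, 10, 31, 17, 17, 30, 138579),  # previous year, filtered out
    datetime(2015, 6, 22, 20, 38, 40),           # a Monday
    datetime(2015, 6, 28, 9, 0),                 # a Sunday
]
print(signups_per_weekday(stamps, datetime(2015, 1, 1)))
print(dow(datetime(2015, 6, 22, 20, 38, 40)))  # 1, matching the DOW example
```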