Get last week's data from a table with a creation date - sql

I am using the following query to retrieve last week's data, but I get this error:
Postgres ERROR: syntax error at or near "CAST" Position: 127
I don't know where the error is:
SELECT count(*), extract(day from createdon) AS period
FROM orders
WHERE servicename =:serviceName AND createdon BETWEEN
CAST(NOW() AS CAST(DATE-EXTRACT(DOW FROM NOW()) AS INTEGER-7)) AND
CAST(NOW() AS CAST(DATE-EXTRACT(DOW from NOW()) AS INTEGER))
GROUP BY extract(day from createdon)
ORDER BY extract(day from createdon);

You are overcomplicating things. To get last week's data, just get everything after the "start of this week" minus 7 days:
The "start of this week" can be evaluated using date_trunc('week', current_date).
If you subtract 7 days you get the start of the previous week: date_trunc('week', current_date) - interval '7' day. If you subtract 1 day, you get the end of the previous week.
date_trunc always uses Monday as the start of the week, so if your week starts on Sunday, just subtract one more day, e.g. date_trunc('week', current_date)::date - 8 will be the Sunday of the previous week.
Putting that all together you get:
SELECT count(*), extract(day from createdon) AS period
FROM orders
WHERE servicename =:serviceName
AND createdon
between date_trunc('week', current_date)::date - 7
and date_trunc('week', current_date)::date - 1
GROUP BY extract(day from createdon)
ORDER BY extract(day from createdon);
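To sanity-check the date arithmetic outside the database, here is a minimal Python sketch of the same calculation. The function name previous_week_bounds is made up for this illustration; it relies on date.weekday() returning 0 for Monday, which matches the Monday-based behaviour of date_trunc('week', ...).

```python
from datetime import date, timedelta

def previous_week_bounds(today):
    """Return (monday, sunday) of the week before the one containing `today`."""
    # weekday() is 0 on Monday, so this is the Monday of the current week,
    # i.e. the equivalent of date_trunc('week', current_date)::date
    start_of_this_week = today - timedelta(days=today.weekday())
    return (start_of_this_week - timedelta(days=7),
            start_of_this_week - timedelta(days=1))

start, end = previous_week_bounds(date(2024, 1, 10))  # a Wednesday
print(start, end)  # 2024-01-01 2024-01-07
```

The result is the same for every day of the current week, so the query can run on any weekday.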
If your columns are timestamp columns you can simply cast createdon to a date to get rid of the time part:
AND createdon::date
between date_trunc('week', current_date)::date - 7
and date_trunc('week', current_date)::date - 1
Note that a regular index on createdon will not be used for that condition; you would need to create an index on createdon::date if you need the performance.
If you can't (or don't want to) create such an index, you need to use something different than between:
AND createdon >= date_trunc('week', current_date)::date - 7
AND createdon < date_trunc('week', current_date)::date
(Note the use of < instead of <=, which is what between uses.)
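Why the half-open range matters for timestamps can be shown with a small Python sketch. The dates and the helper at_midnight are assumptions chosen for the example: a row created on Sunday afternoon of last week falls outside an inclusive range that ends at Sunday 00:00, but inside a range that ends just before the next Monday.

```python
from datetime import date, datetime, time

def at_midnight(d):
    # A date compared with a timestamp behaves like that date at 00:00
    return datetime.combine(d, time.min)

week_start = date(2024, 1, 1)        # Monday of the previous week
week_last_day = date(2024, 1, 7)     # Sunday of the previous week
next_week_start = date(2024, 1, 8)   # Monday of the current week

created_on = datetime(2024, 1, 7, 13, 30)  # Sunday afternoon, last week

# BETWEEN-style check: both bounds inclusive, upper bound is Sunday 00:00
in_between = at_midnight(week_start) <= created_on <= at_midnight(week_last_day)
# Half-open check: >= start of last week and < start of this week
in_half_open = at_midnight(week_start) <= created_on < at_midnight(next_week_start)

print(in_between, in_half_open)  # False True
```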
Another option is to convert the date information to a combination of week and year:
AND to_char(createdon, 'iyyy-iw') = to_char(date_trunc('week', current_date)::date - 7, 'iyyy-iw')
Note, that I used the ISO week definition for the above. If you are using a different week numbering system, you need a different format mask for the to_char() function.
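The year-and-week comparison can be sketched in Python as well. This is only an illustration; iso_week_key and is_in_previous_iso_week are hypothetical names, and the key mirrors the 'iyyy-iw' format mask by using the ISO year rather than the calendar year.

```python
from datetime import date, timedelta

def iso_week_key(d):
    """Build a key like '2020-53' from the ISO year and ISO week of d."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_year:04d}-{iso_week:02d}"

def is_in_previous_iso_week(created_on, today):
    # Monday of the current week, then step back one week
    monday = today - timedelta(days=today.weekday())
    return iso_week_key(created_on) == iso_week_key(monday - timedelta(days=7))

# 2021-01-02 still belongs to ISO week 53 of 2020
print(iso_week_key(date(2021, 1, 2)))                             # 2020-53
print(is_in_previous_iso_week(date(2021, 1, 2), date(2021, 1, 6)))  # True
```

Using the ISO year in the key is what keeps the comparison correct around New Year.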

If you work with the North American week system (whose weeks start on Sunday), your original approach was good enough, just use the correct syntax of CAST(<expr> AS <type>):
SELECT COUNT(*),
EXTRACT(DAY FROM createdon) period
FROM orders
WHERE servicename = 'Cell Tower Monitoring'
AND createdon BETWEEN CURRENT_DATE - CAST(EXTRACT(DOW FROM CURRENT_DATE) AS INTEGER) - 7
AND CURRENT_DATE - CAST(EXTRACT(DOW FROM CURRENT_DATE) AS INTEGER) - 1
GROUP BY EXTRACT(DAY FROM createdon)
ORDER BY EXTRACT(DAY FROM createdon);
Note: this assumes that createdon is a DATE column. If it's a TIMESTAMP (or TIMESTAMP WITH TIME ZONE), you need a slightly different version:
SELECT COUNT(*),
EXTRACT(DAY FROM createdon) period
FROM orders
WHERE servicename = 'Cell Tower Monitoring'
AND createdon >= CURRENT_DATE - INTERVAL '1 day' * (EXTRACT(DOW FROM CURRENT_DATE) + 7)
AND createdon < CURRENT_DATE - INTERVAL '1 day' * EXTRACT(DOW FROM CURRENT_DATE)
GROUP BY EXTRACT(DAY FROM createdon)
ORDER BY EXTRACT(DAY FROM createdon);
If you want to use the ISO week system (whose weeks start on Monday), use (ISODOW - 1) in place of DOW, since ISODOW runs from 1 (Monday) to 7 (Sunday). Or, you could use the date_trunc('week', ...) function, like in @a_horse_with_no_name's answer.
If you want to use another week system (e.g. one that starts on Saturday), you'll need some extra logic inside CASE expressions, as subtracting 1 from DOW will not give the expected results at the start of that kind of week (e.g. on Saturday it would give the week 2 weeks before).
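For week systems that start on an arbitrary day, modular arithmetic is one alternative to CASE expressions. The Python sketch below illustrates the idea; start_of_week and previous_week are hypothetical helper names, and first_weekday uses Python's date.weekday() numbering (0 = Monday, 5 = Saturday, 6 = Sunday), not Postgres DOW numbering.

```python
from datetime import date, timedelta

def start_of_week(d, first_weekday):
    """First day of the week containing d, for weeks beginning on first_weekday."""
    # The modulo keeps the offset in 0..6, so the first day of a week maps to itself
    days_into_week = (d.weekday() - first_weekday) % 7
    return d - timedelta(days=days_into_week)

def previous_week(d, first_weekday):
    """Inclusive (first_day, last_day) of the week before the one containing d."""
    start = start_of_week(d, first_weekday) - timedelta(days=7)
    return start, start + timedelta(days=6)

# Saturday-based weeks, evaluated on a Saturday (the tricky case)
print(previous_week(date(2024, 1, 13), 5))
# Sunday-based weeks, evaluated on a Wednesday
print(previous_week(date(2024, 1, 10), 6))
```

In SQL the same offset could be written with a modulo on the DOW value, assuming your dialect's modulo is applied to a non-negative operand (for example by adding 7 before taking the remainder).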

Related

Can you use logical operators in BigQuery to select different dates depending on current date?

At my work we run a report a couple times a week to pull some information from BigQuery.
We run the report every Monday and Thursday.
I'd like to automate the report to run on these days and want to know if I can put in some logic so that if I run the report on a Monday, it runs the data for the previous business week (Sunday - Saturday), and if I run the report on a Thursday, it runs the report for the current business week so far (Sunday - Wednesday).
On another report where I only run the report for previous week I use:
select last_day(current_date - 14, week(monday)) as lw_week_start, last_day(current_date - 7, week(sunday)) as lw_week_end
And to get the current week dates I can use:
select last_day (current_date -7, week(monday)), (current_date -1)
So can I put both of these in my query, and use some sort of logic to say, if I run on a Monday use the first one, if I run on a Thursday, use the second one?
Thanks
You can define the period as a CTE (or if you prefer as variables) and then use that information in the query:
with period as (
select (case when extract(dayofweek from current_date) = 2
then last_day(date_add(current_date, interval -14 day), week(monday))
when extract(dayofweek from current_date) = 5
then last_day(date_add(current_date, interval -7 day), week(monday))
end) as lw_week_start,
(case when extract(dayofweek from current_date) = 2
then last_day(date_add(current_date, interval -7 day), week(sunday))
when extract(dayofweek from current_date) = 5
then date_add(current_date, interval -1 day)
end) as lw_week_end
)
select . . .
from period cross join
. . .
Notes:
This only includes Mondays and Thursdays. I imagine you want to extend this to the other days of the week.
current_date is the current date UTC. You might want to include your timezone:
select current_date('America/New_York')
/* if current_date is monday then it will return previous week report
else it will give report for present week for any other current_date */
IF (EXTRACT (DAYOFWEEK FROM CURRENT_DATE)) = 2 THEN
select last_day(current_date - 14, week(monday)) as lw_week_start, last_day(current_date - 7, week(sunday)) as lw_week_end;
ELSE
select last_day (current_date -7, week(monday)) as week_start, (current_date -1) as previous_day ;
END IF;
Scripting on BigQuery
BigQuery Date functions
Simply add the following to your where clause:
where your_date_column in unnest(
case extract(dayofweek from current_date())
when 2 then generate_date_array(last_day(current_date() - 14, week(monday)), last_day(current_date() - 7, week(sunday)))
when 5 then generate_date_array(last_day (current_date() - 7, week(monday)), current_date() - 1)
end
)
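To check which dates the Monday and Thursday branches select, here is a small Python sketch of the same logic. It is not BigQuery code: last_day_of_week is a hypothetical stand-in written to mimic last_day(date, week(<weekday>)), and report_period is a made-up name. Weekdays follow Python's date.weekday() numbering (0 = Monday, 6 = Sunday).

```python
from datetime import date, timedelta

MONDAY, SUNDAY = 0, 6  # date.weekday() numbering

def last_day_of_week(d, first_weekday):
    """Last day of the week containing d, for weeks beginning on first_weekday."""
    days_into_week = (d.weekday() - first_weekday) % 7
    return d + timedelta(days=6 - days_into_week)

def report_period(today):
    """Inclusive (start, end) of the report, or None on other weekdays."""
    if today.weekday() == 0:    # Monday: previous Sunday-to-Saturday week
        start = last_day_of_week(today - timedelta(days=14), MONDAY)
        end = last_day_of_week(today - timedelta(days=7), SUNDAY)
    elif today.weekday() == 3:  # Thursday: this week's Sunday up to yesterday
        start = last_day_of_week(today - timedelta(days=7), MONDAY)
        end = today - timedelta(days=1)
    else:
        return None
    return start, end

print(report_period(date(2024, 1, 8)))   # run on a Monday
print(report_period(date(2024, 1, 11)))  # run on a Thursday
```

For Monday 2024-01-08 this yields Sunday 2023-12-31 through Saturday 2024-01-06; for Thursday 2024-01-11 it yields Sunday 2024-01-07 through Wednesday 2024-01-10.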

Query filtering by week

How can I write a SELECT in PostgreSQL that returns only the rows whose date column falls between Sunday and Saturday of the current week?
Query fake example:
SELECT * FROM table WHERE datecolumn BETWEEN CURRENT WEEK
In another query, I have the number of the week in the year. How to make a SELECT for these dates, applying in the WHERE clause the specific week number in the specific year.
Query fake example:
SELECT * FROM table WHERE datecolumn BETWEEN WEEK15 FROM year 2020
Perhaps you can use something like this:
SELECT *
FROM table
WHERE
EXTRACT(week FROM datecolumn) = EXTRACT(week FROM NOW())
AND
EXTRACT(isoyear FROM datecolumn) = EXTRACT(isoyear FROM NOW())
The week is ISO-8601 week number. By definition, ISO weeks start on Mondays and the first week of a year contains January 4 of that year. In other words, the first Thursday of a year is in week 1 of that year.
In the ISO week-numbering system, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year.
For example, 2005-01-01 is part of the 53rd week of year 2004, and 2006-01-01 is part of the 52nd week of year 2005, while 2012-12-31 is part of the first week of 2013.
It's recommended to use the isoyear field together with week to get consistent results.
If you need custom (non-ISO) week numbering - you will have to craft your own calculation.
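The three dates mentioned above can be checked with Python's date.isocalendar(), which follows the same ISO-8601 rules. This is just an illustration of the numbering; same_iso_week is a hypothetical helper showing why the week number should be paired with the ISO year rather than the calendar year.

```python
from datetime import date

for d in (date(2005, 1, 1), date(2006, 1, 1), date(2012, 12, 31)):
    iso_year, iso_week, _ = d.isocalendar()
    print(d, "-> ISO year", iso_year, "week", iso_week)
# 2005-01-01 -> ISO year 2004 week 53
# 2006-01-01 -> ISO year 2005 week 52
# 2012-12-31 -> ISO year 2013 week 1

def same_iso_week(a, b):
    """True when a and b share both ISO year and ISO week."""
    return tuple(a.isocalendar())[:2] == tuple(b.isocalendar())[:2]

# Both dates are in calendar year 2012 and in "week 1",
# but they belong to different ISO years (2012 and 2013).
print(same_iso_week(date(2012, 1, 2), date(2012, 12, 31)))  # False
```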
I would recommend the following pair of conditions:
where
date_column >= current_date - extract(dow from current_date) * interval '1 day'
and date_column < current_date - (extract(dow from current_date) - 7) * interval '1 day'
Postgres' date_trunc('week', ...) starts weeks on Monday, so we need something a little more complicated, using extract(dow from ...), which returns 0 on Sundays.
The advantage of this approach is that it is SARGeable, since no function is applied to the column being filtered. This means that this would happily take advantage of an index on the date column.
I would use date_trunc(), but like this. For the current week:
where datecolumn >= date_trunc('week', now()) and
datecolumn < date_trunc('week', now()) + interval '1 week'
For the nth week of the year, this is trickier. I think this does what you want:
where datecolumn >= (date_trunc('week', now()) -
(extract(week from now()) - 1) * interval '1 week' +
(<n> - 1) * interval '1 week'
) and
datecolumn < (date_trunc('week', now()) -
(extract(week from now()) - 1) * interval '1 week' +
<n> * interval '1 week'
)
Both of these are structured so the computations are NOT on the columns, so they are compatible with using indexes.
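If the week number and year come from the application, the bounds can also be computed there and passed in as parameters, which keeps the comparison on the bare column. A minimal Python sketch, assuming ISO week numbering (weeks start on Monday) and Python 3.8+ for date.fromisocalendar; iso_week_range is a hypothetical name:

```python
from datetime import date, timedelta

def iso_week_range(iso_year, iso_week):
    """Half-open range [monday, next_monday) covering the given ISO week."""
    start = date.fromisocalendar(iso_year, iso_week, 1)  # 1 = Monday
    return start, start + timedelta(days=7)

start, end = iso_week_range(2020, 15)
print(start, end)  # 2020-04-06 2020-04-13
# Use as: WHERE datecolumn >= :start AND datecolumn < :end
```

For Sunday-to-Saturday weeks the start would need to be shifted by one day, depending on how your week numbers are defined.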

Count of rides in each week over the last 12 weeks

Assume that we have the following tables, with columns as indicated:
Rides
ride_id
start_time
end_time
passenger_id
driver_id
ride_region
is_completed (Y/N)
Drivers
driver_id
onboarding_time
home_region
Write a query that we could use to create a plot of the total count of rides completed in our San Francisco region, for each week over the last 12 weeks.
I have used datepart to get the count for every week. But I am not sure how to include the clause which outputs last 12 weeks from TODAY. My code will give a count for week 1 to 12 from the earliest start time.
Please check my code and correct me.
SELECT datepart(week, START_TIME), COUNT(RIDE_ID)
FROM RIDES
WHERE is completed = 'Y' AND ride_region ='San Francisco' AND
datepart(week, START_TIME) <= 12
group by `datepart(week, START_TIME)`;
I expect count output for last 12 weeks based on week.
Instead of:
AND datepart(week, START_TIME) <= 12
use this
AND START_TIME > current_date - interval '84 day'
because you want all the rows from the last 12 weeks = 84 days
and group by datepart(week, START_TIME)
If you want the last 12 weeks from current_date
SELECT datepart(week, START_TIME), COUNT(RIDE_ID)
FROM RIDES
WHERE is_completed = 'Y'
AND ride_region ='San Francisco'
AND START_TIME >= date_trunc('week', current_date) - interval '12 week'
AND START_TIME < date_trunc('week', current_date)
group by datepart(week, START_TIME);
This is a bit complicated, because you probably don't want partial weeks. So, subtracting 12 weeks (or 84 days) may not be sufficient.
I would recommend logic more like this:
where start_time >= date_trunc('week', curdate()) - interval '12 week' and
start_time < date_trunc('week', curdate())
This gives the last 12 full weeks of data, based on calendar weeks.
You have already decided to use the canonical definition of week, so this makes sense. Alternatively, you could have the definition of week starting on any day of the week.
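The "last 12 full weeks" window can be illustrated with a short Python sketch. The function names and the sample timestamps are assumptions for this example; weeks are taken to start on Monday, and the current (partial) week is excluded.

```python
from collections import Counter
from datetime import date, datetime, time, timedelta

def last_full_weeks_window(today, weeks=12):
    """Half-open [start, end) window covering the last `weeks` complete weeks."""
    this_monday = today - timedelta(days=today.weekday())
    start = datetime.combine(this_monday - timedelta(weeks=weeks), time.min)
    end = datetime.combine(this_monday, time.min)
    return start, end

def rides_per_week(start_times, today, weeks=12):
    """Count rides per week (keyed by that week's Monday) inside the window."""
    start, end = last_full_weeks_window(today, weeks)
    counts = Counter()
    for ts in start_times:
        if start <= ts < end:
            week_start = (ts - timedelta(days=ts.weekday())).date()
            counts[week_start] += 1
    return dict(sorted(counts.items()))

rides = [
    datetime(2023, 10, 15, 12, 0),  # too old: before the window
    datetime(2023, 10, 16, 0, 0),   # first instant of the window
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 7, 23, 59),
    datetime(2024, 1, 9, 9, 0),     # current partial week: excluded
]
print(rides_per_week(rides, date(2024, 1, 10)))
```

Grouping by the week's start date rather than by week number avoids mixing up weeks from different years.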
Select datediff(week, start_time, end_time) as [WEEKLYRIDE],
Count(ride_id) AS [TOTALRIDES]
From rides
Where ride_region = 'San Francisco'
And is_completed = 'Y'
And datediff(wk, start_time, end_time) BETWEEN 1 and 12
GROUP BY datediff(wk, start_time, end_time)

Select data with a rolling date criteria

The below query returns a distinct count of 'members' for a given month and brand.
select to_char(transaction_date, 'YYYY-MM') as month, brand,
count(distinct UNIQUE_MEM_ID) as distinct_count
from source.table
group by to_char(transaction_date, 'YYYY-MM'), brand;
The data is collected with a 15 day lag after the month closes (meaning September 2016 MONTHLY data won't be 100% until October 15). I am only concerned with monthly data.
The query I would like to build: Until the 15th of this month (October), last month's data (September) should reflect August's data. The current partial month (October) should default to the prior month and thus also to the above logic.
After the 15th of this month, last month's data (September) is now 100% and thus September should reflect September (and October will reflect September until November 15th, and so on).
The current partial month will always = the prior month. The complexity of the query is how to calc prior month.
This query will be run on a rolling basis, so it needs to be dynamic.
To be clear, I am trying to build a query where distinct_count for the prior month (until end of current month + 15 days) should reflect (current month - 2) value (for each respective brand). After 15 days of the close of the month, prior month = (current month - 1).
Partial current month defaults to prior month's data. The 15 day value should be variable/modifiable.
First, simplify the query to:
select to_char(transaction_date, 'YYYY-MM') as month, brand,
count(distinct members) as distinct_count
from source.table
group by to_char(transaction_date, 'YYYY-MM'), brand;
Then, you are going to have a problem. The problem is that one row (say from Aug 20th) needs to go into two groups. A simple group by won't handle this. So, let's use union all. I think the result is something like this:
select date_trunc('month', transaction_date) as month, brand,
count(distinct members) as distinct_count
from source.table
where (date_trunc('month', transaction_date) < date_trunc('month', current_date) - interval '1 month') or
(extract(day from current_date) >= 15 and date_trunc('month', transaction_date) = date_trunc('month', current_date) - interval '1 month')
group by date_trunc('month', transaction_date), brand
union all
select date_trunc('month', current_date) - interval '1 month' as month, brand,
count(distinct members) as distinct_count
from source.table
where (extract(day from current_date) < 15 and date_trunc('month', transaction_date) = date_trunc('month', current_date) - interval '1 month')
group by brand;
Since you already have a working query, I concentrate on the subselect. The condition you can use here is CASE, especially "Searched CASE"
case
when extract(day from current_date) < 15 then
extract(month from current_date - interval '2 months')
else
extract(month from current_date - interval '1 month')
end
This may be used as part of a where clause, for example.
Here is some pseudocode to get the begin date and the end date for your interval.
Begin date:
DATE_TRUNC('month', CURRENT_DATE - 15) - interval '1 month'
The DATE_TRUNC part returns the current month only once the 15-day lag has passed (and the previous month before that); from there you subtract a full month to get your starting point.
End Date:
To calculate this, grab the begin date, plus a month, minus a day.
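A minimal Python sketch of that begin/end calculation, with the lag exposed as a parameter. The function name reporting_month_bounds is hypothetical. Note that with this formula the switch to the newer month happens once more than lag_days days of the current month have elapsed, so adjust the lag by one if your cut-over should happen on the 15th itself.

```python
from datetime import date, timedelta

def reporting_month_bounds(today, lag_days=15):
    """Inclusive (begin, end) of the most recent month considered complete."""
    # Shift back by the lag, then truncate to the first of that month
    shifted = today - timedelta(days=lag_days)
    first_of_shifted_month = shifted.replace(day=1)
    # Subtract a full month: step into the previous month, go to its first day
    begin = (first_of_shifted_month - timedelta(days=1)).replace(day=1)
    # Begin date plus a month, minus a day
    end = first_of_shifted_month - timedelta(days=1)
    return begin, end

print(reporting_month_bounds(date(2016, 10, 10)))  # August 2016
print(reporting_month_bounds(date(2016, 10, 20)))  # September 2016
```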
If the source table is partitioned by transaction_date, this syntax (not wrapping transaction_date in an expression) enables partition elimination.
select to_char(transaction_date, 'YYYY-MM') as month
,count (distinct members) as distinct_count
,brand as brand
FROM source.table
where transaction_date between date_trunc('month', current_date) - case when extract (day from current_date) >= 15 then 1 else 2 end * interval '1' month
and date_trunc('month', current_date) - case when extract (day from current_date) >= 15 then 0 else 1 end * interval '1' month - interval '1' day
group by to_char(transaction_date, 'YYYY-MM')
,brand
;

How do I determine the last day of the previous month using PostgreSQL?

I need to query a PostgreSQL database to determine records that fall within today's date and the last day of the previous month. In other words, I'd like to retrieve everything that falls between December 31, 2011 and today. This query will be re-used each month, so next month, the query will be based upon the current date and January 31, 2012.
I've seen this option, but I'd prefer to avoid using a function (if possible).
Both solutions include the last day of the previous month and also include all of "today".
For a date column:
SELECT *
FROM tbl
WHERE my_date BETWEEN date_trunc('month', now())::date - 1
AND now()::date
You can subtract plain integer values from a date (but not from a timestamp) to subtract days. This is the simplest and fastest way.
For a timestamp column:
SELECT *
FROM tbl
WHERE my_timestamp >= date_trunc('month', now()) - interval '1 day'
AND my_timestamp < date_trunc('day' , now()) + interval '1 day'
I use the < operator for the second condition to get precise results (read: "before tomorrow").
I do not cast to date in the second query. Instead I add an interval '1 day', to avoid casting back and forth.
Have a look at date / time types and functions in the manual.
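As a cross-check of the two ranges, the same bounds can be computed in Python. This is only a sketch; since_end_of_previous_month is a made-up name, and the result is a half-open timestamp range like the one in the timestamp query.

```python
from datetime import date, datetime, time, timedelta

def since_end_of_previous_month(today):
    """Half-open [lower, upper) range from the last day of last month through today."""
    # First of this month minus one day = last day of the previous month
    last_day_prev_month = today.replace(day=1) - timedelta(days=1)
    lower = datetime.combine(last_day_prev_month, time.min)
    # "Before tomorrow" includes all of today without needing <=
    upper = datetime.combine(today + timedelta(days=1), time.min)
    return lower, upper

lower, upper = since_end_of_previous_month(date(2012, 1, 15))
print(lower, upper)  # 2011-12-31 00:00:00 2012-01-16 00:00:00
```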
For getting date of previous/last month:
SELECT (date_trunc('month', now())::date - 1) as last_month_date
Result: 2012-11-30
For getting number of days of previous/last month:
SELECT DATE_PART('days', date_trunc('month', now())::date - 1) last_month_days
Result: 30
Try this:
SELECT ...
WHERE date_field between (date_trunc('MONTH', now()) - INTERVAL '1 day')::date
and now()::date
...
Try
select current_date - cast(date_part('day', current_date) as int)
Taken from http://wiki.postgresql.org/wiki/Date_LastDay, and modified to return just the days in a month
CREATE OR REPLACE FUNCTION calc_days_in_month(date)
RETURNS double precision AS
$$
SELECT EXTRACT(DAY FROM (date_trunc('MONTH', $1) + INTERVAL '1 MONTH - 1 day')::date);
$$ LANGUAGE 'sql' IMMUTABLE STRICT;
select calc_days_in_month('1999-05-01')
returns 31
Reference is taken from this blog:
You can use below function:
CREATE OR REPLACE FUNCTION fn_GetLastDayOfMonth(DATE)
RETURNS DATE AS
$$
SELECT (date_trunc('MONTH', $1) + INTERVAL '1 MONTH - 1 day')::DATE;
$$ LANGUAGE 'sql'
IMMUTABLE STRICT;
Sample executions:
SELECT * FROM fn_GetLastDayOfMonth(NOW()::DATE);
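For comparison, the two helper functions above have straightforward equivalents in Python's standard library, which can be handy for verifying results. The names last_day_of_month and days_in_month are hypothetical; calendar.monthrange returns the weekday of the first day and the number of days in the month.

```python
import calendar
from datetime import date

def last_day_of_month(d):
    """Date of the last day of d's month."""
    days = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=days)

def days_in_month(d):
    """Number of days in d's month."""
    return last_day_of_month(d).day

print(days_in_month(date(1999, 5, 1)))       # 31
print(last_day_of_month(date(2012, 2, 10)))  # 2012-02-29
```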