SQL: count consecutive days per row

I am looking to get the "Total Consecutive Days" for each row.
The Shift table contains ShiftId, ClientID, ServiceId and ProviderID.
The ShiftDetails table has the dates.
There can be multiple shifts with the same ShiftId, ClientID and ServiceId.
There can even be duplicate dates.
I would like to have several "Total Consecutive Days" calculations:
Consecutive days per ClientID
Consecutive days per ClientID and ServiceID
Consecutive days per ClientID, ServiceID and ProviderID

This is a gaps-and-islands problem. You can identify the islands by subtracting a sequence of integers from the date: within a run of adjacent dates the difference stays constant. Then use count(*) as a window function over that difference:
select t.*,
       count(*) over (partition by clientid, serviceid, dateadd(day, -seqnum, date)) as consecutive_days
from (select t.*,
             -- row_number() assumes one row per date in each partition;
             -- duplicate dates would split an island, so deduplicate first
             row_number() over (partition by clientid, serviceid order by date) as seqnum
      from t
     ) t;
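If you want to try the idea without a SQL Server instance, here is a minimal sketch that runs the same pattern in SQLite through Python's sqlite3 module. The sample rows are made up, julianday() stands in for dateadd() (an assumption of the SQLite dialect), and dense_rank() with max - min + 1 is swapped in for row_number() and count(*) so that duplicate dates neither split nor inflate an island. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (clientid int, serviceid int, date text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (1, 10, "2024-01-01"),
        (1, 10, "2024-01-02"),
        (1, 10, "2024-01-02"),  # duplicate date
        (1, 10, "2024-01-03"),
        (1, 10, "2024-01-07"),  # gap: a new island starts here
        (1, 10, "2024-01-08"),
    ],
)

query = """
select date,
       max(seqnum) over (partition by clientid, serviceid, grp)
       - min(seqnum) over (partition by clientid, serviceid, grp) + 1 as consecutive_days
from (select s.*,
             -- date minus its rank is constant inside a run of adjacent dates
             julianday(date) - seqnum as grp
      from (select t.*,
                   dense_rank() over (partition by clientid, serviceid order by date) as seqnum
            from t
           ) s
     ) g
order by date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Adding providerid to every partition by list would give the per-provider variant.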

Related

Rolling 3 day average transaction amount for each day

I'm trying to get the rolling 3 day average transaction amount for each day. I first grouped my data by day from the time stamp using cast:
select
cast(transaction_time as Date) As Date
, SUM(transaction_amount) as total_transaction_amount
from transactions
Group by cast(transaction_time as date)
order by cast(transaction_time as date)
Now I want to get the rolling 3-day average:
select *,
avg(transaction_amount) OVER(ORDER BY transaction_time
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
as moving_average
from transactions;
But I don't know how to make both statements work together. Any ideas?
You've basically done all the hard work; you just need to stick the two queries together, and a CTE is great for this. Compute the moving average over the daily totals rather than the raw rows, and move the ORDER BY to the outer query (some databases, such as SQL Server, reject ORDER BY inside a CTE):
with transactions_by_day as (
    select
        cast(transaction_time as date) as Date
      , sum(transaction_amount) as total_transaction_amount
    from transactions
    group by cast(transaction_time as date)
)
select *,
       avg(total_transaction_amount) over (order by Date
                                           rows between 2 preceding and current row)
           as moving_average
from transactions_by_day
order by Date;
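To check the combined query on a small data set, here is a sketch that runs it in SQLite via Python's sqlite3 module. The sample rows are invented, and date() replaces cast(... as date) because that is how SQLite truncates a timestamp string (an assumption of this dialect). Note that ROWS BETWEEN 2 PRECEDING counts rows, not calendar days, so the result is a true 3-day average only when every day has at least one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table transactions (transaction_time text, transaction_amount real)"
)
conn.executemany(
    "insert into transactions values (?, ?)",
    [
        ("2021-01-01 09:00:00", 10.0),
        ("2021-01-01 17:30:00", 20.0),
        ("2021-01-02 12:00:00", 60.0),
        ("2021-01-03 08:15:00", 30.0),
        ("2021-01-04 10:00:00", 90.0),
    ],
)

query = """
with transactions_by_day as (
    select date(transaction_time) as day,
           sum(transaction_amount) as total_transaction_amount
    from transactions
    group by date(transaction_time)
)
select day,
       total_transaction_amount,
       avg(total_transaction_amount) over (order by day
                                           rows between 2 preceding and current row)
           as moving_average
from transactions_by_day
order by day
"""
rolling = conn.execute(query).fetchall()
for row in rolling:
    print(row)
```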

SQL - Counting users that have multiple transactions and have at least one transaction that has been made within 7 days interval of the other one

Here is the task: count users that have multiple transactions and have at least one transaction made within 7 days of another one.
Structure of dataset: Row, userId, orderId, date
Date is formatted as YYYY-MM-DDTHH:MM:SS Example: 2016-09-16T11:32:06
I have completed the first part (counting users with multiple transactions), but I do not know how to do the second part in the same query. I will be thankful for help.
Here is the console:
query = '''
SELECT COUNT(*)
FROM
(SELECT userId FROM `dataset` GROUP BY userId HAVING COUNT(orderId) > 1)
'''
project_id = 'acdefg'
df = pd.io.gbq.read_gbq(query, project_id=project_id, dialect='standard')
display(df)
To solve this, you want to compare each record with the previous one: when was the last order from the same user? This hints at partitions and window functions, in this case LAG.
One way to solve the problem is to organise records per user, order them by orderDate, and then for each record look at the record just above:
WITH intermediate_table AS (
  SELECT
    userId,
    orderDate,
    -- orderDate of the row right above, once rows are partitioned by userId and ordered by orderDate
    LAG(orderDate) OVER (PARTITION BY userId ORDER BY orderDate) AS previous_order
  FROM `dataset.table`
)
SELECT userId
FROM intermediate_table
WHERE DATE_DIFF(orderDate, previous_order, DAY) <= 7
GROUP BY userId
Once orderDate and previous_order are in the same record, it's easy to compare them and see whether they are at most 7 days apart. A user's first order has no previous_order, so the comparison yields NULL and that row is filtered out.
(GROUP BY is used to return each userId only once in the resulting table.)
This may be what you need:
-- for each order, calculate the days since that customer's previous order
WITH order_profiler AS (
  SELECT
    orderId,
    orderDate,
    custId,
    DATE_DIFF(orderDate, LAG(orderDate) OVER (PARTITION BY custId ORDER BY orderDate), DAY) AS order_latency_days
  FROM
    `dataset.table`
)
SELECT
  custId
FROM order_profiler
WHERE order_latency_days <= 7
GROUP BY custId
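Both queries return the matching user ids; the original task asks for a count. A minimal sketch of that last step, run in SQLite through Python's sqlite3 module rather than BigQuery: the orders table and its rows are hypothetical, and julianday() subtraction stands in for DATE_DIFF. It measures elapsed time in fractional days, whereas BigQuery's DATE_DIFF(..., DAY) counts date boundaries, so edge cases close to exactly 7 days may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (userId text, orderId int, date text)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        ("a", 1, "2016-09-01T10:00:00"),
        ("a", 2, "2016-09-05T11:32:06"),  # about 4 days after a's previous order
        ("b", 3, "2016-09-01T10:00:00"),
        ("b", 4, "2016-10-01T10:00:00"),  # 30 days after b's previous order
        ("c", 5, "2016-09-16T11:32:06"),  # single order, no previous one
    ],
)

query = """
with with_previous as (
    select userId,
           date,
           lag(date) over (partition by userId order by date) as previous_date
    from orders
)
select count(distinct userId)
from with_previous
-- a NULL previous_date makes the comparison NULL, so first orders drop out
where julianday(date) - julianday(previous_date) <= 7
"""
user_count = conn.execute(query).fetchone()[0]
print(user_count)
```

A user who passes the filter necessarily has at least two orders, so the separate HAVING COUNT(orderId) > 1 check is not needed here.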

Maximum number of consecutive trading holidays from a date/calendar table

I am trying to find the maximum number of consecutive trading holidays from a trading date/calendar table. The TradingDate table has a flag isTradingHoliday, which is 1 for dates that are trading holidays and 0 otherwise. How can I find which date range has the most consecutive trading holidays in that TradingDate table?
This sounds like a gaps-and-islands problem. The difference between the two row numbers is constant within each run of consecutive holidays, so grouping by it gives the first date, the last date and the number of days in each run. The rest is aggregation and filtering:
select top (1) with ties min(tradingdate) as startdate,
       max(tradingdate) as enddate,
       count(*) as numdays
from (select c.*,
             row_number() over (order by tradingdate) as seqnum,
             row_number() over (partition by isTradingHoliday order by tradingdate) as seqnum_h
      from calendar c
     ) c
where isTradingHoliday = 1
group by isTradingHoliday, (seqnum - seqnum_h)
order by count(*) desc;
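Here is a small sketch of the difference-of-row-numbers trick, run in SQLite via Python's sqlite3 module. The calendar rows are invented and assume one row per calendar date. SQLite has no TOP (1) WITH TIES, so LIMIT 1 is used instead, which means tied runs of equal length would not all be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table calendar (tradingdate text, isTradingHoliday int)")
conn.executemany(
    "insert into calendar values (?, ?)",
    [
        ("2023-12-22", 0),
        ("2023-12-23", 1),
        ("2023-12-24", 1),
        ("2023-12-25", 1),
        ("2023-12-26", 0),
        ("2023-12-27", 0),
        ("2023-12-28", 0),
        ("2023-12-29", 0),
        ("2023-12-30", 1),
        ("2023-12-31", 1),
    ],
)

query = """
select min(tradingdate) as startdate,
       max(tradingdate) as enddate,
       count(*) as numdays
from (select c.*,
             row_number() over (order by tradingdate) as seqnum,
             row_number() over (partition by isTradingHoliday order by tradingdate) as seqnum_h
      from calendar c
     ) c
where isTradingHoliday = 1
-- seqnum - seqnum_h is constant inside each run of holidays
group by seqnum - seqnum_h
order by numdays desc
limit 1
"""
longest_run = conn.execute(query).fetchone()
print(longest_run)
```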

How to take only one entry from a table based on an offset to a date column value

I have a requirement to get values from a table based on an offset condition on a date column.
For example, in the table attached below, if any dates in the effectivedate column fall within 15 days of each other, I should return only one of them.
So my expected result would be as below:
Here, for policy A1234, it returns the 6/18/16 entry and skips the 6/12/16 entry, because the two dates are within 15 days of each other and I took the latest one from the list.
If you want to group rows together that are within 15 days of each other, then you have a variant of the gaps-and-islands problem. I would recommend lag() and a cumulative sum for this version: flag each row that starts a new group (no previous effective date within 15 days), then sum the flags to number the groups:
select polno, min(effectivedate), max(expirationdate)
from (select t.*,
             sum(case when prev_ed >= dateadd(day, -15, effectivedate)
                      then 0 else 1
                 end) over (partition by polno order by effectivedate) as grp
      from (select t.*,
                   lag(effectivedate) over (partition by polno order by effectivedate) as prev_ed
            from t
           ) t
     ) t
group by polno, grp;
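The lag() plus cumulative-sum pattern can be tried out with the sketch below, which runs in SQLite through Python's sqlite3 module. Policy A1234 and its two June effective dates come from the question; every other value, including the expiration dates, is made up. julianday() subtraction replaces dateadd() (a SQLite-specific assumption), and a row gets flag 1 when it starts a new group, that is, when it has no previous effective date within 15 days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (polno text, effectivedate text, expirationdate text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("A1234", "2016-06-12", "2017-06-12"),
        ("A1234", "2016-06-18", "2017-06-18"),  # 6 days later: same group
        ("A1234", "2016-09-01", "2017-09-01"),  # 75 days later: new group
        ("B5678", "2016-01-10", "2017-01-10"),
    ],
)

query = """
select polno, min(effectivedate), max(expirationdate)
from (select t.*,
             -- running total of "new group" flags numbers the groups
             sum(case when julianday(effectivedate) - julianday(prev_ed) <= 15
                      then 0 else 1
                 end) over (partition by polno order by effectivedate) as grp
      from (select t.*,
                   lag(effectivedate) over (partition by polno order by effectivedate) as prev_ed
            from t
           ) t
     ) t
group by polno, grp
order by polno, grp
"""
groups = conn.execute(query).fetchall()
for row in groups:
    print(row)
```

Note that the groups are chained: each row is compared with the row just before it, so a long series of dates spaced 10 days apart would all land in one group.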

How can I select one row for each week in a date range that spans more than a year?

In my PostgreSQL database, I have a table with columns of dates and prices ('transdate' and 'price').
I would like to form a query which selects one row for each week over a date range which spans more than one year.
From another question/answer here, I implemented this code which works for date ranges of less than a year:
;with cte as
(
select *,
row_number() over (partition by Extract (week from transdate) order by transdate desc) as rn
from "tablename" where transdate between '06-01-1999' and '06-01-1999'::timestamp + '50 week'::interval
)
select transdate, price from cte where rn = 1 order by transdate;
However, when I extend the interval greater than 50 weeks, it still only selects a max of 12 months.
How can I re-write this code to select one date/price from every week in the range?
Your problem is that week numbers wrap around at year boundaries, so you need to look at the week number and the year at the same time. Luckily, you can PARTITION BY several things at once. Because extract(week ...) returns the ISO 8601 week number, pair it with isoyear rather than year so that days around January 1 stay in the right week:
row_number() over (
    partition by extract(isoyear from transdate),
                 extract(week from transdate)
    order by transdate desc
) as rn
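One thing to watch when combining a week number with a year: PostgreSQL's extract(week ...) follows ISO 8601, where the first days of January can belong to the last week of the previous year. The plain-Python sketch below (no database involved, hypothetical rows and helper names) mimics "latest row per partition" with two different keys to show the difference between an ISO-year key and a calendar-year key.

```python
from datetime import date

# Hypothetical (transdate, price) rows around a year boundary.
rows = [
    (date(1999, 12, 27), 10.0),  # Monday of ISO week 1999-W52
    (date(2000, 1, 1), 11.0),    # Saturday, still ISO week 1999-W52
    (date(2000, 1, 3), 12.0),    # Monday of ISO week 2000-W01
    (date(2000, 12, 30), 13.0),  # Saturday of ISO week 2000-W52
]


def iso_week_key(d):
    """ISO year plus ISO week, like extract(isoyear) with extract(week)."""
    iso = d.isocalendar()
    return (iso[0], iso[1])


def calendar_week_key(d):
    """Calendar year plus ISO week, like extract(year) with extract(week)."""
    return (d.year, d.isocalendar()[1])


def last_row_per_week(rows, key):
    """Keep the latest row in each partition (the rn = 1 rows)."""
    latest = {}
    for transdate, price in sorted(rows):
        latest[key(transdate)] = (transdate, price)
    return sorted(latest.values())


by_iso = last_row_per_week(rows, iso_week_key)
by_calendar = last_row_per_week(rows, calendar_week_key)
print(by_iso)
print(by_calendar)
```

With the calendar-year key, 2000-01-01 and 2000-12-30 share the partition (2000, 52), so the January row is lost, while 1999-12-27 ends up alone in (1999, 52). In PostgreSQL, partitioning by date_trunc('week', transdate) is another way to avoid the mismatch.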