Build time window counters from raw data - Big Query - sql

Consider raw events data regarding purchases in 2020, as per the following table:
BUYER DATE ITEM
Joe '2020-01-15' Dr. Pepper
Joe '2020-02-15' Dr. Pepper
Joe '2020-03-15' Dr. Pepper
Joe '2020-05-15' Dr. Pepper
Joe '2020-10-15' Dr. Pepper
Joe '2020-12-15' Dr. Pepper
I would like to aggregate the data into a monthly moving count for Joe, i.e., the number of purchases in the last 3 months as of each month end, obtaining as an outcome:
BUYER Date Num_Purchases_last_3months
Joe '2020-01-31' 1
Joe '2020-02-29' 2
Joe '2020-03-31' 3
Joe '2020-04-30' 2
.
.
.
Joe '2020-11-30' 1
Joe '2020-12-31' 2
How could I obtain the desired result in an efficient query?

You can use window functions, in this case, count(*) with a range window frame specification:
select t.*,
       count(*) over (partition by buyer
                      order by extract(year from date) * 12 + extract(month from date)
                      range between 2 preceding and current row
                     ) as Num_Purchases_last_3months
from t;
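Note that the window query returns one row per purchase, so months without any purchase (such as April) do not appear in its output. As a cross-check of the expected numbers, here is a minimal Python sketch (not BigQuery code) that emits one row per calendar month of 2020. The function name `rolling_month_counts` and the hard-coded sample are assumptions made for illustration only.

```python
from collections import Counter
from datetime import date

def rolling_month_counts(purchases, year, window=3):
    """Return [(buyer, month, count)], where count covers the last `window` months."""
    # Purchases per (buyer, month index); the month index mirrors the
    # year * 12 + month expression used in the SQL ORDER BY.
    per_month = Counter((buyer, d.year * 12 + d.month) for buyer, d in purchases)
    buyers = sorted({buyer for buyer, _ in purchases})
    rows = []
    for buyer in buyers:
        for month in range(1, 13):
            idx = year * 12 + month
            # Current month plus the (window - 1) preceding months.
            total = sum(per_month[(buyer, idx - k)] for k in range(window))
            rows.append((buyer, month, total))
    return rows

# Sample data from the question: Joe bought on the 15th of these months.
purchases = [("Joe", date(2020, m, 15)) for m in (1, 2, 3, 5, 10, 12)]
result = rolling_month_counts(purchases, 2020)
for buyer, month, total in result:
    print(buyer, f"2020-{month:02d}", total)
```

In SQL, the same effect would typically need a calendar of months joined to the buyers before counting; the sketch only illustrates the arithmetic.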

How do I select a max date by person in a table

I am not too advanced with SSRS/SQL queries, and need to write a report that pulls out % allocations by person, to then compare against a wage table to allocate the wages. These allocations change quarterly, but all allocations continue to be stored in the table. If a person's allocation did not change, they do NOT get a new entry in the table. Here is a sample table called Allocations.
First Name   Last Name   Date       Area   Percent
Smith        Bob         01/01/20   A      50.00
Smith        Bob         01/01/20   B      50.00
Doe          Jane        01/01/20   A      25.00
Doe          Jane        01/01/20   B      25.00
Doe          Jane        01/01/20   C      50.00
Doe          Jane        04/01/20   A      35.00
Doe          Jane        04/01/20   C      65.00
Wayne        Bruce       01/01/20   A      100.00
Wayne        Bruce       04/01/20   B      100.00
The results that I would want to have from this sample table when querying it are:
First Name   Last Name   Date       Area   Percent
Smith        Bob         01/01/20   A      50.00
Smith        Bob         01/01/20   B      50.00
Doe          Jane        04/01/20   A      35.00
Doe          Jane        04/01/20   C      65.00
Wayne        Bruce       04/01/20   B      100.00
However, I would also like to pull this by comparing it to a date that the user inputs, so that they could run this report at any point in time and get the correct "max" dates. So, for example, if there were also 7/1/20 dates in here, but the user input date was 6/30/20, I would NOT want to pull the 7/1/20 data. In other words, I would like to pull the rows with the maximum date by name w/o going over the user's input date.
Any idea on the best way to accomplish this?
Thanks in advance for any advice you can provide.
In SQL, ROW_NUMBER can be used to order records in groups by a particular field.
SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Last_Name, First_Name ORDER BY DATE DESC) AS ROW_NUM
    FROM TABLE
) AS T
WHERE ROW_NUM = 1
Then you filter for ROW_NUM = 1.
However, I noticed that some people have several rows with the same date and you want all of them. In this case you'd want to use RANK, which allows ties, so every row that shares a person's latest date gets rank 1 and is captured.
SELECT *
FROM (
    SELECT *, RANK() OVER (PARTITION BY Last_Name, First_Name ORDER BY DATE DESC) AS ROW_NUM
    FROM TABLE
) AS T
WHERE ROW_NUM = 1
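Neither query above applies the user-supplied cutoff date. In SQL that would be a filter on the date inside the subquery, so that ranking only sees rows on or before the cutoff. The logic can be sketched in Python as follows; `latest_allocations` is a hypothetical helper, and the 07/01/20 row is made-up data (not in the question's table) added only to show the cutoff at work.

```python
from datetime import date

def latest_allocations(rows, cutoff):
    """Keep, per person, every row carrying that person's latest date <= cutoff."""
    # Step 1: drop rows dated after the cutoff (the missing WHERE filter).
    eligible = [r for r in rows if r[2] <= cutoff]
    # Step 2: find the latest remaining date per (first, last) name pair.
    latest = {}
    for first, last, d, _area, _pct in eligible:
        key = (first, last)
        if key not in latest or d > latest[key]:
            latest[key] = d
    # Step 3: like RANK() = 1, return all rows tied on that latest date.
    return [r for r in eligible if r[2] == latest[(r[0], r[1])]]

rows = [
    ("Smith", "Bob", date(2020, 1, 1), "A", 50.0),
    ("Smith", "Bob", date(2020, 1, 1), "B", 50.0),
    ("Doe", "Jane", date(2020, 1, 1), "A", 25.0),
    ("Doe", "Jane", date(2020, 1, 1), "B", 25.0),
    ("Doe", "Jane", date(2020, 1, 1), "C", 50.0),
    ("Doe", "Jane", date(2020, 4, 1), "A", 35.0),
    ("Doe", "Jane", date(2020, 4, 1), "C", 65.0),
    ("Wayne", "Bruce", date(2020, 1, 1), "A", 100.0),
    ("Wayne", "Bruce", date(2020, 4, 1), "B", 100.0),
    # Hypothetical extra row, not in the question's table, to exercise the cutoff.
    ("Wayne", "Bruce", date(2020, 7, 1), "C", 100.0),
]
result = latest_allocations(rows, date(2020, 6, 30))
for row in result:
    print(row)
```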

Find intersecting dates

Can somebody help me with the following problem? I have an MS Access table, let's say with my employees, and for each one of them I have the start and end date of their vacation:
Name begin end
John 1.3.2021. 15.3.2021.
Robert 6.3.2021. 8.3.2021.
Lisa 13.3.2021. 16.3.2021.
John 1.4.2021. 3.4.2021.
Robert 2.4.2021. 2.4.2021.
Lisa 15.5.2021. 23.5.2021.
Lisa 5.6.2021. 15.6.2021.
How can I get the number of employees who are absent from work for each date in the table (i.e. every date that falls within a begin-end interval)? For example:
1.3.2021. 1 '>>>only John
2.3.2021. 1 '>>>only John
3.3.2021. 1 '>>>only John
4.3.2021. 1 '>>>only John
5.3.2021. 1 '>>>only John
6.3.2021. 2 '>>>John and Robert
7.3.2021. 2 '>>>John and Robert
...
Thank you in advance!
You can use union to combine the begin and end dates into a single list of dates, and a correlated subquery to count the vacations that cover each one:
select dte,
       (select count(*)
        from t
        where d.dte between t.[begin] and t.[end]
       ) as cnt
from (select [begin] as dte
      from t
      union
      select [end]
      from t
     ) d;
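The query above reports only dates that occur as a begin or end value, so a day such as 2.3.2021 would be missing from its output. Covering every calendar day inside the intervals usually needs a calendar (dates) table to join against. The idea is sketched below in Python rather than Access SQL; `absences_per_day` is a hypothetical helper, only the March rows from the question are used, and it assumes one employee's own intervals do not overlap.

```python
from collections import Counter
from datetime import date, timedelta

def absences_per_day(vacations):
    """Count absent employees for every day covered by at least one vacation."""
    counts = Counter()
    for _name, begin, end in vacations:
        day = begin
        while day <= end:  # both endpoints are inclusive, like BETWEEN
            counts[day] += 1
            day += timedelta(days=1)
    return dict(sorted(counts.items()))

# March 2021 rows from the question.
vacations = [
    ("John", date(2021, 3, 1), date(2021, 3, 15)),
    ("Robert", date(2021, 3, 6), date(2021, 3, 8)),
    ("Lisa", date(2021, 3, 13), date(2021, 3, 16)),
]
per_day = absences_per_day(vacations)
for day, count in per_day.items():
    print(day.isoformat(), count)
```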

SQL Server query to 'flatten' data for reporting

Say I have a table with the following data, in the following structure. I'm trying to query the data to find the date ranges that someone (employee) worked.
NAME WORKED DATE
Bob YES 1/1/2019
Bob YES 1/2/2019
Bob YES 1/3/2019
Bob NO 1/4/2019
Bob YES 1/5/2019
Bob YES 1/6/2019
Bob NO 1/7/2019
Jane YES 1/1/2019
Jane YES 1/2/2019
Jane NO 1/3/2019
Expected Result: (The Result I need)
Bob 1/1/2019 - 1/3/2019
Bob 1/5/2019 - 1/6/2019
Jane 1/1/2019 - 1/2/2019
What's the SQL syntax (SQL Server 2008+) of the query to return this result set?
Thanks in advance.
This is a gaps-and-islands problem. You can identify the rows using row_number() and some date arithmetic.
So, assuming you have a row for every date:
select name, min(date), max(date)
from (select t.*,
             row_number() over (partition by name order by date) as seqnum
      from t
      where worked = 'YES'
     ) t
group by name,
         dateadd(day, -seqnum, date);
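Why does this work? You are looking for adjacent dates. Within a run of consecutive dates, the date and the row number both increase by one per row, so subtracting the row number from the date gives the same value for every row in the run. A gap in the dates shifts that value, which starts a new group. The group by uses this constant to form the groups you want.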
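The same date-minus-row-number trick can be traced in a few lines of Python. This is only an illustrative sketch, not SQL Server code; `worked_ranges` is a hypothetical name, and it assumes one row per person per day, as the answer does.

```python
from datetime import date, timedelta
from itertools import groupby

def worked_ranges(rows):
    """Collapse consecutive worked dates into (name, start, end) ranges."""
    # Keep only worked days, ordered by name and then date.
    worked = sorted((name, d) for name, flag, d in rows if flag.upper() == "YES")
    ranges = []
    for name, person_rows in groupby(worked, key=lambda r: r[0]):
        # Number the person's worked dates 1, 2, 3, ... like row_number().
        numbered = enumerate((d for _, d in person_rows), start=1)
        # date - row number is constant within a run of consecutive dates.
        for _, island in groupby(numbered, key=lambda p: p[1] - timedelta(days=p[0])):
            dates = [d for _, d in island]
            ranges.append((name, dates[0], dates[-1]))
    return ranges

rows = [("Bob", flag, date(2019, 1, day)) for day, flag in
        enumerate(["YES", "YES", "YES", "NO", "YES", "YES", "NO"], start=1)]
rows += [("Jane", flag, date(2019, 1, day)) for day, flag in
         enumerate(["Yes", "Yes", "No"], start=1)]
ranges = worked_ranges(rows)
for name, start, end in ranges:
    print(name, start.isoformat(), "-", end.isoformat())
```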

How to have the rolling distinct count of each day for past three days in Oracle SQL?

I searched for this a lot, but I couldn't find a solution yet. Let me explain my question with sample data and my desired output.
Sample data:
datetime customer
---------- --------
2018-10-21 09:00 Ryan
2018-10-21 10:00 Sarah
2018-10-21 20:00 Sarah
2018-10-22 09:00 Peter
2018-10-22 10:00 Andy
2018-10-23 09:00 Sarah
2018-10-23 10:00 Peter
2018-10-24 10:00 Andy
2018-10-24 20:00 Andy
My desired output is the number of distinct customers over the past three days, relative to each day:
trunc(datetime) progressive count distinct customer
--------------- -----------------------------------
2018-10-21 2
2018-10-22 4
2018-10-23 4
2018-10-24 3
Explanation: for the 21st, because we have only Ryan and Sarah, the count is 2 (there are no records before the 21st). For the 22nd, Andy and Peter are added to the distinct list, so it's 4. For the 23rd, no new customer is added, so it stays 4. For the 24th, however, we should only consider the past 3 days (as per business logic), i.e. the 24th, 23rd and 22nd, so the distinct customers are Sarah, Andy and Peter and the count is 3.
I believe it is called a progressive count, moving count or rolling count, but I couldn't implement it in Oracle 11g SQL. Obviously it's easy with PL/SQL programming (stored procedure/function), but I wonder if it can be done in a single SQL query.
What you seem to want is:
select date,
count(distinct customer) over (order by date rows between 2 preceding and current row)
from (select distinct trunc(datetime) as date, customer
from t
) t
group by date;
However, Oracle does not support window frames with count(distinct).
One rather brute force approach is a correlated subquery:
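select dte,
       (select count(distinct t2.customer)
        from t t2
        where t2.datetime >= t.dte - 2     -- from the start of the day two days earlier
          and t2.datetime < t.dte + 1      -- up to the end of the current day
       ) as running_3
from (select distinct trunc(datetime) as dte   -- DATE is a reserved word in Oracle
      from t
     ) t;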
This should have reasonable performance for a small number of dates. The subquery is evaluated once per distinct date, so the work grows at least linearly with the number of dates; an index on datetime should help keep each lookup cheap.
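To check the expected numbers against the sample data, here is a small Python sketch of a three-day rolling distinct count. It is not Oracle code; `rolling_distinct` is a hypothetical helper that treats the window as the current day plus the two preceding calendar days.

```python
from datetime import datetime, timedelta

def rolling_distinct(events, days=3):
    """Return {day: number of distinct customers in the `days` days ending on day}."""
    # Group customers by calendar day, like trunc(datetime).
    by_day = {}
    for ts, customer in events:
        by_day.setdefault(ts.date(), set()).add(customer)
    result = {}
    for day in sorted(by_day):
        seen = set()
        for k in range(days):  # the current day and the preceding days - 1
            seen |= by_day.get(day - timedelta(days=k), set())
        result[day] = len(seen)
    return result

events = [
    (datetime(2018, 10, 21, 9), "Ryan"), (datetime(2018, 10, 21, 10), "Sarah"),
    (datetime(2018, 10, 21, 20), "Sarah"), (datetime(2018, 10, 22, 9), "Peter"),
    (datetime(2018, 10, 22, 10), "Andy"), (datetime(2018, 10, 23, 9), "Sarah"),
    (datetime(2018, 10, 23, 10), "Peter"), (datetime(2018, 10, 24, 10), "Andy"),
    (datetime(2018, 10, 24, 20), "Andy"),
]
counts = rolling_distinct(events)
for day, n in counts.items():
    print(day.isoformat(), n)
```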

SQL - Spread a value across multiple weeks

I currently have data stored as follows:
PERSON DATE RATE
----------------------------
John Smith 1/4/2012 1.2
John Smith 8/6/2012 1.7
John Smith 8/13/2012 1.9
John Smith 8/20/2012 2
John Smith 9/10/2012 1.8
John Smith 10/1/2012 3
I'm trying to output a rate for each week (ending Sunday) of the year for each person. Where the rate doesn't exist for a given week, the previous week's rate is used, i.e.:
PERSON WEEK RATE
----------------------------
John Smith 1/8/2012 1.2
John Smith 1/15/2012 1.2
John Smith 1/22/2012 1.2
John Smith 1/29/2012 1.2
etc
I can build a table of date and week combinations so I can determine the week. How can I duplicate the rate, though? An outer join or something similar?
There's a lot I don't know about your data, but here's a suggestion. The first subquery selects all distinct persons, and the second supplies the weeks, so the cross join pairs every person with every week (supposing you store each week as the date of its Sunday at 00:00). The CROSS APPLY then picks the most recent rate on or before that date. This supposes that Person is unique, but I hope you have a primary key that can be used to determine who's who.
SELECT
    P.Person,
    W.DateEnd AS Week,
    X.Rate
FROM
    ( -- All distinct persons
        SELECT Person
        FROM Ratings
        GROUP BY Person
    ) P
CROSS JOIN
    ( -- All week-ending dates
        SELECT Date AS DateEnd
        FROM Weeks
    ) W
CROSS APPLY
    ( -- Last rate on or before the week end
        SELECT TOP 1 Rate
        FROM Ratings
        WHERE Person = P.Person
          AND Date < DATEADD(DAY, 1, W.DateEnd)
        ORDER BY Date DESC
    ) X
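The "last rate on or before the week end" lookup can be sketched in Python with a binary search over the sorted rate dates. This is an illustration only, for a single person; `weekly_rates` is a hypothetical helper and the list of Sundays stands in for the Weeks table.

```python
from bisect import bisect_right
from datetime import date, timedelta

def weekly_rates(rates, week_ends):
    """For each week-ending date, return the latest rate dated on or before it."""
    rates = sorted(rates)
    dates = [d for d, _ in rates]
    out = []
    for week_end in week_ends:
        i = bisect_right(dates, week_end)  # number of rates dated <= week_end
        if i:  # weeks before the first rate are skipped, as CROSS APPLY would do
            out.append((week_end, rates[i - 1][1]))
    return out

# John Smith's rates from the question.
rates = [
    (date(2012, 1, 4), 1.2), (date(2012, 8, 6), 1.7), (date(2012, 8, 13), 1.9),
    (date(2012, 8, 20), 2.0), (date(2012, 9, 10), 1.8), (date(2012, 10, 1), 3.0),
]
# Sundays of 2012, starting with 1 January 2012 (a Sunday).
week_ends = [date(2012, 1, 1) + timedelta(weeks=k) for k in range(53)]
filled = weekly_rates(rates, week_ends)
for week_end, rate in filled[:4]:
    print("John Smith", week_end.isoformat(), rate)
```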