SQL - Spread a value across multiple weeks - sql

I currently have data stored as follows:
PERSON      DATE        RATE
----------------------------
John Smith  1/4/2012    1.2
John Smith  8/6/2012    1.7
John Smith  8/13/2012   1.9
John Smith  8/20/2012   2
John Smith  9/10/2012   1.8
John Smith  10/1/2012   3
I'm trying to output a rate for each week (ending Sunday) of the year for each person. Where no rate exists for a given week, the previous week's rate is used, e.g.:
PERSON      WEEK        RATE
----------------------------
John Smith  1/8/2012    1.2
John Smith  1/15/2012   1.2
John Smith  1/22/2012   1.2
John Smith  1/29/2012   1.2
etc
I can build a table of date and week combinations so I can determine the week. How can I duplicate the rate, though? An outer join or something similar?

There's a lot I don't know about your data, but here's a suggestion. The first subquery selects all distinct persons; the second pairs each of them with every week (supposing you store each week as a date at 00:00 on the Sunday that ends it). The CROSS APPLY then picks the latest rate recorded up to the end of that week. This supposes that Person is unique, but I hope you have a primary key that can be used to determine who's who.
SELECT
    P.Person,
    W.DateEnd,
    X.Rate
FROM
    ( -- One row per person
        SELECT
            Person
        FROM
            Ratings
        GROUP BY
            Person
    ) P
CROSS JOIN
    ( -- One row per week end
        SELECT
            Date AS DateEnd
        FROM
            Weeks
    ) W
CROSS APPLY
    ( -- Last Rate before week end
        SELECT TOP 1
            Rate
        FROM
            Ratings
        WHERE
            Person = P.Person AND
            Date < DATEADD(DAY, 1, W.DateEnd)
        ORDER BY
            Date DESC
    ) X
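If you want to try the idea without a SQL Server instance, here is a minimal sketch that runs the same logic on SQLite through Python's sqlite3 module. SQLite has no CROSS APPLY or TOP, so a correlated scalar subquery with LIMIT 1 stands in for them; unlike CROSS APPLY, it keeps weeks that fall before a person's first rate and returns NULL for them. The table layouts, the ISO date strings, and the function name weekly_rates are assumptions made for illustration, not part of the original schema.

```python
import sqlite3

# Sample data from the question, with dates rewritten as ISO strings
# so that plain string comparison orders them correctly (an assumption).
RATINGS = [
    ("John Smith", "2012-01-04", 1.2),
    ("John Smith", "2012-08-06", 1.7),
    ("John Smith", "2012-08-13", 1.9),
]
# A few week-ending Sundays (hypothetical contents of the Weeks table).
WEEK_ENDS = ["2012-01-01", "2012-01-08", "2012-01-15", "2012-08-12", "2012-08-19"]


def weekly_rates(ratings, week_ends):
    """Return (person, week_end, rate) using the latest rate on or before each week end."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Ratings (Person TEXT, Date TEXT, Rate REAL)")
    con.execute("CREATE TABLE Weeks (DateEnd TEXT)")
    con.executemany("INSERT INTO Ratings VALUES (?, ?, ?)", ratings)
    con.executemany("INSERT INTO Weeks VALUES (?)", [(w,) for w in week_ends])
    rows = con.execute(
        """
        SELECT P.Person,
               W.DateEnd,
               (SELECT R.Rate              -- last rate on or before the week end
                  FROM Ratings R
                 WHERE R.Person = P.Person
                   AND R.Date <= W.DateEnd
                 ORDER BY R.Date DESC
                 LIMIT 1) AS Rate
          FROM (SELECT DISTINCT Person FROM Ratings) P
         CROSS JOIN Weeks W
         ORDER BY P.Person, W.DateEnd
        """
    ).fetchall()
    con.close()
    return rows


for row in weekly_rates(RATINGS, WEEK_ENDS):
    print(row)
```

The week ending 2012-01-01 comes back with a NULL rate because no rating exists yet; filter those rows out if you want the CROSS APPLY behaviour.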

Related

How do I select a max date by person in a table

I am not too advanced with SSRS/SQL queries, and I need to write a report that pulls out % allocations by person, which are then compared to a wage table to allocate the wages. These allocations change quarterly, but all allocations continue to be stored in the table. If a person's allocation did not change, they do NOT get a new entry in the table. Here is a sample table called Allocations.
First Name  Last Name  Date      Area  Percent
Smith       Bob        01/01/20  A     50.00
Smith       Bob        01/01/20  B     50.00
Doe         Jane       01/01/20  A     25.00
Doe         Jane       01/01/20  B     25.00
Doe         Jane       01/01/20  C     50.00
Doe         Jane       04/01/20  A     35.00
Doe         Jane       04/01/20  C     65.00
Wayne       Bruce      01/01/20  A     100.00
Wayne       Bruce      04/01/20  B     100.00
The results that I would want to have from this sample table when querying it are:
First Name  Last Name  Date      Area  Percent
Smith       Bob        01/01/20  A     50.00
Smith       Bob        01/01/20  B     50.00
Doe         Jane       04/01/20  A     35.00
Doe         Jane       04/01/20  C     65.00
Wayne       Bruce      04/01/20  B     100.00
However, I would also like to pull this by comparing it to a date that the user inputs, so that they could run this report at any point in time and get the correct "max" dates. So, for example, if there were also 7/1/20 dates in here, but the user input date was 6/30/20, I would NOT want to pull the 7/1/20 data. In other words, I would like to pull the rows with the maximum date by name without going past the user's input date.
Any idea on the best way to accomplish this?
Thanks in advance for any advice you can provide.
In SQL, ROW_NUMBER can be used to number the records within each group, ordered by a particular field.
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Last_Name, First_Name ORDER BY Date DESC) AS ROW_NUM
    FROM Allocations
) AS T
WHERE ROW_NUM = 1
Then you filter for ROW_NUM = 1.
However, I noticed that some people have several rows with the same date and you want all of them. In this case you'd want to use RANK, which allows for ties, so every record that shares the latest date is captured.
SELECT * FROM (
    SELECT *, RANK() OVER (PARTITION BY Last_Name, First_Name ORDER BY Date DESC) AS ROW_NUM
    FROM Allocations
) AS T
WHERE ROW_NUM = 1
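The question also asks for a cutoff date supplied by the user. One way to handle that, sketched here under assumptions, is to filter on the date inside the subquery before ranking, so that later rows never compete for rank 1. The sketch runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 module; the ISO date strings, the extra 2020-07-01 row, and the function name allocations_as_of are made up for illustration.

```python
import sqlite3

# Sample rows from the question (dates as ISO strings), plus one
# hypothetical 2020-07-01 row to show that the cutoff excludes it.
ALLOCATIONS = [
    ("Smith", "Bob", "2020-01-01", "A", 50.0),
    ("Smith", "Bob", "2020-01-01", "B", 50.0),
    ("Doe", "Jane", "2020-01-01", "A", 25.0),
    ("Doe", "Jane", "2020-01-01", "B", 25.0),
    ("Doe", "Jane", "2020-01-01", "C", 50.0),
    ("Doe", "Jane", "2020-04-01", "A", 35.0),
    ("Doe", "Jane", "2020-04-01", "C", 65.0),
    ("Wayne", "Bruce", "2020-01-01", "A", 100.0),
    ("Wayne", "Bruce", "2020-04-01", "B", 100.0),
    ("Wayne", "Bruce", "2020-07-01", "A", 100.0),  # hypothetical later row
]


def allocations_as_of(rows, as_of):
    """Return each person's allocation rows for their latest date not after as_of."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Allocations "
        "(First_Name TEXT, Last_Name TEXT, Date TEXT, Area TEXT, Percent REAL)"
    )
    con.executemany("INSERT INTO Allocations VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT First_Name, Last_Name, Date, Area, Percent
          FROM (SELECT a.*,
                       RANK() OVER (PARTITION BY Last_Name, First_Name
                                    ORDER BY Date DESC) AS rnk
                  FROM Allocations a
                 WHERE Date <= ?          -- drop rows after the cutoff first
               ) t
         WHERE rnk = 1                    -- ties on the latest date are all kept
         ORDER BY First_Name, Last_Name, Area
        """,
        (as_of,),
    ).fetchall()
    con.close()
    return result


for row in allocations_as_of(ALLOCATIONS, "2020-06-30"):
    print(row)
```

In SQL Server the bound parameter would typically be a report parameter or variable compared against the Date column in the same place.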

Build time window counters from raw data - Big Query

Consider raw events data regarding purchases in 2020, as per the following table:
BUYER  DATE          ITEM
Joe    '2020-01-15'  Dr. Pepper
Joe    '2020-02-15'  Dr. Pepper
Joe    '2020-03-15'  Dr. Pepper
Joe    '2020-05-15'  Dr. Pepper
Joe    '2020-10-15'  Dr. Pepper
Joe    '2020-12-15'  Dr. Pepper
I would like to aggregate the data into a monthly moving sum of Joe's purchases, i.e., obtain the following outcome, with one row per month end:
BUYER  Date          Num_Purchases_last_3months
Joe    '2020-01-31'  1
Joe    '2020-02-29'  2
Joe    '2020-03-31'  3
Joe    '2020-04-30'  2
...
Joe    '2020-11-30'  1
Joe    '2020-12-31'  2
How could I obtain the desired result in an efficient query?
You can use window functions, in this case, count(*) with a range window frame specification:
select t.*,
       count(*) over (partition by buyer
                      order by extract(year from date) * 12 + extract(month from date)
                      range between 2 preceding and current row
                     ) as Num_Purchases_last_3months
from t;
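Note that a window function over t returns one row per purchase, so months without a purchase (April, for example) do not appear in its result. If a row for every month is needed, one option is to join against a list of months. Below is a minimal sketch of that idea, run on SQLite through Python's sqlite3 module rather than BigQuery; it covers a single calendar year only, and the recursive months CTE and the function name purchases_last_3_months are assumptions made for illustration.

```python
import sqlite3

# Purchases from the question.
PURCHASES = [
    ("Joe", "2020-01-15", "Dr. Pepper"),
    ("Joe", "2020-02-15", "Dr. Pepper"),
    ("Joe", "2020-03-15", "Dr. Pepper"),
    ("Joe", "2020-05-15", "Dr. Pepper"),
    ("Joe", "2020-10-15", "Dr. Pepper"),
    ("Joe", "2020-12-15", "Dr. Pepper"),
]


def purchases_last_3_months(purchases):
    """Return (buyer, month, count) for months 1..12, counting the month and the two before it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (buyer TEXT, date TEXT, item TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", purchases)
    rows = con.execute(
        """
        WITH RECURSIVE months(m) AS (      -- month numbers 1..12 of one year
            SELECT 1
            UNION ALL
            SELECT m + 1 FROM months WHERE m < 12
        )
        SELECT b.buyer,
               months.m,
               (SELECT COUNT(*)
                  FROM t p
                 WHERE p.buyer = b.buyer
                   AND CAST(strftime('%m', p.date) AS INTEGER)
                       BETWEEN months.m - 2 AND months.m) AS n
          FROM (SELECT DISTINCT buyer FROM t) b
         CROSS JOIN months
         ORDER BY b.buyer, months.m
        """
    ).fetchall()
    con.close()
    return rows


for buyer, month, n in purchases_last_3_months(PURCHASES):
    print(buyer, month, n)
```

For data that spans several years, the month number would need to be replaced by something like year * 12 + month, as in the window query above.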

Oracle SQL code to list a persons employment events and another 2 columns devoted listing the prior event and date

So I have a database full of people with employment events and I'm trying to build a report in SQL that will pull the following:
Name, employment event, date of employment event, the employment event that occurred just before that event, and the date of that prior event.
The data is organized so that each event is a row. So if I pull the employment history for a participant named John Smith I would get the output (sorted by date of event desc):
Name        Event              Date of Event
John Smith  Terminated         5/13/2017
John Smith  Return from Leave  4/13/2017
John Smith  Paid Leave         3/31/2017
John Smith  Hire               1/1/2000
My goal is to get the following output:
Name        Event              Date of Event  Prior Event        Date of prior event
John Smith  Terminated         5/13/2017      Return from Leave  4/13/2017
John Smith  Return from Leave  4/13/2017      Paid Leave         3/31/2017
John Smith  Paid Leave         3/31/2017      Hire               1/1/2000
John Smith  Hire               1/1/2000       NULL               NULL
I managed to get code working that almost does this.
select distinct a.ssn, b.name, a.event, a.event_date,
       c.event as prior_event, c.event_date as prior_date
from history a
left join basic_data b
  on b.ssn = a.ssn
left join
  (select distinct c.ssn, c.event_date, c.event
   from history c
  ) c
  on c.ssn = a.ssn and (a.event_date > c.event_date)
order by a.ssn asc, a.event_date desc
That gives me this output:
Name        Event              Date of Event  Prior Event        Date of prior event
John Smith  Terminated         5/13/2017      Return from Leave  4/13/2017
John Smith  Terminated         5/13/2017      Paid Leave         3/31/2017
John Smith  Terminated         5/13/2017      Hire               1/1/2000
John Smith  Return from Leave  4/13/2017      Paid Leave         3/31/2017
John Smith  Return from Leave  4/13/2017      Hire               1/1/2000
John Smith  Paid Leave         3/31/2017      Hire               1/1/2000
John Smith  Hire               1/1/2000       NULL               NULL
It's showing multiple rows for every event prior to that event instead of just the one before it. How do I get rid of all of the extra rows?
You can use the lead() analytic function with order by event_date desc:
select h.name as "Name", h.event as "Event", h.event_date as "Date of Event",
       lead(h.event) over (order by event_date desc) as "Prior Event",
       lead(h.event_date) over (order by event_date desc) as "Date of prior event"
from history h
order by event_date desc;
If you want the prior events, use lag(). Also, partitioning by the employee is very important, and it must be done in both window expressions:
select h.name, h.event, h.event_date,
       lag(h.event) over (partition by h.name order by h.event_date) as prior_event,
       lag(h.event_date) over (partition by h.name order by h.event_date) as prior_event_date
from history h
order by h.name, h.event_date desc;
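To see why the partition matters, here is a minimal sketch that runs the lag() query on SQLite (3.25 or later) through Python's sqlite3 module instead of Oracle. The second employee, Jane Doe, the ISO date strings, the single history table with a name column, and the function name with_prior_events are all assumptions added for illustration.

```python
import sqlite3

# John Smith's events from the question, plus one hypothetical second
# employee so that the effect of PARTITION BY is visible.
EVENTS = [
    ("John Smith", "Hire", "2000-01-01"),
    ("John Smith", "Paid Leave", "2017-03-31"),
    ("John Smith", "Return from Leave", "2017-04-13"),
    ("John Smith", "Terminated", "2017-05-13"),
    ("Jane Doe", "Hire", "2010-06-01"),  # hypothetical
]


def with_prior_events(events):
    """Return each event together with the same employee's previous event and its date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE history (name TEXT, event TEXT, event_date TEXT)")
    con.executemany("INSERT INTO history VALUES (?, ?, ?)", events)
    rows = con.execute(
        """
        SELECT name, event, event_date,
               LAG(event)      OVER (PARTITION BY name ORDER BY event_date) AS prior_event,
               LAG(event_date) OVER (PARTITION BY name ORDER BY event_date) AS prior_event_date
          FROM history
         ORDER BY name, event_date DESC
        """
    ).fetchall()
    con.close()
    return rows


for row in with_prior_events(EVENTS):
    print(row)
```

Without the partition, Jane Doe's 2010 hire would pick up John Smith's 2000 hire as its "prior event", because the window would span every employee.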

Is there a way to list the most recent dates for an event based on data in other columns?

I am writing a query that shows the most recent job start date within the past year for each person with extended families (I should not show future dates). It is possible that multiple families (in multiple states) started their jobs on the same date. In that case, I need to list the state(s), both people, and the respective dates. However, I should only list each state/person pair once.
Additionally, if the person didn't start their job within the past year, I should still list the person's name, but the query should return NULL in place of the state name and NULL for the date.
Below is the data in the raw table:
LOC  FAM     PPL   MILESTONE_ID  MILESTONE_NAME  START_DATE
WI   Smith   Mike  1             End College     9/4/2017 0:00
WI   Smith   Mike  2             Start Job       9/4/2017 0:00
WI   Smith   Bob   1             End College     6/4/2019
WI   Smith   Bob   2             Start Job       6/4/2019
IL   Thomas  Mike  1             End College     1/4/2019
IL   Thomas  Mike  2             Start Job       6/4/2019
IL   Thomas  Bob   1             End College     12/4/2019
IL   Thomas  Bob   2             Start Job       6/4/2019
I know that I need to use a subquery to get the most recent job start dates but my subquery isn't behaving as expected. I have also tried using a CTE but that isn't working either.
This is what I have so far. I haven't gotten the subquery to work correctly. I still need to add the NULL portion of the situation above
Select family.*
From
    FAMILY.KEYINFO as family
Inner Join
    (Select family.milestone_id, MAX(family.start_date) as LatestDate
     from FAMILY.keyinfo
     group by milestone_id) groupeddate
    on family.milestone_id=groupeddate.milestone
where family.start_date<= CURRENT_TIMESTAMP
and family.start_date > DATEADD(year,-1,GETDATE())
Below is what I would expect the answer to be if the query was correct:
LOC  PPL   START_DATE
N/A  Mike  N/A
N/A  Mike  N/A
WI   Bob   6/4/2019
IL   Mike  6/4/2019
IL   Bob   6/4/2019
You seem to want window functions:
select f.*
from (select f.*,
             rank() over (partition by fam order by start_date desc) as seqnum
      from families f
      where milestone_name = 'Start Job'
     ) f
where seqnum = 1;

How to have the rolling distinct count of each day for past three days in Oracle SQL?

I searched for this a lot, but I haven't found a solution yet. Let me explain my question with sample data and my desired output.
Sample data:
datetime          customer
----------------  --------
2018-10-21 09:00  Ryan
2018-10-21 10:00  Sarah
2018-10-21 20:00  Sarah
2018-10-22 09:00  Peter
2018-10-22 10:00  Andy
2018-10-23 09:00  Sarah
2018-10-23 10:00  Peter
2018-10-24 10:00  Andy
2018-10-24 20:00  Andy
My desired output is the distinct number of customers over the past three days relative to each day:
trunc(datetime)  progressive count distinct customer
---------------  -----------------------------------
2018-10-21       2
2018-10-22       4
2018-10-23       4
2018-10-24       3
Explanation: for the 21st, we have only Ryan and Sarah (and no records before the 21st), so the count is 2. For the 22nd, Andy and Peter are added to the distinct list, so it's 4. For the 23rd, no new customer is added, so it stays at 4. For the 24th, however, we should only consider the past 3 days (as per business logic), i.e. the 24th, 23rd and 22nd, so the distinct customers are Sarah, Andy and Peter and the count is 3.
I believe this is called a progressive count, moving count or rolling count, but I couldn't implement it in Oracle 11g SQL. Obviously it's easy with PL/SQL programming (stored procedure/function), but I would prefer to do it in a single SQL query if possible.
What you seem to want is:
select dte,
       count(distinct customer) over (order by dte rows between 2 preceding and current row)
from (select distinct trunc(datetime) as dte, customer
      from t
     ) t
group by dte;
However, Oracle does not support window frames with count(distinct).
One rather brute force approach is a correlated subquery:
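select dte,
       (select count(distinct t2.customer)
        from t t2
        where t2.datetime >= t.dte - 2 and  -- two days back ...
              t2.datetime < t.dte + 1       -- ... through the end of this day
       ) as running_3
from (select distinct trunc(datetime) as dte  -- "date" is a reserved word in Oracle
      from t
     ) t;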
This should have reasonable performance for a small number of dates. As the number of dates increases, the performance will degrade linearly.
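As a quick check of the correlated-subquery approach against the sample data, here is a minimal sketch run on SQLite through Python's sqlite3 module rather than Oracle. SQLite's date() function stands in for trunc() and for Oracle's date arithmetic; the function name rolling_distinct and its days parameter are assumptions made for illustration.

```python
import sqlite3

# Sample data from the question.
VISITS = [
    ("2018-10-21 09:00", "Ryan"),
    ("2018-10-21 10:00", "Sarah"),
    ("2018-10-21 20:00", "Sarah"),
    ("2018-10-22 09:00", "Peter"),
    ("2018-10-22 10:00", "Andy"),
    ("2018-10-23 09:00", "Sarah"),
    ("2018-10-23 10:00", "Peter"),
    ("2018-10-24 10:00", "Andy"),
    ("2018-10-24 20:00", "Andy"),
]


def rolling_distinct(visits, days=3):
    """Return (day, distinct customers seen on that day or the days - 1 days before it)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (datetime TEXT, customer TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", visits)
    rows = con.execute(
        """
        SELECT d.dte,
               (SELECT COUNT(DISTINCT t2.customer)
                  FROM t t2
                 WHERE date(t2.datetime) BETWEEN date(d.dte, ?) AND d.dte
               ) AS running
          FROM (SELECT DISTINCT date(t.datetime) AS dte FROM t) d
         ORDER BY d.dte
        """,
        (f"-{days - 1} days",),  # e.g. '-2 days' for a 3-day window
    ).fetchall()
    con.close()
    return rows


for day, count in rolling_distinct(VISITS):
    print(day, count)
```

The window has both a lower and an upper bound, so each day only sees itself and the two days before it, which reproduces the counts 2, 4, 4, 3 from the question.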