SQL Server query to 'flatten' data for reporting - sql

Say I have a table with the following data, in the following structure. I'm trying to query the data to find the date ranges that someone (employee) worked.
NAME WORKED DATE
Bob YES 1/1/2019
Bob YES 1/2/2019
Bob YES 1/3/2019
Bob NO 1/4/2019
Bob YES 1/5/2019
Bob YES 1/6/2019
Bob NO 1/7/2019
Jane Yes 1/1/2019
Jane Yes 1/2/2019
Jane No 1/3/2019
Expected Result: (The Result I need)
Bob 1/1/2019 - 1/3/2019
Bob 1/5/2019 - 1/6/2019
Jane 1/1/2019 - 1/2/2019
What's the SQL syntax (SQL Server 2008+) of the query to return this result set?
Thanks in advance.

This is a gaps-and-islands problem. You can identify the rows using row_number() and some date arithmetic.
So, assuming you have a row for every date:
select name, min(date), max(date)
from (select t.*,
row_number() over (partition by name order by date) as seqnum
from t
where worked = 'YES'
) t
group by name,
dateadd(day, - seqnum, date);
Why does this work? You are looking for adjacent dates. If you subtract a sequence from the dates, then the result is constant -- when the dates are sequential. This observation is used in the group by to get the groups you want.
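To see why the subtraction produces a constant per island, here is a minimal sketch in Python that traces the same arithmetic on the question's sample rows. The function name `islands` and the tuple layout are assumptions made for illustration; like the query, it assumes at most one row per name and date.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (name, worked, date)
rows = [
    ("Bob", "YES", date(2019, 1, 1)),
    ("Bob", "YES", date(2019, 1, 2)),
    ("Bob", "YES", date(2019, 1, 3)),
    ("Bob", "NO", date(2019, 1, 4)),
    ("Bob", "YES", date(2019, 1, 5)),
    ("Bob", "YES", date(2019, 1, 6)),
    ("Bob", "NO", date(2019, 1, 7)),
    ("Jane", "Yes", date(2019, 1, 1)),
    ("Jane", "Yes", date(2019, 1, 2)),
    ("Jane", "No", date(2019, 1, 3)),
]


def islands(rows):
    """Return (name, first_date, last_date) for each run of consecutive worked days."""
    # Equivalent of: where worked = 'YES' (compared case-insensitively here)
    worked = sorted((name, d) for name, flag, d in rows if flag.upper() == "YES")

    keyed = []
    for name, group in groupby(worked, key=lambda r: r[0]):
        # Equivalent of: row_number() over (partition by name order by date)
        for seqnum, (_, d) in enumerate(group, start=1):
            # Equivalent of: dateadd(day, -seqnum, date); constant within an island
            keyed.append((name, d - timedelta(days=seqnum), d))

    result = []
    # Equivalent of: group by name, dateadd(day, -seqnum, date)
    for (name, _anchor), grp in groupby(keyed, key=lambda r: (r[0], r[1])):
        dates = [d for _, _, d in grp]
        result.append((name, min(dates), max(dates)))
    return result


result = islands(rows)
for name, start, end in result:
    print(name, start, "-", end)
```

Bob's first three dates all map to the same anchor date, and the gap on 1/4 shifts the anchor for the following two dates, which is what splits them into a second group.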

Related

How do I select a max date by person in a table

I am not too advanced with SSRS/SQL queries, and need to write a report that pulls out % allocations by person to then compare to a wage table to allocate the wages. These allocations change quarterly, but all allocations continue to be stored in the table. If a person's allocation did not change, they do NOT get a new entry in the table. Here is a sample table called Allocations.
First Name  Last Name  Date      Area  Percent
Smith       Bob        01/01/20  A     50.00
Smith       Bob        01/01/20  B     50.00
Doe         Jane       01/01/20  A     25.00
Doe         Jane       01/01/20  B     25.00
Doe         Jane       01/01/20  C     50.00
Doe         Jane       04/01/20  A     35.00
Doe         Jane       04/01/20  C     65.00
Wayne       Bruce      01/01/20  A     100.00
Wayne       Bruce      04/01/20  B     100.00
The results that I would want to have from this sample table when querying it are:
First Name  Last Name  Date      Area  Percent
Smith       Bob        01/01/20  A     50.00
Smith       Bob        01/01/20  B     50.00
Doe         Jane       04/01/20  A     35.00
Doe         Jane       04/01/20  C     65.00
Wayne       Bruce      04/01/20  B     100.00
However, I would also like to pull this by comparing it to a date that the user inputs, so that they could run this report at any point in time and get the correct "max" dates. So, for example, if there were also 7/1/20 dates in here, but the user input date was 6/30/20, I would NOT want to pull the 7/1/20 data. In other words, I would like to pull the rows with the maximum date by name w/o going over the user's input date.
Any idea on the best way to accomplish this?
Thanks in advance for any advice you can provide.
In SQL, ROW_NUMBER can be used to order records in groups by a particular field.
SELECT * FROM (
SELECT *, ROW_NUMBER()OVER(PARTITION BY Last_Name, First_Name ORDER BY DATE DESC) as ROW_NUM
FROM TABLE
) AS T
WHERE ROW_NUM = 1
Then you filter for ROW_NUM = 1.
However, I noticed that there are a couple of rows with the same date and you want both. In this case you'd want to use RANK, which allows for ties, so there may be multiple records with the same date that you want to capture.
SELECT * FROM (
SELECT *, RANK()OVER(PARTITION BY Last_Name, First_Name ORDER BY DATE DESC) as ROW_NUM
FROM TABLE
) AS T
WHERE ROW_NUM = 1
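The question also asks for a cutoff date supplied by the user. A minimal sketch of that logic in Python follows: rows after the cutoff are dropped first, then every row that ties for the latest remaining date per person is kept, which is what RANK() = 1 selects. The function name `latest_allocations` and the `as_of` parameter are assumptions for illustration; in the SQL this would roughly correspond to a predicate such as Date <= @AsOfDate inside the subquery, where @AsOfDate is a hypothetical report parameter.

```python
from datetime import date

# Sample Allocations rows: (first_name, last_name, date, area, percent)
allocations = [
    ("Smith", "Bob", date(2020, 1, 1), "A", 50.0),
    ("Smith", "Bob", date(2020, 1, 1), "B", 50.0),
    ("Doe", "Jane", date(2020, 1, 1), "A", 25.0),
    ("Doe", "Jane", date(2020, 1, 1), "B", 25.0),
    ("Doe", "Jane", date(2020, 1, 1), "C", 50.0),
    ("Doe", "Jane", date(2020, 4, 1), "A", 35.0),
    ("Doe", "Jane", date(2020, 4, 1), "C", 65.0),
    ("Wayne", "Bruce", date(2020, 1, 1), "A", 100.0),
    ("Wayne", "Bruce", date(2020, 4, 1), "B", 100.0),
]


def latest_allocations(rows, as_of):
    """Keep every row that carries a person's latest date on or before as_of."""
    # Drop rows dated after the user's cutoff before ranking
    eligible = [r for r in rows if r[2] <= as_of]

    # Latest eligible date per (first_name, last_name)
    latest = {}
    for first, last, d, _area, _pct in eligible:
        key = (first, last)
        if key not in latest or d > latest[key]:
            latest[key] = d

    # Rows tied on the latest date all survive, like RANK() = 1
    return [r for r in eligible if r[2] == latest[(r[0], r[1])]]


mid_year = latest_allocations(allocations, date(2020, 6, 30))
first_quarter = latest_allocations(allocations, date(2020, 3, 31))
```

With a cutoff of 6/30/20 this yields the five rows listed in the question's expected result; with a cutoff of 3/31/20 it falls back to the 01/01/20 rows for everyone.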

Build time window counters from raw data - Big Query

Consider raw events data regarding purchases in 2020, as per the following table:
BUYER DATE ITEM
Joe '2020-01-15' Dr. Pepper
Joe '2020-02-15' Dr. Pepper
Joe '2020-03-15' Dr. Pepper
Joe '2020-05-15' Dr. Pepper
Joe '2020-10-15' Dr. Pepper
Joe '2020-12-15' Dr. Pepper
I would like to aggregate the data to see what Joe did in a monthly moving sum, i.e., obtaining as an outcome
BUYER Date Num_Purchases_last_3months
Joe '2020-01-31' 1
Joe '2020-02-29' 2
Joe '2020-03-31' 3
Joe '2020-04-30' 2
.
.
.
Joe '2020-11-30' 1
Joe '2020-12-31' 2
How could I obtain the desired result in an efficient query?
You can use window functions, in this case, count(*) with a range window frame specification:
select t.*,
count(*) over (partition by buyer
order by extract(year from date) * 12 + extract(month from date)
range between 2 preceding and current row
) as Num_Purchases_last_3months
from t;
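The ordering expression turns each date into a running month number, so "2 preceding" means "up to two months earlier". A minimal Python sketch of that frame on Joe's purchases is below; the helper names `month_index` and `purchases_last_3_months` are assumptions for illustration. Note that, like the query, it produces one output row per purchase row, so months without a purchase (April, for instance) do not appear unless the data is first joined to a list of months.

```python
from datetime import date

# Joe's purchases from the question: (buyer, date)
purchases = [
    ("Joe", date(2020, 1, 15)),
    ("Joe", date(2020, 2, 15)),
    ("Joe", date(2020, 3, 15)),
    ("Joe", date(2020, 5, 15)),
    ("Joe", date(2020, 10, 15)),
    ("Joe", date(2020, 12, 15)),
]


def month_index(d):
    # Same idea as extract(year from date) * 12 + extract(month from date)
    return d.year * 12 + d.month


def purchases_last_3_months(rows):
    """For each purchase, count the buyer's purchases in that month and the two before."""
    out = []
    for buyer, d in rows:
        current = month_index(d)
        # range between 2 preceding and current row, partitioned by buyer
        count = sum(
            1
            for other_buyer, other_date in rows
            if other_buyer == buyer
            and current - 2 <= month_index(other_date) <= current
        )
        out.append((buyer, d, count))
    return out


counts = purchases_last_3_months(purchases)
for buyer, d, n in counts:
    print(buyer, d, n)
```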

How to have the rolling distinct count of each day for past three days in Oracle SQL?

I searched for this a lot, but I couldn't find a solution yet. Let me explain my question with sample data and my desired output.
sample data:
datetime customer
---------- --------
2018-10-21 09:00 Ryan
2018-10-21 10:00 Sarah
2018-10-21 20:00 Sarah
2018-10-22 09:00 Peter
2018-10-22 10:00 Andy
2018-10-23 09:00 Sarah
2018-10-23 10:00 Peter
2018-10-24 10:00 Andy
2018-10-24 20:00 Andy
My desired output is the number of distinct customers for the past three days relative to each day:
trunc(datetime) progressive count distinct customer
--------------- -----------------------------------
2018-10-21 2
2018-10-22 4
2018-10-23 4
2018-10-24 3
Explanation: for the 21st, because we have only Ryan and Sarah, the count is 2 (there are no records before the 21st). For the 22nd, Andy and Peter are added to the distinct list, so it's 4. For the 23rd, no new customer is added, so it stays 4. For the 24th, however, we should only consider the past 3 days (as per business logic), so we take only the 24th, 23rd, and 22nd; the distinct customers are Sarah, Andy, and Peter, so the count is 3.
I believe it is called a progressive count, moving count, or rolling count, but I couldn't implement it in Oracle 11g SQL. Obviously it's easy using PL/SQL programming (stored procedure/function), but I would prefer a single SQL query if that is possible.
What you seem to want is:
select date,
count(distinct customer) over (order by date rows between 2 preceding and current row)
from (select distinct trunc(datetime) as date, customer
from t
) t
group by date;
However, Oracle does not support window frames with count(distinct).
One rather brute force approach is a correlated subquery:
select dte,
(select count(distinct t2.customer)
from t t2
where t2.datetime >= t.dte - 2
-- upper bound: ignore rows after the end of the current day
and t2.datetime < t.dte + 1
) as running_3
from (select distinct trunc(datetime) as dte
from t
) t;
This should have reasonable performance for a small number of dates. As the number of dates increases, the performance will degrade linearly.
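The intended three-day window, bounded on both sides, can be traced with a short Python sketch on the question's sample data. The function name `rolling_distinct` and the `days` parameter are assumptions for illustration; the loop plays the role of the correlated subquery, re-scanning the events once per distinct day.

```python
from datetime import datetime, timedelta

# Sample data from the question: (datetime, customer)
events = [
    ("2018-10-21 09:00", "Ryan"),
    ("2018-10-21 10:00", "Sarah"),
    ("2018-10-21 20:00", "Sarah"),
    ("2018-10-22 09:00", "Peter"),
    ("2018-10-22 10:00", "Andy"),
    ("2018-10-23 09:00", "Sarah"),
    ("2018-10-23 10:00", "Peter"),
    ("2018-10-24 10:00", "Andy"),
    ("2018-10-24 20:00", "Andy"),
]


def rolling_distinct(events, days=3):
    """Distinct customers per day over a window of `days` days ending on that day."""
    # Equivalent of trunc(datetime): keep only the calendar date
    parsed = [
        (datetime.strptime(ts, "%Y-%m-%d %H:%M").date(), customer)
        for ts, customer in events
    ]
    result = []
    for day in sorted({d for d, _ in parsed}):
        start = day - timedelta(days=days - 1)
        # Both bounds matter: nothing before `start`, nothing after `day`
        customers = {c for d, c in parsed if start <= d <= day}
        result.append((day.isoformat(), len(customers)))
    return result


rolling = rolling_distinct(events)
for day, n in rolling:
    print(day, n)
```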

SQL Retrieve distinct data by latest date

I have a table as follows:
Name Customer Date Amount
Joe Aaron 2012-01-03 14:12:00.0 150
Joe Aaron 2012-02-03 14:12:00.0 150
Joe Danny 2012-03-03 14:12:00.0 150
Joe Karen 2012-07-03 14:12:00.0 150
Ronald Blake 2012-05-03 14:12:00.0 1501
I would like to query to retrieve data by specifying the Name, and if there are duplicates in the Customer column, only the record with the latest Date is returned.
For example, if I want to query Joe, I will get the following result:
Name Customer Date Amount
Joe Aaron 2012-02-03 14:12:00.0 150
Joe Danny 2012-03-03 14:12:00.0 150
Joe Karen 2012-07-03 14:12:00.0 150
How should I do this? I tried DISTINCT but it doesn't work that way.
EDIT
I'm using SQL Server. Sorry, I re-edited my question; this should be the correct question that I am asking.
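Did you try this?
SELECT Name, Customer, MAX(Date) as CurrentDate, Amount
FROM data
WHERE Name = 'Joe'
Group By Name, Customer, Amount
Because Amount is in the GROUP BY, this returns one row per distinct Name, Customer, Amount combination, so it only matches the expected output when a customer's rows all share the same amount.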
Another option is using the WITH TIES clause in concert with Row_Number()
Example
Select top 1 with ties *
From YourTable
Where Name='Joe'
Order By Row_Number() over (Partition By Customer Order By Date Desc)
Returns
Name Customer Date Amount
Joe Aaron 2012-02-03 14:12:00.000 150
Joe Danny 2012-03-03 14:12:00.000 150
Joe Karen 2012-07-03 14:12:00.000 150
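How TOP 1 WITH TIES interacts with the Row_Number() ordering may not be obvious: each customer's newest row gets row number 1, the ORDER BY puts all of those first, and WITH TIES keeps every row that ties with the first one. A minimal Python sketch of that sequence is below; the function name `top1_with_ties` and the tuple layout are assumptions for illustration.

```python
# Sample rows from the question: (name, customer, date, amount)
rows = [
    ("Joe", "Aaron", "2012-01-03 14:12:00", 150),
    ("Joe", "Aaron", "2012-02-03 14:12:00", 150),
    ("Joe", "Danny", "2012-03-03 14:12:00", 150),
    ("Joe", "Karen", "2012-07-03 14:12:00", 150),
    ("Ronald", "Blake", "2012-05-03 14:12:00", 1501),
]


def top1_with_ties(rows, name):
    """Return the newest row per customer for the given name."""
    # Where Name = 'Joe'
    filtered = [r for r in rows if r[0] == name]

    # Row_Number() over (Partition By Customer Order By Date Desc);
    # ISO-formatted timestamps sort correctly as strings
    seen = {}
    numbered = []
    for r in sorted(filtered, key=lambda r: r[2], reverse=True):
        seen[r[1]] = seen.get(r[1], 0) + 1
        numbered.append((seen[r[1]], r))

    if not numbered:
        return []

    # Order By the row number, then keep every row tied with the first one
    numbered.sort(key=lambda pair: pair[0])
    best = numbered[0][0]
    return [r for n, r in numbered if n == best]


latest_for_joe = sorted(top1_with_ties(rows, "Joe"), key=lambda r: r[1])
for r in latest_for_joe:
    print(*r)
```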
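One option (row-value IN is supported by databases such as Oracle, MySQL, and PostgreSQL, but not by SQL Server):
SELECT * FROM YourTable WHERE (NAME, CUSTOMER, DATE) IN (SELECT NAME, CUSTOMER, MAX(DATE) FROM YourTable WHERE NAME = 'Joe' GROUP BY NAME, CUSTOMER)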

SQL Server 2014 - Return Team name based on most recent date (somewhat dynamically)

My title is misleading because I don't know how to sum it up better than that :)
I have a table that keeps a history of changes made to users and what teams they belong to. It starts with their initial team and date, then adds an entry via a trigger when we change their teams in the UserList table.
Our business, like many, loves month to month data. I don't want to have entries for every single month if they don't change teams. I'll get to why that's a problem.
Here is an example of the data in the TeamHistory Table
UserID|CurrentTeam|ChangeDate
User1-|Team1------|01-01-2016
User1-|Team2------|03-01-2016
When I run a view or query that rolls the data up by person and media type (I can have 4 entries for a single person in a single month - voice, fax, email and voicemail) I then need to add the team that they were working on for that month.
Using that above example, if I ran the data for all of last year, I would expect Jan-May to display Team1. Then from June to Dec, Team 2. The problem is if I join the date field in my view/query with this table and use an = sign, then I only get data for 1-1 and 6-1, clearly because I only have those values in the table to match against. If I tell it to do < or <=, I start encountering duplicates as it's just not specific enough.
If we need an example query, I can try to work something up that's not one of these massive views.
So let's assume this is my data:
Userid| Month |Media|Calls
User1-|-01/01/2016|Voice|200
User1-|-01/01/2016|Email|100
User1-|-02/01/2016|Voice|250
User1-|-02/01/2016|Email|120
User1-|-03/01/2016|Voice|250
User1-|-03/01/2016|Email|120
And the TeamHistory table has 2 entries, the team they started on for 1/1/2016 and then they switched for 3/1/2016. How do I join the two data sets, using the date and userid as my variables, to pull in the corresponding Team? Especially when I won't have an actual entry for 2/1/2016?
I'd want my final dataset to look like this:
Userid|Team | Month |Media|Calls
User1-|Team1|-01/01/2016|Voice|200
User1-|Team1|-01/01/2016|Email|100
User1-|Team1|-02/01/2016|Voice|250
User1-|Team1|-02/01/2016|Email|120
User1-|Team2|-03/01/2016|Voice|250
User1-|Team2|-03/01/2016|Email|120
Since you're using SQL Server (2012 and newer) you can use the LEAD() function to identify an end date for a given range:
;with cte AS (SELECT 'User1' as UserID, 'Team1' AS CurrentTeam, CAST('2016-01-01' AS DATE) as ChangeDate
UNION SELECT 'User1' as UserID, 'Team2' AS CurrentTeam, CAST('2016-06-01' AS DATE) as ChangeDate
UNION SELECT 'User1' as UserID, 'Team1' AS CurrentTeam, CAST('2016-08-15' AS DATE) as ChangeDate
UNION SELECT 'User2' as UserID, 'Team1' AS CurrentTeam, CAST('2016-02-01' AS DATE) as ChangeDate
UNION SELECT 'User2' as UserID, 'Team2' AS CurrentTeam, CAST('2016-07-01' AS DATE) as ChangeDate
)
SELECT *,COALESCE(LEAD(ChangeDate,1) OVER(PARTITION BY UserID ORDER BY ChangeDate),CAST(GETDATE() AS DATE)) as End_Dt
FROM cte
Returns:
UserID CurrentTeam ChangeDate End_Dt
User1 Team1 2016-01-01 2016-06-01
User1 Team2 2016-06-01 2016-08-15
User1 Team1 2016-08-15 2017-01-05
User2 Team1 2016-02-01 2016-07-01
User2 Team2 2016-07-01 2017-01-05
You could then join those ranges to a calendar table to get the individual months as well as calculate which team they spent more days in for a given month.
The LEAD() function returns the next row's value for a given field. PARTITION BY resets the "next row" for each group; in this case you want the value per UserID. ORDER BY specifies what the next row should be; in this case, from one ChangeDate to the next.
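As a rough illustration of how those ranges can then answer "which team was this user on for a given month", here is a minimal Python sketch. The function names `with_end_dates` and `team_on` are assumptions, the fixed `today` value stands in for GETDATE() so the output matches the result shown above, and treating each range as half-open (start inclusive, end exclusive) is also an assumption about how the join would be written.

```python
from datetime import date

# Same sample history as the CTE above: (user_id, team, change_date)
history = [
    ("User1", "Team1", date(2016, 1, 1)),
    ("User1", "Team2", date(2016, 6, 1)),
    ("User1", "Team1", date(2016, 8, 15)),
    ("User2", "Team1", date(2016, 2, 1)),
    ("User2", "Team2", date(2016, 7, 1)),
]


def with_end_dates(history, today):
    """Attach the next change date per user, like LEAD() with a COALESCE default."""
    # PARTITION BY UserID ORDER BY ChangeDate
    ordered = sorted(history, key=lambda r: (r[0], r[2]))
    out = []
    for i, (user, team, start) in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        # LEAD() yields nothing on a user's last row, so fall back to today
        end = nxt[2] if nxt is not None and nxt[0] == user else today
        out.append((user, team, start, end))
    return out


def team_on(ranges, user, day):
    """Find the team whose range [start, end) contains the given day."""
    for u, team, start, end in ranges:
        if u == user and start <= day < end:
            return team
    return None  # no history yet for that user on that day


ranges = with_end_dates(history, today=date(2017, 1, 5))
print(team_on(ranges, "User1", date(2016, 3, 1)))
print(team_on(ranges, "User1", date(2016, 7, 1)))
```

Because each month's date falls into exactly one range per user, a range join of this kind avoids the duplicates that a plain <= comparison produces.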
You might try this:
--A simple person table
DECLARE @pers TABLE(Person VARCHAR(100));
INSERT INTO @pers VALUES('Bob'),('Tim');
--a table reflecting your work-data
--attention Tim is changing in July to Team Red and still in July back to Blue
DECLARE @Team TABLE(Person VARCHAR(100),Team VARCHAR(100),ChangeDate DATE);
INSERT INTO @Team VALUES
('Bob','Red' ,{d'2016-04-01'})
,('Tim','Blue',{d'2016-04-13'})
,('Tim','Red' ,{d'2016-07-22'})
,('Bob','Blue',{d'2016-06-15'})
,('Tim','Blue',{d'2016-07-28'})
,('Bob','Red' ,{d'2016-10-15'})
,('Tim','Red' ,{d'2016-12-28'})
;
--A CTE to mock-up a numbers/tally/date-table
WITH FirstOfMonthDays(d) AS
(
SELECT {d'2016-01-01'}
UNION ALL SELECT {d'2016-02-01'}
UNION ALL SELECT {d'2016-03-01'}
UNION ALL SELECT {d'2016-04-01'}
UNION ALL SELECT {d'2016-05-01'}
UNION ALL SELECT {d'2016-06-01'}
UNION ALL SELECT {d'2016-07-01'}
UNION ALL SELECT {d'2016-08-01'}
UNION ALL SELECT {d'2016-09-01'}
UNION ALL SELECT {d'2016-10-01'}
UNION ALL SELECT {d'2016-11-01'}
UNION ALL SELECT {d'2016-12-01'}
)
--I use CONVERT(VARCHAR(6),ChangeDate,112) to get a string of YYYYMM
,Numbered AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY Person, CONVERT(VARCHAR(6),ChangeDate,112) ORDER BY ChangeDate DESC) AS Nr
,t.*
FROM @Team AS t
)
--Pick out the one with Nr=1, these are the last changes per month
,LastChangeInMonth AS
(
SELECT *
FROM Numbered
WHERE Nr=1
)
--The actual query
SELECT fom.d
,p.Person
,(
SELECT TOP 1 t.Team
FROM LastChangeInMonth AS t
WHERE t.Person=p.Person
AND CONVERT(VARCHAR(6),t.ChangeDate,112)<=CONVERT(VARCHAR(6),fom.d,112)
ORDER BY t.ChangeDate DESC
) AS fittingTeam
FROM FirstOfMonthDays AS fom
CROSS JOIN @pers AS p
ORDER BY p.Person,fom.d
Since you are using SQL Server 2014 (please tag your questions correctly!) this would be a bit easier with LEAD()/LAG(), but the idea is the same...
The result
2016-01-01 Bob NULL
2016-02-01 Bob NULL
2016-03-01 Bob NULL
2016-04-01 Bob Red
2016-05-01 Bob Red
2016-06-01 Bob Blue
2016-07-01 Bob Blue
2016-08-01 Bob Blue
2016-09-01 Bob Blue
2016-10-01 Bob Red
2016-11-01 Bob Red
2016-12-01 Bob Red
2016-01-01 Tim NULL
2016-02-01 Tim NULL
2016-03-01 Tim NULL
2016-04-01 Tim Blue
2016-05-01 Tim Blue
2016-06-01 Tim Blue
2016-07-01 Tim Blue
2016-08-01 Tim Blue
2016-09-01 Tim Blue
2016-10-01 Tim Blue
2016-11-01 Tim Blue
2016-12-01 Tim Red