Identify overlap percent of IDs between 2 dates in same table - sql

I have a table of names with two different dates. For each ID, I want to know how many records occur on both dates and what the overlap percentage is.
This is the desired output format. I am not looking for dates in between; I am looking for records that are on July 05 and also on August 10. The overlap percentage for each ID would be: (count of records on July 5 that are also on August 10) / (count of records on July 5). (The actual table stores the dates in a date data type.)
The overlap % will always be less than or equal to 100, since the count of records existing on both July 5 and August 10 is always <= the count of records on July 5.
id    Count on July 05    Count of IDs from July 05 included in August 10    % overlap
ABC
BCD
CDE
DEF
EFG
A rough version of the input table:
id    type     Group  date
ABC   Mobile   1      July 5
BCD   Mobile   1      July 5
ABC   Desktop  1      August 10
CDE   Mobile   2      July 5
BCD   Mobile   2      August 10

As I understood from your comments, the overlap is the smaller of the two dates' counts: for ABC, if there are 6 records in July and 2 in August the overlap is 2, and if there are 3 in July and 5 in August the overlap is 3.
If that is the case, you can use the following query (tested on MS SQL Server 2019):
SELECT t.id,
       t.[Count on July 05],
       -- overlap = the smaller of the two per-date counts
       CASE
           WHEN t.[Count on July 05] <= t.[Count of August 10] THEN t.[Count on July 05]
           ELSE t.[Count of August 10]
       END AS [Count of IDs from July 05 included in August 10],
       -- NULLIF returns NULL instead of a divide-by-zero error for IDs with no July 5 rows
       CASE
           WHEN t.[Count on July 05] <= t.[Count of August 10]
               THEN CAST(t.[Count on July 05] * 100.0 / NULLIF(t.[Count on July 05], 0) AS DECIMAL(18, 2))
           ELSE CAST(t.[Count of August 10] * 100.0 / NULLIF(t.[Count on July 05], 0) AS DECIMAL(18, 2))
       END AS [% overlap]
FROM (
    -- with a real DATE column, use full date literals ('YYYY-MM-DD') instead of 'July 5'
    SELECT id,
           COUNT(CASE WHEN [tdate] = 'July 5' THEN 1 END) AS [Count on July 05],
           COUNT(CASE WHEN [tdate] = 'August 10' THEN 1 END) AS [Count of August 10]
    FROM [Tbl]
    GROUP BY id
) t;
I hope that is what you are looking for.
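The min-count logic above can be tried without SQL Server. The sketch below uses Python's built-in sqlite3 module, where SQLite's two-argument scalar MIN stands in for the CASE expressions. It assumes a table named tbl with a tdate column holding ISO date text; the year 2021 is made up because the question gives none, and the Group column is renamed grp because GROUP is a reserved word. The rows are the sample input from the question.

```python
import sqlite3

def overlap_report(conn):
    """Per-id overlap under the min-count interpretation (sketch)."""
    return conn.execute("""
        SELECT id,
               july_cnt,
               MIN(july_cnt, aug_cnt) AS overlap_cnt,          -- SQLite scalar MIN of two values
               ROUND(100.0 * MIN(july_cnt, aug_cnt)
                     / NULLIF(july_cnt, 0), 2) AS pct_overlap   -- NULL, not an error, if july_cnt = 0
        FROM (
            SELECT id,
                   SUM(CASE WHEN tdate = '2021-07-05' THEN 1 ELSE 0 END) AS july_cnt,
                   SUM(CASE WHEN tdate = '2021-08-10' THEN 1 ELSE 0 END) AS aug_cnt
            FROM tbl
            GROUP BY id
        ) AS t
        ORDER BY id
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id TEXT, type TEXT, grp INTEGER, tdate TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", [
    ("ABC", "Mobile", 1, "2021-07-05"),
    ("BCD", "Mobile", 1, "2021-07-05"),
    ("ABC", "Desktop", 1, "2021-08-10"),
    ("CDE", "Mobile", 2, "2021-07-05"),
    ("BCD", "Mobile", 2, "2021-08-10"),
])
for row in overlap_report(conn):
    print(row)
```

For this sample it prints ('ABC', 1, 1, 100.0), ('BCD', 1, 1, 100.0) and ('CDE', 1, 0, 0.0); an ID with only August 10 rows would get None as its percentage instead of raising a division error.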

Related

From one record to more records that represent mm/yyyy

Let's say that we have this table:
Employee  EmploymentStarted  EmploymentEnded
Sara      20210115           20210715
Lora      20210215           20210815
Viki      20210515           20210615
Now what I need is a table where we can see all the employees we had each month. For example, Sara started on January 15th 2021 and left the company on July 15th 2021. This means that she was with us during January, February, March, April, May, June and July.
The result table should look like this:
Month     Year  Employee
January   2021  Sara
February  2021  Sara
February  2021  Lora
March     2021  Sara
March     2021  Lora
April     2021  Sara
April     2021  Lora
May       2021  Sara
May       2021  Lora
May       2021  Viki
June      2021  Sara
June      2021  Lora
June      2021  Viki
July      2021  Sara
July      2021  Lora
August    2021  Lora
How can I get a table like this in SQL?
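I tried a GROUP BY, but it does not seem to be the right way to do it.
One way is to generate one row per month with a recursive CTE and join each month to the employees whose employment overlaps it. It would be interesting to measure in practice how much performance suffers from the recursion. In this case the generated calendar (rdT) contains only about 12 rows per year, so most of the query cost is the JOIN to the Employee (staff) table.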
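-- assumes EmploymentStarted/EmploymentEnded are DATE columns;
-- if they are stored as integers like 20210115, convert them first
with FromTo as (
    select min(EmploymentStarted) fromDt, eomonth(max(EmploymentEnded)) toDt
    from staff
)
--,FromTo as (select fromDt=@fromDt, toDt=@toDt)  -- alternative: a fixed range from variables
,rdT as (
    -- anchor: the first (possibly partial) month of the range
    select 1 n, fromDt bM, eomonth(fromDt) eM
          ,fromDt, toDt
    from FromTo
    union all
    -- recursive step: the next month, from its 1st to its last day
    select n+1 n, dateadd(day,1,eM) bM, eomonth(dateadd(month,1,bM)) eM
          ,fromDt, toDt
    from rdT where dateadd(month,1,bM) < toDt
)
-- an employee belongs to a month if the employment period overlaps it
select datename(month, bM) as [Month], year(bM) as [Year], Employee
from rdT
left join staff s on s.EmploymentEnded >= bM and s.EmploymentStarted <= eM
order by bM, s.EmploymentStarted;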
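The same month-list-plus-overlap-join idea can be checked in SQLite from Python's sqlite3 module. This is a sketch, not a translation of the T-SQL: the table staff(employee, started, ended) with ISO date text is an assumption, SQLite's date() modifiers replace EOMONTH/DATEADD, and month names come from Python's calendar module because SQLite has no DATENAME.

```python
import calendar
import sqlite3

def employees_per_month(conn):
    """One (month name, year, employee) row for each month an employee was on staff."""
    rows = conn.execute("""
        WITH RECURSIVE bounds(first_m, last_m) AS (
            -- first days of the earliest-start month and the latest-end month
            SELECT date(MIN(started), 'start of month'),
                   date(MAX(ended), 'start of month')
            FROM staff
        ),
        months(m_start, last_m) AS (
            SELECT first_m, last_m FROM bounds
            UNION ALL
            -- step forward one month at a time until the last month
            SELECT date(m_start, '+1 month'), last_m
            FROM months
            WHERE m_start < last_m
        )
        SELECT CAST(strftime('%m', m.m_start) AS INTEGER),
               CAST(strftime('%Y', m.m_start) AS INTEGER),
               s.employee
        FROM months AS m
        LEFT JOIN staff AS s
               -- employed on at least one day between the 1st and the last day of the month
               ON s.ended >= m.m_start
              AND s.started <= date(m.m_start, '+1 month', '-1 day')
        ORDER BY m.m_start, s.started
    """).fetchall()
    return [(calendar.month_name[m], y, e) for m, y, e in rows]

conn = sqlite3.connect(":memory:")
# ISO-8601 text dates are an assumption; the question shows 20210115-style values
conn.execute("CREATE TABLE staff (employee TEXT, started TEXT, ended TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", [
    ("Sara", "2021-01-15", "2021-07-15"),
    ("Lora", "2021-02-15", "2021-08-15"),
    ("Viki", "2021-05-15", "2021-06-15"),
])
for row in employees_per_month(conn):
    print(row)
```

With the question's three employees this yields the 16 rows of the desired table, from ('January', 2021, 'Sara') to ('August', 2021, 'Lora').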

How to use SAS/SQL to create a table with certain conditions from a dataset

I have a dataset with ID and event_year (an event means something happened that year). A person can have more than one record in this table, with different event years; e.g. ID 1 can have three entries with event_year 2017, 2018 and 2019. Example dataset:
ID event_year
1 2017
1 2018
1 2019
2 2018
2 2017
From this I need all IDs whose event_year is between 2017 and 2021, to make a frequency table counting people with an event_year in each of the study years 2017, 2018, 2019, 2020 and 2021 (these are the columns I refer to as study year x).
Year frequency
2017 2
2018 2
2019 1
2020 1
2021 0
Another condition: for study year x, if a person didn't have an event_year in x but had an event_year in x-1, they are included in the frequency of year x. For example, ID 1 above should be counted once in each of 2017, 2018, 2019 and 2020: they had no event_year in 2020 but did have one in 2019, so they are included in 2020. I apologise if this is confusing and would be happy to clarify.
If I understood your question, this should work:
data have;
input ID event_year;
datalines;
1 2017
1 2018
1 2019
2 2018
2 2017
3 2017
3 2020
;
run;
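For the next step (your additional requirement of being included in the year after the last event) we need the data sorted by ID, and by event_year within each ID, so that the last row of each ID holds its latest year.
proc sort data=have;
    by ID event_year;
run;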
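We just add one extra row per ID, where the year is the last year + 1.
data have;
    set have;
    by ID;
    output;                            /* keep the original row */
    if last.ID then do;                /* after the latest event of each ID... */
        event_year = event_year + 1;   /* ...add a row for the following year */
        output;
    end;
run;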
Now we just count how many distinct IDs each year has. If you only want certain years, add a where clause (for example, where event_year in (2017, 2018, 2019, 2020, 2021)).
proc sql;
    create table want as
    select event_year, count(distinct ID) as frequency
    from have
    group by event_year;
quit;
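If the x-1 rule is meant to apply to every study year, not only to the year after a person's last event, it is simpler to credit each event to its own year and to the next one, then count distinct IDs. Here is a sketch of that reading in SQL, run through Python's sqlite3 module; the table name events and the helper yearly_frequency are hypothetical. The years CTE also makes years with nobody in them appear with a count of 0, which a plain GROUP BY would leave out.

```python
import sqlite3

def yearly_frequency(conn, first_year, last_year):
    """Distinct IDs per study year, crediting each event to its own year and the next one."""
    return conn.execute("""
        WITH RECURSIVE years(yr) AS (
            SELECT ?                                  -- first study year
            UNION ALL
            SELECT yr + 1 FROM years WHERE yr < ?     -- up to the last study year
        ),
        credited(id, yr) AS (
            SELECT id, event_year FROM events         -- the event year itself
            UNION                                     -- UNION also drops duplicate (id, yr) pairs
            SELECT id, event_year + 1 FROM events     -- plus the following year (the x-1 rule)
        )
        SELECT y.yr, COUNT(DISTINCT c.id)
        FROM years AS y
        LEFT JOIN credited AS c ON c.yr = y.yr        -- keeps empty years with a count of 0
        GROUP BY y.yr
        ORDER BY y.yr
    """, (first_year, last_year)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, event_year INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, 2017), (1, 2018), (1, 2019), (2, 2018), (2, 2017)])
print(yearly_frequency(conn, 2017, 2021))
```

On the question's five rows this gives 2017: 2, 2018: 2, 2019: 2, 2020: 1, 2021: 0. The 2019 figure is 2 rather than the 1 in the question's table because, under the stated rule, ID 2's 2018 event also counts for 2019.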

Aggregate multiple invoice numbers and invoice amount rows into one row

I have the following:
budget_id  invoice_number  April  June  August
004        11              NULL   690   NULL
004        12              1820   NULL  NULL
004        13              NULL   NULL  890
What I want is the following:
budget_id  invoice_number  April  June  August
004        11, 12, 13      1820   690   890
However, when I try the following:
SELECT budget_id,
STRING_AGG(invoice_number, ',') AS invoice_number,
April,
June,
August
FROM invoice_table
GROUP BY budget_id,
April,
June,
August
Nothing happens: the table stays exactly the same. The code above works if I comment out the months, since it then aggregates the invoice numbers without them. But once I include the months, I still get 3 separate rows. I need the invoice amounts to be included with the months. Is it possible to get both the invoice numbers and the invoice amounts aggregated into one row? I'm using BigQuery, if that helps.
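Use the query below. Your version also groups by April, June and August, and since every row has a different combination of month values, each row stays in its own group. Group by budget_id only and aggregate the month columns with SUM, which ignores NULLs: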
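SELECT budget_id,
       -- STRING_AGG needs STRING input; the CAST is a no-op if invoice_number is already STRING
       STRING_AGG(CAST(invoice_number AS STRING), ', ' ORDER BY invoice_number) AS invoice_number,
       -- each month has one non-NULL amount per budget_id, so SUM returns it
       SUM(April) AS April,
       SUM(June) AS June,
       SUM(August) AS August
FROM invoice_table
GROUP BY budget_id;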
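To see why grouping by budget_id alone collapses the rows, here is the same query shape in SQLite via Python's sqlite3 module, with GROUP_CONCAT standing in for BigQuery's STRING_AGG (an SQLite substitution, not BigQuery syntax). The sample rows are the question's; storing invoice_number as text is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE invoice_table (
    budget_id TEXT, invoice_number TEXT, April INTEGER, June INTEGER, August INTEGER)""")
conn.executemany("INSERT INTO invoice_table VALUES (?, ?, ?, ?, ?)", [
    ("004", "11", None, 690, None),
    ("004", "12", 1820, None, None),
    ("004", "13", None, None, 890),
])

row = conn.execute("""
    SELECT budget_id,
           GROUP_CONCAT(invoice_number, ', ') AS invoice_number,  -- SQLite's string aggregate
           SUM(April)  AS April,    -- SUM skips NULLs, leaving the single real amount
           SUM(June)   AS June,
           SUM(August) AS August
    FROM invoice_table
    GROUP BY budget_id              -- one group per budget, regardless of month values
""").fetchone()
print(row)
```

The amounts come out as 1820, 690 and 890 in one row. SQLite's GROUP_CONCAT makes no ordering promise in this form, so the invoice numbers may appear in a different order.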

Count the number of records for each 1st of the month in SQL

I have a dataset that I would like to query to get a count of the records on the first day of every month.
Data
name date1
hello july 1 2018
hello july 1 2018
hello july 10 2018
sure august 1 2019
sure august 1 2019
why august 20 2019
ok september 1 2019
ok september 1 2019
ok september 1 2019
sure september 5 2019
Desired
ID MONTH Day YEAR
2 July 1 2018
2 August 1 2019
3 September 1 2019
We are only counting the records from the 1st of each month.
What I tried:
USE [Data]
SELECT COUNT(*) AS ID , MONTH(date1) AS MONTH, YEAR(date1) AS YEAR
FROM dbo.data1
GROUP BY MONTH(date1), YEAR(date1)
ORDER BY YEAR ASC
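This only outputs the year and month, and it counts the records for every day of the month.
Any suggestion is appreciated.
Assuming date1 is a date column (or text that SQL Server can implicitly convert to a date), filter on the day of the month in the WHERE clause before grouping: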
Example
SELECT COUNT(*) AS ID,
       DATENAME(MONTH, date1) AS MONTH,
       DATEPART(DAY, date1) AS DAY,
       YEAR(date1) AS YEAR
FROM dbo.data1
WHERE DAY(date1) = 1                  -- keep only rows dated on the 1st
GROUP BY YEAR(date1), MONTH(date1), DATENAME(MONTH, date1), DATEPART(DAY, date1)
ORDER BY YEAR(date1), MONTH(date1);   -- chronological order
Results
ID MONTH DAY YEAR
2 July 1 2018
2 August 1 2019
3 September 1 2019
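A sketch of the same filter-then-group approach in SQLite via Python's sqlite3 module, assuming date1 holds ISO date text (the question's 'july 1 2018' strings would need converting first). SQLite has no DATENAME, so the month name comes from Python's calendar module; the helper first_of_month_counts is hypothetical.

```python
import calendar
import sqlite3

def first_of_month_counts(conn):
    """(count, month name, day, year) for every month that has rows dated on the 1st."""
    rows = conn.execute("""
        SELECT COUNT(*),
               CAST(strftime('%m', date1) AS INTEGER),
               CAST(strftime('%d', date1) AS INTEGER),
               CAST(strftime('%Y', date1) AS INTEGER)
        FROM data1
        WHERE strftime('%d', date1) = '01'   -- keep only the 1st of each month
        GROUP BY date1                        -- every kept date is a 1st, so one group per month
        ORDER BY date1
    """).fetchall()
    return [(cnt, calendar.month_name[m], d, y) for cnt, m, d, y in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data1 (name TEXT, date1 TEXT)")
conn.executemany("INSERT INTO data1 VALUES (?, ?)", [
    ("hello", "2018-07-01"), ("hello", "2018-07-01"), ("hello", "2018-07-10"),
    ("sure", "2019-08-01"), ("sure", "2019-08-01"), ("why", "2019-08-20"),
    ("ok", "2019-09-01"), ("ok", "2019-09-01"), ("ok", "2019-09-01"),
    ("sure", "2019-09-05"),
])
for row in first_of_month_counts(conn):
    print(row)
```

It returns the same three rows as the T-SQL results: 2 for July 2018, 2 for August 2019 and 3 for September 2019.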

How to get column value comparison in sql?

I have a table like the one below, which holds the price of each product for every day of the year. I would like to get the price change for each day, by year.
Product Year 1Jan 2Jan .................... 31Dec
A 2018 10 20 .................... 120
A 2019 130 150 .................... 200
B 2018 15 23 .................... 90
B 2019 113 130 .................... 220
I would like to compare the columns sequentially, carrying the comparison over from one year to the next, and get output as below.
• For the year 2018, subtracting the 1 Jan value from the 2 Jan value (2 Jan - 1 Jan) gives the new value of 2 Jan.
• For the year 2018, subtracting the 2 Jan value from the 3 Jan value (3 Jan - 2 Jan) gives the new value of 3 Jan.
• For the year 2018, subtracting the 30 Dec value from the 31 Dec value (31 Dec - 30 Dec) gives the new value of 31 Dec.
• Then, for the year 2019, subtracting the 31 Dec 2018 value from the 1 Jan 2019 value gives the new value of 1 Jan 2019.
So, in a nutshell, the new value of each column is the difference between its value and the previous day's value.
Product Year 1Jan 2Jan .................... 31Dec
A 2018 10 10 .................... 15 (just assume value of 30Dec column is 105)
A 2019 10 20 .................... 10 (just assume value of 30Dec column is 190)
B 2018 15 8 .................... 8 (just assume value of 30Dec column is 82)
B 2019 23 17 .................... 10 (just assume value of 30Dec column is 210)
Let me know if things are not clear.
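Logically there is nothing difficult in this query, but it is tedious to write because every day is its own column. The one subtle part is 1 Jan, which has to subtract the same product's 31 Dec value from the previous year; LAG can fetch that:
-- column names that start with a digit must be quoted ("..." in standard SQL)
SELECT Product
      ,Year
      -- 1 Jan minus the previous year's 31 Dec for the same product (0 for the first year)
      ,"1Jan" - COALESCE(LAG("31Dec") OVER (PARTITION BY Product ORDER BY Year), 0) AS "1Jan"
      ,"2Jan" - "1Jan" AS "2Jan"
      ,"3Jan" - "2Jan" AS "3Jan"
      .
      .
      .
      ,"31Dec" - "30Dec" AS "31Dec"
FROM YOUR_TAB
ORDER BY Product
        ,Year;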
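First of all, I think the design of the table could be better, but that's a topic for another time. For now, the code below should work (SQL Server syntax, so column names that start with a digit go in square brackets):
SELECT Product, [Year],
    -- 1 Jan subtracts the previous year's 31 Dec (0 when there is no previous year)
    [1Jan] - COALESCE(LAG([31Dec]) OVER (PARTITION BY Product ORDER BY [Year]), 0) AS [1st Jan],
    [2Jan] - [1Jan] AS [2nd Jan],
    [3Jan] - [2Jan] AS [3rd Jan],
    [4Jan] - [3Jan] AS [4th Jan],
    .
    .
    .
    [31Dec] - [30Dec] AS [31st Dec]   -- no trailing comma before FROM
FROM [table name];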
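Both answers hint that the wide layout is the real problem. If the prices were stored in a long table with one row per product per day (an assumed table prices(product, day, price), not the asker's schema), a single LAG over each product's rows would handle the year boundary automatically. A sketch in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later), using the 30 Dec to 2 Jan values from the question's example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (product TEXT, day TEXT, price INTEGER)")
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", [
    ("A", "2018-12-30", 105), ("A", "2018-12-31", 120),
    ("A", "2019-01-01", 130), ("A", "2019-01-02", 150),
    ("B", "2018-12-30", 82),  ("B", "2018-12-31", 90),
    ("B", "2019-01-01", 113), ("B", "2019-01-02", 130),
])

# LAG looks at the previous row of the same product in day order,
# so 1 Jan 2019 is compared with 31 Dec 2018 without any special case
changes = conn.execute("""
    SELECT product,
           day,
           price - COALESCE(LAG(price) OVER (PARTITION BY product ORDER BY day), 0) AS price_change
    FROM prices
    ORDER BY product, day
""").fetchall()
for row in changes:
    print(row)
```

It reproduces the question's figures: A gives 15 for 31 Dec 2018, 10 for 1 Jan 2019 and 20 for 2 Jan 2019; B gives 8, 23 and 17. The first row of each product (30 Dec here) keeps its raw price because it has no previous day.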