How to use SAS/SQL to create a table with certain conditions from a dataset - sql

I have a dataset with ID and event_year (event_year meaning something happened in that year). A person can have more than one record in this table, each with a different event year, e.g. ID 1 can have three entries with event_year 2017, 2018 and 2019. Example dataset:
ID event_year
1 2017
1 2018
1 2019
2 2018
2 2017
From this I need a table of all IDs where event_year is between 2017 and 2021, in order to build a frequency table counting the people with an event_year in each of the set years 2017, 2018, 2019, 2020 and 2021 (I refer to each of these as study year x).
Year frequency
2017 2
2018 2
2019 1
2020 1
2021 0
Another condition: for study year x, if a person didn't have an event_year in x but had an event_year in x-1, they should be included in the frequency of year x. For example, ID 1 above should be counted once in each of 2017, 2018, 2019 and 2020: they didn't have an event_year in 2020 but did in 2019, so they are included in 2020. I apologise if this is confusing and would be happy to clarify.

If I understood your question, this should work:
data have;
input ID event_year;
datalines;
1 2017
1 2018
1 2019
2 2018
2 2017
3 2017
3 2020
;
run;
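For the next step (your additional requirement of being included in the year after the last event) we need the data sorted by ID and, within each ID, by event_year, so that the last row of each ID really is its latest year.
proc sort data=have;
by ID event_year;
run;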
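We then add one extra row per ID, where the year is that ID's last year + 1.
data have;
set have;
by ID;
/* keep the original row */
output;
/* on the last row of each ID, also write a row for the following year */
if last.ID then do;
event_year=event_year+1;
output;
end;
run;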
Now we just count how many distinct IDs each year has. If you want to check only certain years, add a where clause (for example, where event_year in (2017, 2018, 2019, 2020, 2021) ). Note that proc sql is ended with quit, not run.
proc sql;
create table want as
select event_year, count(distinct ID) as frequency
from have
group by event_year
;
quit;
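For anyone who wants to check the logic outside SAS, here is a minimal sketch of the same idea in Python with the standard-library sqlite3 module. It is not the SAS code above: the study_years helper table is my own assumption, added only so that years without any qualifying person still show up with a frequency of 0, and the extra row per ID is taken from that ID's maximum event_year.

```python
import sqlite3

# Sample rows from the answer's `have` dataset: (ID, event_year).
rows = [(1, 2017), (1, 2018), (1, 2019), (2, 2018), (2, 2017), (3, 2017), (3, 2020)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (ID INTEGER, event_year INTEGER)")
conn.executemany("INSERT INTO have VALUES (?, ?)", rows)

# Hypothetical helper table (not in the original answer) listing the study
# years, so that a year nobody qualifies for is still reported with 0.
conn.execute("CREATE TABLE study_years (yr INTEGER)")
conn.executemany("INSERT INTO study_years VALUES (?)", [(y,) for y in range(2017, 2022)])

query = """
WITH extended AS (
    -- original events
    SELECT ID, event_year FROM have
    UNION
    -- one extra row per ID: the year after that ID's last event
    SELECT ID, MAX(event_year) + 1 FROM have GROUP BY ID
)
SELECT s.yr, COUNT(DISTINCT e.ID) AS frequency
FROM study_years AS s
LEFT JOIN extended AS e ON e.event_year = s.yr
GROUP BY s.yr
ORDER BY s.yr
"""
result = conn.execute(query).fetchall()
for yr, frequency in result:
    print(yr, frequency)
```

With the sample data (including ID 3, which the answer added) this prints 3, 2, 2, 2 and 1 for 2017 through 2021. Like the SAS steps, it only extends each ID by the year after its last event, so a gap in the middle of a person's history is not filled.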

Related

Return the first and last value from one column when value from another column changes

I am trying to write a PostgreSQL query to return the first and last dates corresponding to indices. I have a table:
Datetime Index
March 1 2021 0
March 2 2021 0
March 3 2021 0
March 4 2021 1
March 5 2021 1
March 6 2021 2
In this case, I would want to return:
I am wondering how I would write the PostgreSQL query for this.
I think this can be done with the following:
SELECT MIN("Datetime") AS Start
, MAX("Datetime") AS End
, "Index"
FROM <your_table>
GROUP BY "Index"
ORDER BY "Index"
;
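A quick way to try the GROUP BY idea without a PostgreSQL server is the sketch below, which runs an equivalent query in SQLite from Python. The table name readings, the ISO date strings and the start_date / end_date aliases are assumptions for illustration only. It also relies on each index value forming a single contiguous run, as in the sample; if an index value could come back later, a plain GROUP BY would merge the separate runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the dates are stored as ISO strings so that
# MIN/MAX on text gives chronological order.
conn.execute('CREATE TABLE readings ("Datetime" TEXT, "Index" INTEGER)')
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2021-03-01", 0), ("2021-03-02", 0), ("2021-03-03", 0),
        ("2021-03-04", 1), ("2021-03-05", 1),
        ("2021-03-06", 2),
    ],
)

# First and last date per index value.
spans = conn.execute(
    'SELECT "Index", MIN("Datetime") AS start_date, MAX("Datetime") AS end_date '
    'FROM readings GROUP BY "Index" ORDER BY "Index"'
).fetchall()

for index_value, start_date, end_date in spans:
    print(index_value, start_date, end_date)
```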

Count the number of records for each 1st of the month in SQL

I have a dataset where I would like to query and obtain output of a count of records for the first of every month.
Data
name date1
hello july 1 2018
hello july 1 2018
hello july 10 2018
sure august 1 2019
sure august 1 2019
why august 20 2019
ok september 1 2019
ok september 1 2019
ok september 1 2019
sure september 5 2019
Desired
ID MONTH Day YEAR
2 July 1 2018
2 August 1 2019
3 September 1 2019
We are only counting the records from the 1st of each month
Doing
USE [Data]
SELECT COUNT(*) AS ID , MONTH(date1) AS MONTH, YEAR(date1) AS YEAR
FROM dbo.data1
GROUP BY MONTH(date1), YEAR(date1)
ORDER BY YEAR ASC
This only outputs the year and month
Any suggestion is appreciated
Assuming date1 is stored as text and you are relying on implicit conversion to a date
Example
SELECT COUNT(*) AS ID,
DATENAME(MONTH,date1) AS MONTH,
DATEPART(DAY,date1) as DAY,
YEAR(date1) AS YEAR
FROM dbo.data1
WHERE DAY(date1)=1
GROUP BY YEAR(date1),DATENAME(MONTH,date1),DATEPART(DAY,date1)
ORDER BY YEAR ASC
Results
ID MONTH DAY YEAR
2 July 1 2018
2 August 1 2019
3 September 1 2019
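The same filter-then-group idea can be checked with a small sketch in Python and SQLite. This is not T-SQL: strftime stands in for DAY / MONTH / YEAR, the dates are rewritten as ISO strings, and the month name is looked up in Python, all of which are my assumptions for the sake of a runnable example.

```python
import calendar
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data1 (name TEXT, date1 TEXT)")
conn.executemany(
    "INSERT INTO data1 VALUES (?, ?)",
    [
        ("hello", "2018-07-01"), ("hello", "2018-07-01"), ("hello", "2018-07-10"),
        ("sure", "2019-08-01"), ("sure", "2019-08-01"), ("why", "2019-08-20"),
        ("ok", "2019-09-01"), ("ok", "2019-09-01"), ("ok", "2019-09-01"),
        ("sure", "2019-09-05"),
    ],
)

# Keep only rows dated the 1st, then count them per year and month.
rows = conn.execute(
    """
    SELECT COUNT(*) AS id,
           CAST(strftime('%m', date1) AS INTEGER) AS month_number,
           CAST(strftime('%Y', date1) AS INTEGER) AS yr
    FROM data1
    WHERE strftime('%d', date1) = '01'
    GROUP BY yr, month_number
    ORDER BY yr, month_number
    """
).fetchall()

# Shape the rows like the desired output: (ID, MONTH, DAY, YEAR).
first_of_month_counts = [
    (count, calendar.month_name[month_number], 1, yr)
    for count, month_number, yr in rows
]
for row in first_of_month_counts:
    print(*row)
```

Ordering by year and then month number keeps the months in calendar order within a year, which ordering by year alone does not guarantee.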

Determine the first occurrence of a particular customer visiting the store in a particular month

I need a per-month (and per-year) breakdown of the number of customers [aliased as Patient_ID] who made their first visit to a store. The date times of store visits are stored in the [MDT Review Date] column of the table.
Customers can come to the store multiple times throughout the year and increase the total count, but what I require is ONLY the first time a customer visited.
E.g. Tom Bombadil visited the store once in January 2019, so the count increased to 1, then again 4 times in March, so the count should be 1 for March, 0 for February and 1 for January; he then visited again 4 times in October and 2 times in December.
I require that Tom Bombadil be counted once and only once for a particular month, i.e. only his first occurrence in that month.
The output should be like :
rn1 YEAR Month_Number Month Total_Count
1 2010 6 June 2
1 2010 7 July 1
1 2010 8 August 5
1 2010 10 October 5
1 2010 11 November 3
1 2011 1 January 4
1 2011 2 February 6
1 2011 4 April 7
1 2011 5 May 4
1 2011 6 June 10
1 2011 7 July 10
1 2011 8 August 14
1 2011 9 September 4
1 2011 10 October 8
1 2011 11 November 11
1 2011 12 December 11
1 2012 1 January 8
1 2012 2 February 21
Please refer to my query. What I have attempts to use the windowing function COUNT to count the store visits per month. Then the ROW_NUMBER function attempts to assign a unique number to each visit. What am I doing wrong?
select
*
from
(select distinct
row_number() over (partition by p.Patient_ID, p.PAT_Forename1, p.PAT_Surname
order by PAT_Forename1, p.Patient_ID, PAT_Surname) AS rn1,
datepart(year, [DATE_COLUMN]) as YEAR,
datepart(month, [DATE_COLUMN]) as Month_Number,
datename(month,[DATE_COLUMN]) as Month,
count(p.Patient_ID) over (partition by datepart(year,[DATE_COLUMN]),
datename(month, [DATE_COLUMN])) as Total_Count
from
Tablename m
inner join
TableName p on m.PK_ID = p.PK_ID
) as temp
where
rn1 = 1
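One possible reading of the requirement is "count each customer at most once per month", which can be done with COUNT(DISTINCT ...) grouped by year and month instead of window functions. The sketch below tries that in Python with SQLite; the visits table, its columns and the sample rows are hypothetical and only loosely follow the Tom Bombadil example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per store visit.
conn.execute("CREATE TABLE visits (patient_id INTEGER, visit_date TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        (1, "2019-01-05"),                  # patient 1: one visit in January
        (1, "2019-03-01"), (1, "2019-03-02"),
        (1, "2019-03-10"), (1, "2019-03-20"),  # patient 1: four visits in March
        (2, "2019-03-15"),                  # patient 2: one visit in March
    ],
)

# COUNT(DISTINCT patient_id) counts a patient once per month,
# however many visits they made in that month.
monthly_counts = conn.execute(
    """
    SELECT CAST(strftime('%Y', visit_date) AS INTEGER) AS yr,
           CAST(strftime('%m', visit_date) AS INTEGER) AS month_number,
           COUNT(DISTINCT patient_id) AS total_count
    FROM visits
    GROUP BY yr, month_number
    ORDER BY yr, month_number
    """
).fetchall()

for yr, month_number, total_count in monthly_counts:
    print(yr, month_number, total_count)
```

Months without any visit (February here) simply produce no row, which seems consistent with the gaps in the desired output. If the goal is instead each customer's first visit ever, the grouping would need to run on MIN(visit_date) per patient rather than on every visit.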

Predict future data trends from current data using a SQL script and then plot them in SSRS

I have been asked to create a trendline in SSRS. This trendline will show the predicted future values based on the current year's data.
Here I have data for the year 2018 and I need to predict the trend of ClaimVolume for the year 2019.
Please find the data below:
Month Month Name Year ClaimVolume
1 January 2018 13746
2 February 2018 13412
3 March 2018 15143
4 April 2018 15655
5 May 2018 15190
6 June 2018 15365
7 July 2018 18943
8 August 2018 24305
9 September 2018 18893
10 October 2018 26659
11 November 2018 18696
12 December 2018 22367
Please help me in providing SQL query for the above task.
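The question does not say which kind of trend is wanted. Assuming a simple straight-line (least-squares) trend over the month number is acceptable, the arithmetic can be sketched as below in plain Python; the same sums could then be written as SQL aggregates. The variable names are illustrative only, and months 13 to 24 stand for January to December 2019.

```python
# 2018 data from the question: month number -> ClaimVolume.
months = list(range(1, 13))
claim_volume = [13746, 13412, 15143, 15655, 15190, 15365,
                18943, 24305, 18893, 26659, 18696, 22367]

n = len(months)
mean_x = sum(months) / n
mean_y = sum(claim_volume) / n

# Ordinary least-squares fit of ClaimVolume = intercept + slope * month.
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(months, claim_volume))
    / sum((x - mean_x) ** 2 for x in months)
)
intercept = mean_y - slope * mean_x

# Extend the fitted line to months 13..24, i.e. the 12 months of 2019.
forecast_2019 = [
    (month, round(intercept + slope * (12 + month))) for month in range(1, 13)
]

for month, predicted in forecast_2019:
    print(2019, month, predicted)
```

A straight line ignores any seasonality in the 2018 figures, so this is only a starting point for the trendline.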

distribute a value starting from the first months

Consider a query such as the following.
Select MONTH, sum(RECEIVABLES), sum(COLLECTED) from TABLE1 group by MONTH
result
MONTH RECEIVABLES COLLECTED
JANUARY 2 0
FEBRUARY 1 0
MARCH 3 0
Now suppose a collection of 4 is made in APRIL.
Question: how can this APRIL value of 4 be distributed over the COLLECTED column, starting from the first month and filling each month's RECEIVABLES in turn?
The result should be as follows:
MONTH RECEIVABLES COLLECTED
JANUARY 2 2
FEBRUARY 1 1
MARCH 3 1
APRIL 0 0
With SQL or stored procedures...
thanks...
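One way to express this allocation is: each month gets the smaller of its own receivable and whatever is left of the collected amount after all earlier months have been served. The sketch below tries this with SQLite from Python. The receivables table, the month_no ordering column and the collected parameter are assumptions for illustration; the correlated subquery could be replaced by a windowed running SUM on databases that support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: month_no gives the order in which months are filled.
conn.execute(
    "CREATE TABLE receivables (month_no INTEGER, month_name TEXT, receivables INTEGER)"
)
conn.executemany(
    "INSERT INTO receivables VALUES (?, ?, ?)",
    [(1, "JANUARY", 2), (2, "FEBRUARY", 1), (3, "MARCH", 3), (4, "APRIL", 0)],
)

# For each month: remaining = collected - sum of earlier receivables;
# allocate min(receivable, remaining), but never less than 0.
allocation = conn.execute(
    """
    SELECT m.month_name,
           m.receivables,
           MAX(0, MIN(m.receivables,
                      :collected - COALESCE(
                          (SELECT SUM(p.receivables)
                           FROM receivables AS p
                           WHERE p.month_no < m.month_no), 0))) AS collected
    FROM receivables AS m
    ORDER BY m.month_no
    """,
    {"collected": 4},
).fetchall()

for month_name, receivable, collected in allocation:
    print(month_name, receivable, collected)
```

With a collected amount of 4 this reproduces the table in the question: 2 for JANUARY, 1 for FEBRUARY, 1 for MARCH and 0 for APRIL.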