Filling up values within a group - sql

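I have a SAS table like the one below:

| ID | Grp | Month |
|---|---|---|
| | A | 202201 |
| | A | 202203 |
| 1234 | A | 202204 |
| | B | 202201 |
| | B | 202203 |
| AB1234 | B | 202204 |
| | C | 202201 |
| | C | 202203 |
| 3333 | C | 202204 |
| 3333 | C | 202205 |
| 4444 | C | 202206 |
| | T | 202204 |
| | T | 202205 |
| | T | 202206 |
| | D | 202201 |
| | D | 202203 |
| A555 | D | 202204 |
| | D | 202205 |
| 6666 | D | 202206 |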
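I need the output SAS dataset to look like this:

| ID | Grp | Month |
|---|---|---|
| 1234 | A | 202201 |
| 1234 | A | 202203 |
| 1234 | A | 202204 |
| AB1234 | B | 202201 |
| AB1234 | B | 202203 |
| AB1234 | B | 202204 |
| 3333 | C | 202201 |
| 3333 | C | 202203 |
| 3333 | C | 202204 |
| 3333 | C | 202205 |
| 4444 | C | 202206 |
| | T | 202204 |
| | T | 202205 |
| | T | 202206 |
| A555 | D | 202201 |
| A555 | D | 202203 |
| A555 | D | 202204 |
| 6666 | D | 202205 |
| 6666 | D | 202206 |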
Can someone please help??
Thanks in advance

If the data were sorted by descending MONTH values it would be a lot easier. It is much easier to remember a value you have already seen than to predict what value you might see in the future.
First let's convert your listing into an actual dataset we can use to work with.
data have ;
input ID $ Grp $ Month ;
cards;
. A 202201
. A 202203
1234 A 202204
. B 202201
. B 202203
AB1234 B 202204
. C 202201
. C 202203
3333 C 202204
3333 C 202205
4444 C 202206
. T 202204
. T 202205
. T 202206
. D 202201
. D 202203
A555 D 202204
. D 202205
6666 D 202206
;
Now sort it by GRP and descending MONTH and you can use the UPDATE statement to do a last observation carried forward.
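proc sort data=have;
  by grp descending month;
run;
data want;
  /* have(obs=0) is an empty master; every row of have is applied as a transaction, */
  /* and missing transaction values do not overwrite what was already seen */
  update have(obs=0) have;
  by grp;
  /* write one row per transaction instead of one per BY group */
  output;
run;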
If you want, you can re-sort the result to restore ascending month order.
proc sort data=want;
by grp month;
run;
Results:
Obs ID Grp Month
1 1234 A 202201
2 1234 A 202203
3 1234 A 202204
4 AB1234 B 202201
5 AB1234 B 202203
6 AB1234 B 202204
7 3333 C 202201
8 3333 C 202203
9 3333 C 202204
10 3333 C 202205
11 4444 C 202206
12 A555 D 202201
13 A555 D 202203
14 A555 D 202204
15 6666 D 202205
16 6666 D 202206
17 T 202204
18 T 202205
19 T 202206
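The same idea can be sketched outside SAS. The Python below is only an illustration of the logic (sort each group by descending month, carry the last seen ID forward, then re-sort), not a translation of the UPDATE statement. The function name `fill_by_descending_month` and the tuple layout are assumptions made for this example.

```python
from itertools import groupby

# Sample rows from the question as (id, grp, month); None marks a missing ID.
have = [
    (None, "A", 202201), (None, "A", 202203), ("1234", "A", 202204),
    (None, "B", 202201), (None, "B", 202203), ("AB1234", "B", 202204),
    (None, "C", 202201), (None, "C", 202203), ("3333", "C", 202204),
    ("3333", "C", 202205), ("4444", "C", 202206),
    (None, "T", 202204), (None, "T", 202205), (None, "T", 202206),
    (None, "D", 202201), (None, "D", 202203), ("A555", "D", 202204),
    (None, "D", 202205), ("6666", "D", 202206),
]


def fill_by_descending_month(rows):
    """Sort by group and descending month, then carry the last seen ID forward."""
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    filled = []
    for _, group in groupby(ordered, key=lambda r: r[1]):
        last_id = None  # reset at the start of every group
        for row_id, grp, month in group:
            if row_id is not None:
                last_id = row_id
            filled.append((last_id, grp, month))
    # Re-sort to ascending month order, like the final PROC SORT.
    return sorted(filled, key=lambda r: (r[1], r[2]))


want = fill_by_descending_month(have)
for row in want:
    print(row)
```

Group T keeps its missing IDs because no later value exists to carry.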
If you really have to deal with the data in the order shown then you could use a double DOW loop. The first loop finds the next non-missing ID value, and the second re-reads the same observations, updates the ID value and writes them out.
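data want;
  /* define the variables without reading any data */
  if 0 then set have;
  /* first loop: read ahead until a non-missing ID or the end of the group */
  do _n_=1 by 1 until(last.grp or not missing(id));
    set have ;
    by grp notsorted;
  end;
  _id = id;
  /* second loop: re-read the same _n_ rows and give each the ID found above */
  do _n_=1 to _n_;
    set have;
    id = _id;
    output;
  end;
  drop _id;
run;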
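As a rough illustration of what the two loops achieve, here is a Python sketch that keeps the original row order. It buffers rows until it meets a non-missing ID or the last row of a group, then writes the buffered rows out with that ID. The function name `fill_in_original_order` is an assumption for this example, and the data is only a subset of the sample.

```python
# Subset of the sample rows as (id, grp, month); None marks a missing ID.
rows = [
    (None, "C", 202201), ("3333", "C", 202204), ("4444", "C", 202206),
    (None, "T", 202204), (None, "T", 202205),
    (None, "D", 202201), ("A555", "D", 202204),
    (None, "D", 202205), ("6666", "D", 202206),
]


def fill_in_original_order(rows):
    """Buffer rows until a non-missing ID or a group change, then flush them."""
    out, pending = [], []
    for i, (row_id, grp, month) in enumerate(rows):
        pending.append((grp, month))
        is_last_in_group = i + 1 == len(rows) or rows[i + 1][1] != grp
        if row_id is not None or is_last_in_group:
            # Every buffered row receives the ID that ended the block (possibly None).
            out.extend((row_id, g, m) for g, m in pending)
            pending = []
    return out


filled = fill_in_original_order(rows)
for row in filled:
    print(row)
```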

Related

sql compute time difference as a duration and keep the result as a float

I have a table my_table:
case_id first_created last_paid submitted_time
3456 2021-01-27 2021-01-29 2021-01-26 21:34:36.566023+00:00
7891 2021-08-02 2021-09-16 2022-10-26 19:49:14.135585+00:00
1245 2021-09-13 None 2022-10-31 02:03:59.620348+00:00
9073 None None 2021-09-12 10:25:30.845687+00:00
6891 2021-08-03 2021-09-17 None
First I need to create two variables:
create_duration = first_created-submitted_time
paid_duration= last_paid-submitted_time
And if submitted_time is none just ignore that row.
my code:
select *,
first_created - coalesce(submitted_time::date) as create_duration,
last_paid - coalesce(submitted_time::date) as paid_duration
from my_table
The output:
case_id first_created last_paid submitted_time create_duration paid_duration
3456 2021-01-27 2021-01-29 2021-01-26 21:34:36.566023+00:00 1 3
7891 2021-08-02 2021-09-16 2022-10-26 19:49:14.135585+00:00 0 0
1245 2021-09-13 None 2022-10-31 02:03:59.620348+00:00 0 null
9073 None None 2021-09-12 10:25:30.845687+00:00 null null
6891 2021-08-03 2021-09-17 null null null
This is fine, but the results are integers. How can I get the result as a float with one decimal place, something like 1.3 days or 0.5 days?
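To make the arithmetic concrete, here is a small Python sketch of a fractional-day difference rounded to one decimal. It is not the SQL being asked for. It assumes that a bare date means midnight UTC on that day, and the helper name `days_between` is made up for this example.

```python
from datetime import date, datetime, time, timezone


def days_between(day, submitted, ndigits=1):
    """Return day minus submitted in fractional days, or None if either is missing."""
    if day is None or submitted is None:
        return None
    # Assumption: a bare date means midnight UTC on that day.
    day_start = datetime.combine(day, time.min, tzinfo=timezone.utc)
    return round((day_start - submitted).total_seconds() / 86400, ndigits)


# First row of my_table (case_id 3456).
submitted = datetime.fromisoformat("2021-01-26 21:34:36.566023+00:00")
create_duration = days_between(date(2021, 1, 27), submitted)
paid_duration = days_between(date(2021, 1, 29), submitted)
print(create_duration, paid_duration)
```

For this row the sketch gives 0.1 and 2.1 days, where subtracting whole dates gives 1 and 3.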

Identify Rank of date ranges from datetime column

I have members, the group to which they belong and the datetimes at which they were active. I want to find out which of the members had a gap of more than 3 months between dates, and I need to rank them.
| header 1 | header 2 | Create Date | Rank |
|---|---|---|---|
| 11111 | EAM | 2022-01-27 12:23:28.474000000 | 1 |
| 11111 | EAM | 2022-08-25 10:41:15.500000000 | 2 |
| 11111 | EAM | 2022-09-01 18:15:07.362000000 | 2 |
| 11111 | EAM | 2022-09-08 13:03:38.859000000 | 2 |
| 11111 | EAM | 2022-10-06 18:15:07.245000000 | 2 |
| 11111 | PEM | 2022-07-25 10:41:15.500000000 | 1 |
| 11111 | PEM | 2022-08-25 10:41:15.500000000 | 1 |
| 11111 | PEM | 2022-09-26 13:03:38.859000000 | 1 |
The desired result is above with the rank; the table contains the data without the Rank column.
One method is to use LAG to get the prior date, compare the two dates and return 1 if they are more than 3 months apart, and then SUM those values in a windowed aggregate:
WITH CTE AS(
SELECT header1,
header2,
CreateDate,
CASE WHEN DATEDIFF(MONTH,LAG(CreateDate) OVER (PARTITION BY header2 ORDER BY CreateDate),CreateDate) > 3 THEN 1 ELSE 0 END AS Counter
FROM (VALUES(11111,'EAM',CONVERT(datetime2(7),'2022-01-27 12:23:28.474000000')),
(11111,'EAM',CONVERT(datetime2(7),'2022-08-25 10:41:15.500000000')),
(11111,'EAM',CONVERT(datetime2(7),'2022-09-01 18:15:07.362000000')),
(11111,'EAM',CONVERT(datetime2(7),'2022-09-08 13:03:38.859000000')),
(11111,'EAM',CONVERT(datetime2(7),'2022-10-06 18:15:07.245000000')),
(11111,'PEM',CONVERT(datetime2(7),'2022-07-25 10:41:15.500000000')),
(11111,'PEM',CONVERT(datetime2(7),'2022-08-25 10:41:15.500000000')),
(11111,'PEM',CONVERT(datetime2(7),'2022-09-26 13:03:38.859000000')))V(header1,header2,CreateDate))
SELECT header1,
header2,
CreateDate,
SUM(Counter) OVER (PARTITION BY header2 ORDER BY CreateDate) + 1 AS Rank
FROM CTE;
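The LAG plus running SUM idea can be traced by hand with a short Python sketch. It assumes that the month difference counts calendar-month boundaries, which is how DATEDIFF(MONTH, ...) behaves in SQL Server. The names `month_boundaries` and `rank_by_gap` are made up for this example, and the timestamps are truncated to whole seconds.

```python
from datetime import datetime
from itertools import groupby


def month_boundaries(earlier, later):
    """Count calendar-month boundaries crossed between two datetimes."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def rank_by_gap(rows, max_gap_months=3):
    """rows: (header1, header2, create_date). Returns rows with a rank appended."""
    ranked = []
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    for _, group in groupby(ordered, key=lambda r: r[1]):
        rank, previous = 1, None
        for header1, header2, created in group:
            # Start a new rank when the gap to the previous row is large enough.
            if previous is not None and month_boundaries(previous, created) > max_gap_months:
                rank += 1
            ranked.append((header1, header2, created, rank))
            previous = created
    return ranked


sample = [
    (11111, "EAM", datetime(2022, 1, 27, 12, 23, 28)),
    (11111, "EAM", datetime(2022, 8, 25, 10, 41, 15)),
    (11111, "EAM", datetime(2022, 9, 1, 18, 15, 7)),
    (11111, "EAM", datetime(2022, 9, 8, 13, 3, 38)),
    (11111, "EAM", datetime(2022, 10, 6, 18, 15, 7)),
    (11111, "PEM", datetime(2022, 7, 25, 10, 41, 15)),
    (11111, "PEM", datetime(2022, 8, 25, 10, 41, 15)),
    (11111, "PEM", datetime(2022, 9, 26, 13, 3, 38)),
]
ranked = rank_by_gap(sample)
for row in ranked:
    print(row)
```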
select header1
,header2
,Create_Date
,dense_rank() over(partition by header1, header2 order by flg) as Rank
from
(
select *
,case when datediff(month, Create_Date, lead(Create_Date) over(partition by header1, header2 order by Create_Date)) >= 3 then 0 else 1 end as flg
from t
) t
| header1 | header2 | Create_Date | Rank |
|---|---|---|---|
| 11111 | EAM | 2022-01-27 12:23:28.000 | 1 |
| 11111 | EAM | 2022-08-25 10:41:15.000 | 2 |
| 11111 | EAM | 2022-09-01 18:15:07.000 | 2 |
| 11111 | EAM | 2022-09-08 13:03:38.000 | 2 |
| 11111 | EAM | 2022-10-06 18:15:07.000 | 2 |
| 11111 | PEM | 2022-07-25 10:41:15.000 | 1 |
| 11111 | PEM | 2022-08-25 10:41:15.000 | 1 |
| 11111 | PEM | 2022-09-26 13:03:38.000 | 1 |

count of rows based on multiple conditions/partitions on the same table

DBFiddle Link: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=2d7e9a4ddfdc8fb619a8dfc76d767950
Hi. I have one table called 'Model Versions' which has the fields and records as below.
| intent_id | intent_name | version | version_created_at | client | sentence |
|---|---|---|---|---|---|
| 1 | a_intent | 1 | 2021-01-01 | es_client1 | sentence_1 |
| 1 | a_intent | 1 | 2021-01-01 | es_client1 | sentence_2 |
| 1 | a_intent | 1 | 2021-01-01 | es_client1 | sentence_3 |
| 2 | b_intent | 2 | 2021-02-01 | es_client1 | sentence_1 |
| 2 | b_intent | 2 | 2021-02-01 | es_client1 | sentence_2 |
| 2 | b_intent | 2 | 2021-02-01 | es_client1 | sentence_3 |
| 3 | c_intent | 3 | 2021-03-01 | es_client1 | sentence_1 |
| 3 | c_intent | 3 | 2021-03-01 | es_client1 | sentence_2 |
| 4 | d_intent | 4 | 2021-04-01 | es_client1 | sentence_1 |
| 4 | d_intent | 4 | 2021-04-01 | es_client1 | sentence_2 |
| 5 | e_intent | 5 | 2021-05-01 | es_client1 | sentence_1 |
| 6 | g_intent | 1 | 2021-01-01 | es_client2 | sentence_1 |
| 6 | g_intent | 1 | 2021-01-01 | es_client2 | sentence_2 |
| 7 | h_intent | 2 | 2021-03-01 | es_client2 | sentence_1 |
| 7 | h_intent | 2 | 2021-03-01 | es_client2 | sentence_2 |
| 7 | h_intent | 2 | 2021-03-01 | es_client2 | sentence_3 |
| 8 | i_intent | 3 | 2021-04-01 | es_client2 | sentence_1 |
| 8 | i_intent | 3 | 2021-04-01 | es_client2 | sentence_2 |
| 9 | j_intent | 4 | 2021-05-01 | es_client2 | sentence_1 |
| 9 | j_intent | 4 | 2021-05-01 | es_client2 | sentence_2 |
| 10 | k_intent | 1 | 2021-01-01 | es_client3 | sentence_1 |
| 10 | k_intent | 1 | 2021-01-01 | es_client3 | sentence_2 |
| 11 | k_intent | 2 | 2021-06-01 | es_client3 | sentence_1 |
| 11 | k_intent | 2 | 2021-06-01 | es_client3 | sentence_2 |
| 12 | k_intent | 3 | 2021-07-01 | es_client3 | sentence_1 |
| 12 | k_intent | 3 | 2021-07-01 | es_client3 | sentence_2 |
| 13 | k_intent | 4 | 2021-08-01 | es_client3 | sentence_1 |
| 13 | k_intent | 4 | 2021-08-01 | es_client3 | sentence_2 |
| 14 | k_intent | 5 | 2021-10-01 | es_client3 | sentence_1 |
| 14 | k_intent | 5 | 2021-10-01 | es_client3 | sentence_2 |
Expected Output:
I wanted to get the top 3 versions of each client along with their respective sentence count. My expected output looks like below:
| client | version | total_count_of_sentences_per_version | version_created_at |
|---|---|---|---|
| es_client1 | 5 | 1 | 2021-05-01 |
| es_client1 | 4 | 2 | 2021-04-01 |
| es_client1 | 3 | 2 | 2021-03-01 |
| es_client2 | 4 | 2 | 2021-05-01 |
| es_client2 | 3 | 2 | 2021-04-01 |
| es_client2 | 2 | 3 | 2021-03-01 |
| es_client3 | 5 | 2 | 2021-10-01 |
| es_client3 | 4 | 2 | 2021-08-01 |
| es_client3 | 3 | 2 | 2021-06-01 |
I tried writing a query with multiple CTEs and Partition By's. But none worked out. Seeking your help to achieve this.
You did not specify which 3 versions count as the top ones. I'll assume you want to retrieve the 3 latest versions, based on creation date.
My suggestion is to use a ROW_NUMBER() for each client in a windowed function, and to filter the top 3 rows.
For instance :
with cte as(
select
client,
version,
version_created_at,
count(Sentence) total_count_of_sentences_per_version,
row_number() over(partition by client order by version_created_at desc) version_row_number
from model_versions
group by
client,version,
version_created_at
)
select
client,
version,
total_count_of_sentences_per_version,
version_created_at
from cte
where version_row_number <=3
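To check the logic against the sample data without a database, the grouping and per-client numbering can be mimicked in Python. This is only a sketch: `top_versions` and the compact `versions` dictionary are invented for this example, and dates are kept as ISO strings so that string order matches date order.

```python
from collections import Counter
from itertools import groupby


def top_versions(rows, limit=3):
    """rows: (client, version, version_created_at, sentence) tuples."""
    # GROUP BY client, version, version_created_at with a count per group.
    counts = Counter((client, version, created) for client, version, created, _ in rows)
    # Newest first, then a stable sort by client keeps that order within each client.
    ordered = sorted(counts, key=lambda k: k[2], reverse=True)
    ordered.sort(key=lambda k: k[0])
    result = []
    for _, group in groupby(ordered, key=lambda k: k[0]):
        for row_number, key in enumerate(group, start=1):
            if row_number <= limit:
                client, version, created = key
                result.append((client, version, counts[key], created))
    return result


# (version, version_created_at, number of sentences) for two of the sample clients.
versions = {
    "es_client1": [(1, "2021-01-01", 3), (2, "2021-02-01", 3), (3, "2021-03-01", 2),
                   (4, "2021-04-01", 2), (5, "2021-05-01", 1)],
    "es_client2": [(1, "2021-01-01", 2), (2, "2021-03-01", 3), (3, "2021-04-01", 2),
                   (4, "2021-05-01", 2)],
}
rows = [
    (client, version, created, f"sentence_{i}")
    for client, specs in versions.items()
    for version, created, n in specs
    for i in range(1, n + 1)
]
top = top_versions(rows)
for row in top:
    print(row)
```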
You can try this:
WITH main_tab
AS (SELECT client,
version,
Count(*)
OVER (
partition BY client, version) AS total_count_of_sentences_per_version,
Min(version_created_at)
OVER (
partition BY client, version) AS version_created_at,
Dense_rank()
OVER (
partition BY client
ORDER BY version DESC) rn
FROM model_versions)
SELECT DISTINCT m.*
FROM main_tab m
WHERE m.rn <= 3;

How can I join two tables on an ID and a DATE RANGE in SQL

I have 2 query result tables containing records for different assessments. There are RAssessments and NAssessments which make up a complete review.
The aim is to eventually determine which reviews were completed. I would like to join the two tables on the ID and on the date. However, the dates on which the two assessments were completed may not be identical and may be several days apart, and some IDs may have more RAssessments than NAssessments.
Therefore, I would like to join T1 on to T2 on ID & on T1Date(+ or - 7 days). There is no other way to match the two tables and to align the records other than using the date range, as this is a poorly designed database. I hope for some help with this as I am stumped.
Here is some sample data:
Table #1:
| ID | RAssessmentDate |
|---|---|
| 1 | 2020-01-03 |
| 1 | 2020-03-03 |
| 1 | 2020-05-03 |
| 2 | 2020-01-09 |
| 2 | 2020-04-09 |
| 3 | 2022-07-21 |
| 4 | 2020-06-30 |
| 4 | 2020-12-30 |
| 4 | 2021-06-30 |
| 4 | 2021-12-30 |
Table #2:
| ID | NAssessmentDate |
|---|---|
| 1 | 2020-01-07 |
| 1 | 2020-03-02 |
| 1 | 2020-05-03 |
| 2 | 2020-01-09 |
| 2 | 2020-07-06 |
| 2 | 2020-04-10 |
| 3 | 2022-07-21 |
| 4 | 2021-01-03 |
| 4 | 2021-06-28 |
| 4 | 2022-01-02 |
| 4 | 2022-06-26 |
I would like my end result table to look like this:
| ID | RAssessmentDate | NAssessmentDate |
|---|---|---|
| 1 | 2020-01-03 | 2020-01-07 |
| 1 | 2020-03-03 | 2020-03-02 |
| 1 | 2020-05-03 | 2020-05-03 |
| 2 | 2020-01-09 | 2020-01-09 |
| 2 | 2020-04-09 | 2020-04-10 |
| 2 | NULL | 2020-07-06 |
| 3 | 2022-07-21 | 2022-07-21 |
| 4 | 2020-06-30 | NULL |
| 4 | 2020-12-30 | 2021-01-03 |
| 4 | 2021-06-30 | 2021-06-28 |
| 4 | 2021-12-30 | 2022-01-02 |
| 4 | NULL | 2022-01-02 |
Try this:
SELECT
COALESCE(a.ID, b.ID) ID,
a.RAssessmentDate,
b.NAssessmentDate
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) RowId, *
FROM table1
) a
FULL OUTER JOIN (
SELECT
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) RowId, *
FROM table2
) b ON a.ID = b.ID AND a.RowId = b.RowId
WHERE (a.RAssessmentDate BETWEEN '2020-01-01' AND '2022-01-02')
OR (b.NAssessmentDate BETWEEN '2020-01-01' AND '2022-01-02')
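The query above pairs rows by position within each ID, not by how close the dates are. For comparison, here is a Python sketch of the match the question describes: same ID, dates within 7 days of each other, and unmatched rows from either side kept. The function name `join_within_days` and the `window_days` parameter are assumptions for this example, and the data is a subset of the sample.

```python
from datetime import date


def join_within_days(r_rows, n_rows, window_days=7):
    """Full outer join of (id, date) rows on equal id and dates within window_days."""
    matched_n = set()
    result = []
    for r_id, r_date in r_rows:
        hits = [
            i for i, (n_id, n_date) in enumerate(n_rows)
            if n_id == r_id and abs((n_date - r_date).days) <= window_days
        ]
        if hits:
            for i in hits:
                matched_n.add(i)
                result.append((r_id, r_date, n_rows[i][1]))
        else:
            result.append((r_id, r_date, None))  # R assessment with no N partner
    # N assessments that never matched still appear, as in a FULL OUTER JOIN.
    result.extend(
        (n_id, None, n_date)
        for i, (n_id, n_date) in enumerate(n_rows) if i not in matched_n
    )
    return result


r_rows = [(2, date(2020, 1, 9)), (2, date(2020, 4, 9)),
          (4, date(2020, 6, 30)), (4, date(2020, 12, 30))]
n_rows = [(2, date(2020, 1, 9)), (2, date(2020, 7, 6)),
          (2, date(2020, 4, 10)), (4, date(2021, 1, 3))]
joined = join_within_days(r_rows, n_rows)
for row in joined:
    print(row)
```

In SQL this would probably be a FULL OUTER JOIN whose ON clause compares the IDs and bounds the date difference, but the exact date arithmetic depends on the database in use.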

check if a person has a previous record with a certain date: db2 sql

I have the following table:
id name start end
1 Asla 2021-01-01 2021-12-31
1 Asla 2022-01-01 2022-04-15
2 Tina 2021-05-16 2021-09-23
3 Layla 2021-01-01 2021-09-27
3 Layla 2022-01-01 2022-07-18
2 Sim 2020-05-12 2020-08-13
3 Anderas 2021-07-01 2021-09-13
3 Anderas 2021-10-01 2021-11-18
3 Anderas 2022-01-01 2029-11-18
4 Klara 2022-01-01 null
What I want to do is get the persons who had work (by date) during 2021 and create a new column that shows a status: 'ok' if the person continues to have work during 2022, otherwise 'not ok', and 'new' if the person is new, like 'Klara'. I also want to show only the last record for every person. Maybe End = null needs handling too?
I tried this:
select w.id, w.name, w.start, w.end, max_date.end
from Work_date w
left join (select * from Work_date where start >= '2022-01-01') max_date on max_date.id = w.id
where w.start >= '2021-01-01'
but the problem is that I get this result:
id name start end
1 Asla 2021-01-01 null
1 Asla 2022-01-01 2022-04-15
2 Tina 2021-05-16 null
3 Layla 2021-01-01 null
3 Layla 2022-01-01 2022-07-18
3 Anderas 2021-07-01 null
3 Anderas 2021-10-01 2021-11-18
3 Anderas 2022-01-01 null
4 Klara 2022-01-01 null
But I want to get this result:
id name start end status
1 Asla 2022-01-01 2022-04-15 ok
2 Tina 2021-05-16 2021-09-23 not ok
3 Layla 2022-01-01 2022-07-18 ok
3 Anderas 2022-01-01 2029-11-18 ok
4 Klara 2022-01-01 null ok
Looks like you can simply aggregate.
Then use a CASE WHEN for the status.
select
w.id
, w.name
, max(w.start) as start
, max(w.end) as end
, case
when year(max(end)) < 2022 then 'not ok'
else 'ok'
end as status
from Work_date w
where w.start >= '2021-01-01'
group by w.id, w.name
order by w.id, max(w.start), max(w.end);
| ID | NAME | START | END | STATUS |
|---|---|---|---|---|
| 1 | Asla | 2022-01-01 | 2022-04-15 | ok |
| 2 | Tina | 2021-05-16 | 2021-09-23 | not ok |
| 3 | Layla | 2022-01-01 | 2022-07-18 | ok |
| 3 | Anderas | 2022-01-01 | 2029-11-18 | ok |
| 4 | Klara | 2022-01-01 | null | ok |
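The aggregate-then-CASE logic can also be followed step by step in a short Python sketch. It is an illustration only. The name `latest_with_status` is made up, the data is a subset of the sample, and sorting missing end dates last is an assumption of the sketch.

```python
from datetime import date


def latest_with_status(rows, since=date(2021, 1, 1)):
    """rows: (id, name, start, end). Returns one row per (id, name) with a status."""
    latest = {}
    for person_id, name, start, end in rows:
        if start < since:
            continue  # mirrors WHERE w.start >= '2021-01-01'
        key = (person_id, name)
        best_start, best_end = latest.get(key, (start, end))
        # MAX() ignores NULLs, so a missing end never replaces a real one.
        ends = [e for e in (best_end, end) if e is not None]
        latest[key] = (max(best_start, start), max(ends) if ends else None)
    result = []
    for (person_id, name), (start, end) in latest.items():
        # A missing end fails the "< 2022" test and falls through to 'ok'.
        status = "not ok" if end is not None and end.year < 2022 else "ok"
        result.append((person_id, name, start, end, status))
    # Assumption of this sketch: missing end dates sort last.
    return sorted(result, key=lambda r: (r[0], r[2], r[3] or date.max))


work = [
    (2, "Tina", date(2021, 5, 16), date(2021, 9, 23)),
    (2, "Sim", date(2020, 5, 12), date(2020, 8, 13)),
    (3, "Anderas", date(2021, 7, 1), date(2021, 9, 13)),
    (3, "Anderas", date(2021, 10, 1), date(2021, 11, 18)),
    (3, "Anderas", date(2022, 1, 1), date(2029, 11, 18)),
    (4, "Klara", date(2022, 1, 1), None),
]
summary = latest_with_status(work)
for row in summary:
    print(row)
```

Note that Klara ends up as 'ok' only because her missing end date is not less than 2022. A separate 'new' status would need an extra condition.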