Subsetting Data Using Proc SQL - sql

I am fairly new to SAS and am working on a sorting exercise to improve my SAS skills, but I keep getting stuck because this dataset has observations with different date ranges.
I was given generated admission and discharge data for patients who visited two different hospitals. The data is sorted by admission date and is contained in one dataset. My goal is to create two datasets from this large dataset. The first dataset should contain the patient IDs of those patients who went to hospital A before visiting hospital B. The second dataset should contain the patient IDs of those patients who went to hospital B before visiting hospital A. A sample of the main dataset looks like this:
ID Hospital Admission_Date Discharge_Date
1 A 21AUG2018 24AUG2018
1 A 02OCT2019 07OCT2019
1 B 07OCT2019 17OCT2019
2 B 01AUG2020 13AUG2020
2 A 28SEP2020 30SEP2020
3 B 17MAY2019 18MAY2019
3 A 18MAY2019 21MAY2019
3 B 21MAY2019 31MAY2019
The two resulting datasets should only include the patient IDs.
For instance, for the dataset where patients went from Hospital A to Hospital B, we should have something like this:
ID
1
For the cases where patients went from Hospital B to Hospital A, we should have something like this:
ID
2
3
Any help on this would be greatly appreciated!

In SQL, it would look like this:
proc sql;
create table a_to_b as
select distinct a.id
from have as a
, have as b
where a.id = b.id
AND a.hospital = 'A'
AND b.hospital = 'B'
AND b.admission_date GE a.discharge_date
;
create table b_to_a as
select distinct a.id
from have as a
, have as b
where a.id = b.id
AND a.hospital = 'A'
AND b.hospital = 'B'
AND a.admission_date GE b.discharge_date
;
quit;
The data step version only requires one pass. It assumes that your data is already sorted in the correct order and compares the previous row to the current row. If there is any ID that goes from A to B or B to A, we set a flag for that ID to 1 and stop comparing any further. When we reach the last value of that ID, we output to the appropriate dataset.
data a_to_b
     b_to_a
;
    set have;
    by id;
    retain flag_a_to_b flag_b_to_a;
    lag_hospital = lag(hospital);
    lag_discharge_date = lag(discharge_date);
    if(first.id) then call missing(of lag:, of flag:);
    if(flag_a_to_b < 1) then flag_a_to_b = (     hospital = 'B'
                                             AND lag_hospital = 'A'
                                             AND admission_date GE lag_discharge_date
                                           )
    ;
    if(flag_b_to_a < 1) then flag_b_to_a = (     hospital = 'A'
                                             AND lag_hospital = 'B'
                                             AND admission_date GE lag_discharge_date
                                           )
    ;
    if(last.id AND flag_a_to_b) then output a_to_b;
    if(last.id AND flag_b_to_a) then output b_to_a;
    keep id;
run;
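For readers who do not have SAS at hand, the single-pass idea can be sketched in Python. This is only an illustration of the previous-row comparison described above, not a translation of the data step; the function name split_transitions and the tuple layout are assumptions made for the example. Note that with the sample data, patient 3 goes B to A and then A to B, so this adjacent-row rule places patient 3 in both lists.

```python
from datetime import date

# Sample rows from the question: (id, hospital, admission_date, discharge_date),
# already sorted by id and then by admission date.
rows = [
    (1, "A", date(2018, 8, 21), date(2018, 8, 24)),
    (1, "A", date(2019, 10, 2), date(2019, 10, 7)),
    (1, "B", date(2019, 10, 7), date(2019, 10, 17)),
    (2, "B", date(2020, 8, 1), date(2020, 8, 13)),
    (2, "A", date(2020, 9, 28), date(2020, 9, 30)),
    (3, "B", date(2019, 5, 17), date(2019, 5, 18)),
    (3, "A", date(2019, 5, 18), date(2019, 5, 21)),
    (3, "B", date(2019, 5, 21), date(2019, 5, 31)),
]


def split_transitions(sorted_rows):
    """Compare each row with the previous row of the same patient (hypothetical helper)."""
    a_to_b, b_to_a = set(), set()
    prev = None
    for row in sorted_rows:
        pid, hospital, admitted, _discharged = row
        # Only compare within one patient, like resetting the lags at first.id.
        if prev is not None and prev[0] == pid and admitted >= prev[3]:
            if prev[1] == "A" and hospital == "B":
                a_to_b.add(pid)
            elif prev[1] == "B" and hospital == "A":
                b_to_a.add(pid)
        prev = row
    return sorted(a_to_b), sorted(b_to_a)


ids_a_to_b, ids_b_to_a = split_transitions(rows)
print(ids_a_to_b, ids_b_to_a)
```

If only the first transition per patient should count, the rule would need an extra condition; the question does not say which behavior is wanted for a patient like 3.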
How we arrived at the SQL code
SQL in SAS cannot do lags, so instead we join the table to itself on ID, which gives every combination of a hospital A stay and a hospital B stay for each patient.
From that joined table, we know that:
hospital_a must be 'A' and hospital_b must be 'B'
The admission date from one hospital must be >= the discharge date from the other hospital
Knowing that, we arrive at the following where clauses:
A to B:
a.id = b.id
AND a.hospital = 'A'
AND b.hospital = 'B'
AND b.admission_date GE a.discharge_date
B to A:
a.id = b.id
AND a.hospital = 'A'
AND b.hospital = 'B'
AND a.admission_date GE b.discharge_date
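The two where clauses can be tried outside SAS with a small sketch that uses Python's built-in sqlite3 module. It assumes the dates are stored as ISO text (YYYY-MM-DD) so that string comparison matches date order, and it swaps GE for >=; neither detail comes from the original answer. On the sample data, patient 3 satisfies both conditions and therefore appears in both results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE have (id INTEGER, hospital TEXT, "
    "admission_date TEXT, discharge_date TEXT)"
)
# Sample data from the question, with dates rewritten as ISO strings (assumption).
conn.executemany(
    "INSERT INTO have VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2018-08-21", "2018-08-24"),
        (1, "A", "2019-10-02", "2019-10-07"),
        (1, "B", "2019-10-07", "2019-10-17"),
        (2, "B", "2020-08-01", "2020-08-13"),
        (2, "A", "2020-09-28", "2020-09-30"),
        (3, "B", "2019-05-17", "2019-05-18"),
        (3, "A", "2019-05-18", "2019-05-21"),
        (3, "B", "2019-05-21", "2019-05-31"),
    ],
)

# Self-join: alias a is always the hospital A stay, alias b the hospital B stay.
BASE = """
SELECT DISTINCT a.id
FROM have AS a
JOIN have AS b ON a.id = b.id
WHERE a.hospital = 'A'
  AND b.hospital = 'B'
  AND {date_rule}
ORDER BY a.id
"""

sql_a_to_b = BASE.format(date_rule="b.admission_date >= a.discharge_date")
sql_b_to_a = BASE.format(date_rule="a.admission_date >= b.discharge_date")

join_a_to_b = [r[0] for r in conn.execute(sql_a_to_b)]
join_b_to_a = [r[0] for r in conn.execute(sql_b_to_a)]
print(join_a_to_b, join_b_to_a)
```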

Related

Data Comparison between Two Tables

I have what should be simple (maybe) and I am just struggling with it.
Here is the scenario:
TABLE 1 contains all the data
TABLE 2 contains only a subset
I need a query that will look at table 1 and give a list of items that are not in table 2. Below is what I have, but I know it's not performing as such.
SELECT c.[DOC_ID], d.[DOCID]
FROM [dbo].[Custom_SUAM_Docuware] d
LEFT JOIN [dbo].[Custom_SUAM_Content] c ON (c.[DOC_ID] = d.[DOCID])
WHERE c.[DOC_ID] IS NULL
OR d.[DOCID] IS NULL
You are describing a not exists scenario.
You can't expect to return data from c since by definition what you want doesn't exist:
select d.DOCID
from dbo.Custom_SUAM_Docuware d
where not exists(
select * from dbo.Custom_SUAM_Content c
where c.DOC_ID = d.DOCID
);
You can also use EXCEPT:
SELECT c.[DOC_ID]
FROM [dbo].[Custom_SUAM_Content] c
EXCEPT
SELECT d.[DOCID]
FROM [dbo].[Custom_SUAM_Docuware] d;
That returns all IDs from c that are not in d. Swap the two SELECTs if you need the IDs from d that are missing from c.
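To see that NOT EXISTS and EXCEPT pick out the same IDs, here is a small sketch using Python's sqlite3 module. The table names all_docs and subset_docs and their contents are made up for the example and stand in for the two tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE all_docs (doc_id INTEGER);     -- hypothetical "table 1"
    CREATE TABLE subset_docs (doc_id INTEGER);  -- hypothetical "table 2"
    INSERT INTO all_docs VALUES (1), (2), (3), (4);
    INSERT INTO subset_docs VALUES (2), (4);
    """
)

# Anti-join with NOT EXISTS: keep rows that have no match in the subset.
not_exists_ids = [
    r[0]
    for r in conn.execute(
        """
        SELECT a.doc_id
        FROM all_docs AS a
        WHERE NOT EXISTS (
            SELECT 1 FROM subset_docs AS s WHERE s.doc_id = a.doc_id
        )
        ORDER BY a.doc_id
        """
    )
]

# Set difference with EXCEPT: same IDs, but duplicates are removed.
except_ids = [
    r[0]
    for r in conn.execute(
        """
        SELECT doc_id FROM all_docs
        EXCEPT
        SELECT doc_id FROM subset_docs
        ORDER BY doc_id
        """
    )
]

print(not_exists_ids, except_ids)
```

The practical difference is that EXCEPT compares only the selected columns and returns distinct rows, while NOT EXISTS lets you return any other columns from the outer table.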

Don't select rows where column A is duplicated AND any row of column B is a specific value

I'm working on generating a report that merges multiple tables. The report should only show projects that did not have any document marked 'Not Received'. These document markings are stored in a table that lists each document on its own row, so when it is merged into my other table it creates multiple rows for the same project. For example, take the following table:
Project Number  ChecklistValue
--------------------------------
565             Received
565             Not Received
465             Received
465             Not Applicable
As you can see really only two projects are listed on this table but the desired output is:
Project Number  Other Info
----------------------------
465             etc
I do not need the checklist value on the actual report, so I can use GROUP BY to combine all the good rows. My issue is that this would still include project 565, even if I add something like WHERE ChecklistValue <> 'Not Received'. Project 565 needs to be hidden from the report entirely, because one of its rows contains 'Not Received'.
So my actual question is: how do I exclude every row for a project number that has any row containing 'Not Received'?
I'm adding the entire query with generalized names below:
SELECT
Project Number
,Name
,Contractor
,ABS(DATEDIFF(day,(ActualDate),(EstDate))) AS DelayPeriod
,S.NoteDate
,S.FinalAppDate
,Status
,S.ONE
,S.TWO
,S.THREE
,S.FOUR
,CH.ChecklistValue
FROM [DB1] A
INNER JOIN [DB2] C ON A.Contractor = C.Contractor
INNER JOIN [DB3] S ON A.AppID = S.AppID
INNER JOIN [DB4] LS ON S.StatusID = LS.StatusID
LEFT OUTER JOIN [DB5] CH ON A.AppID = CH.AppID AND CH.OtherID = 1
WHERE C.TypeID = 4 AND A.YEAR = 2022, AND S.THING = 1 AND
(CH.CheckListValue IS NULL OR A.AppID NOT IN (SELECT * FROM [DB5] WHERE
CheckListValue = 'Not Reveived'))
GROUP BY Project Number,Name,Contractor,ABS(DATEDIFF(day,(ActualDate),(EstDate))) AS DelayPeriod,S.NoteDate,S.FinalAppDate,Status,S.ONE,S.TWO,S.THREE,S.FOUR
The last portion of the WHERE clause was added from a suggestion, but I'm clearly not implementing it correctly as it errors
You can use not in like:
create table test(
num int,
description varchar(20)
);
insert into test(num,description)
values(565,'Received'),
(565,'Not Received'),
(465,'Received'),
(465,'Not Applicable');
select *
from test
where num not in
(
select num -- Only select one column here
from test
where description = 'Not Received'
);
Results:
+-----+----------------+
| num | description    |
+-----+----------------+
| 465 | Received       |
| 465 | Not Applicable |
+-----+----------------+
This was run on SQL Server (via db<>fiddle), but it works on other DBMSs as well.
So in your query you should have (as I understand it):
OR A.AppID NOT IN
(
    SELECT AppID -- Not select *
    FROM [DB5]
    WHERE CheckListValue = 'Not Received' -- spelled 'Not Reveived' in your query
)
Another way to do it is with a CTE, although it looks more complicated at first glance:
with x as(
select num
from test
where description = 'Not Received'
)
select t.num, t.description
from test t
left join x
on t.num = x.num
where x.num is null
First I create a CTE that holds the num values where description = 'Not Received'. Then I select everything from the test table and left join it to the CTE, keeping only the rows whose num is not in the CTE by using where x.num is null. This returns only 465.
Which one is better? I don't know; sometimes the join is faster and sometimes NOT IN is.
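Both variants can be checked side by side with a short sketch that uses Python's sqlite3 module and the same test table as above. It only confirms that the two queries return the same rows on this sample; it says nothing about which is faster on a real server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE test (num INTEGER, description TEXT);
    INSERT INTO test (num, description) VALUES
        (565, 'Received'),
        (565, 'Not Received'),
        (465, 'Received'),
        (465, 'Not Applicable');
    """
)

# Variant 1: NOT IN with a single-column subquery.
not_in_rows = conn.execute(
    """
    SELECT num, description
    FROM test
    WHERE num NOT IN (
        SELECT num FROM test WHERE description = 'Not Received'
    )
    ORDER BY num, description
    """
).fetchall()

# Variant 2: CTE plus LEFT JOIN, keeping only rows with no match in the CTE.
left_join_rows = conn.execute(
    """
    WITH x AS (
        SELECT num FROM test WHERE description = 'Not Received'
    )
    SELECT t.num, t.description
    FROM test AS t
    LEFT JOIN x ON t.num = x.num
    WHERE x.num IS NULL
    ORDER BY t.num, t.description
    """
).fetchall()

print(not_in_rows)
print(left_join_rows)
```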

Return only 1 value from a table that is related to a child table with multiple related records

Table A contains an ID that is related to table B. Table B also contains an ID that is related to table C. I need to figure out how to bring back only one value, based on a hierarchy that is defined outside of the tables.
For Example:
Table A (Devices)
-------------------------
| DeviceID | Device Name |
-------------------------
|___001____| Server1_____|
--------------------------
|___002____| server2_____|
--------------------------
Table B (Translation Table)
-------------------------
| DeviceID | Value ID |
-------------------------
|___001____|____456______|
--------------------------
|___002____|____456______|
--------------------------
|___001____|____789______|
--------------------------
|___002____|____123______|
Table C (Value Table)
-------------------------
|_ValueID__|___Value_____|
-------------------------
|___123____|____LOW______|
--------------------------
|___456____|____MED______|
--------------------------
|___789____|____HIGH_____|
--------------------------
What I need is to evaluate each ID from table A. If it has a related value of HIGH (789), I need to bring back HIGH. If it doesn't have a related HIGH value, I need to check whether it is related to a MED value and, if so, bring back MED. Lastly, do the same thing for LOW. Devices that don't have a value do not need to be returned.
Desired Output
------------------------------
|___Device Name___|___COST___|
------------------------------
|___Server1_______|___HIGH___|
------------------------------
|___Server2_______|___MED____|
------------------------------
How would I write a query for this information, especially given that the value IDs may change?
select a.*,
coalesce(
(select C.value from C, B
where c.Value = 'HIGH'
and b.ValueID = c.ValueID
and b.DeviceID = a.DeviceID),
(select C.value from C, B
where c.Value = 'MED'
and b.ValueID = c.ValueID
and b.DeviceID = a.DeviceID),
(select C.value from C, B
where c.Value = 'LOW'
and b.ValueID = c.ValueID
and b.DeviceID = a.DeviceID)
) Value
from a
where exists (select null from C,B
where b.ValueID = c.ValueID
and b.DeviceID = a.DeviceID)
COALESCE returns the first non-null value, so each device gets the highest-priority value it is related to, which is probably what you want.
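Here is a runnable sketch of the COALESCE approach, using Python's sqlite3 module. The table names devices, device_values and value_names, the extra Server3 row, and the PRIORITY list are assumptions added for the example. Because the hierarchy lives outside the tables, the sketch keeps it in a Python list and builds one subquery per level, matching on the value text rather than on the value IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE devices (DeviceID TEXT, DeviceName TEXT);
    CREATE TABLE device_values (DeviceID TEXT, ValueID INTEGER);
    CREATE TABLE value_names (ValueID INTEGER, Value TEXT);
    INSERT INTO devices VALUES
        ('001', 'Server1'), ('002', 'Server2'), ('003', 'Server3');
    INSERT INTO device_values VALUES
        ('001', 456), ('002', 456), ('001', 789), ('002', 123);
    INSERT INTO value_names VALUES (123, 'LOW'), (456, 'MED'), (789, 'HIGH');
    """
)

# Hierarchy defined outside the tables, highest priority first (assumption).
PRIORITY = ["HIGH", "MED", "LOW"]

# One correlated subquery per priority level; "?" is bound to the level name.
branch = """(
    SELECT v.Value
    FROM device_values AS dv
    JOIN value_names AS v ON v.ValueID = dv.ValueID
    WHERE dv.DeviceID = d.DeviceID AND v.Value = ?
)"""

query = f"""
SELECT d.DeviceName, COALESCE({", ".join([branch] * len(PRIORITY))}) AS Cost
FROM devices AS d
WHERE EXISTS (
    SELECT 1 FROM device_values AS dv WHERE dv.DeviceID = d.DeviceID
)
ORDER BY d.DeviceID
"""

device_costs = conn.execute(query, PRIORITY).fetchall()
print(device_costs)
```

Server3 has no related values, so the EXISTS filter drops it from the result.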
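Based on your inputs and desired output, the following query may be a starting point. Note that it returns one row per related value, so it still needs a filter or ranking step to keep only the highest value per device: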
select a.DeviceName, c.Value
from TableA a inner join TableB b
on a.DeviceID=b.DeviceID
inner Join TableC c
on b.ValueID=c.ValueID

T-SQL cursor or if or case when

I have this table:
Table_NAME_A:
quotid itration QStatus
--------------------------------
5329 1 Assigned
5329 2 Inreview
5329 3 sold
4329 1 sold
4329 2 sold
3214 1 assigned
3214 2 Inreview
Result output should look like this:
quotid itration QStatus
------------------------------
5329 3 sold
4329 2 sold
3214 2 Inreview
I need a T-SQL query for this. Basically, I want the row with the "sold" status; if there is none, then "Inreview"; if there is none, then "assigned". At the same time, if the chosen status has multiple iterations, I want the highest iteration.
Please help me, thanks in advance :)
This is a prioritization query. One way to do this is with successive comparisons in a union all:
select a.*
from table_NAME_A a
where a.QStatus = 'sold'
union all
select a.*
from table_NAME_A a
where a.QStatus = 'Inreview' and
      not exists (select 1
                  from table_NAME_A a2
                  where a2.quotid = a.quotid and a2.QStatus = 'sold'
                 )
union all
select a.*
from table_NAME_A a
where a.QStatus = 'assigned' and
      not exists (select 1
                  from table_NAME_A a2
                  where a2.quotid = a.quotid and a2.QStatus in ('sold', 'Inreview')
                 );
For performance on a larger set of data, you would want an index on table_NAME_A(quotid, QStatus). Note that this returns every row at the winning status, so a quote such as 4329 with two "sold" rows still needs a final step that keeps only the highest itration.
You want neither cursors nor if/then for this. Instead, you'll use a series of self-joins to get these results. I'll also use a CTE to simplify getting the max iteration at each step:
with StatusIterations As
(
SELECT quotID, MAX(itration) Iteration, QStatus
FROM table_NAME_A
GROUP BY quotID, QStatus
)
select q.quotID, coalesce(sold.Iteration,rev.Iteration,asngd.Iteration) Iteration,
coalesce(sold.QStatus, rev.QStatus, asngd.QStatus) QStatus
from
--initial pass for list of quotes, to ensure every quote is included in the results
(select distinct quotID from table_NAME_A) q
--one additional pass for each possible status
left join StatusIterations sold on sold.quotID = q.quotID and sold.QStatus = 'sold'
left join StatusIterations rev on rev.quotID = q.quotID and rev.QStatus = 'Inreview'
left join StatusIterations asngd on asngd.quotID = q.quotID and asngd.QStatus = 'assigned'
If you have a table that equates a status with a numeric value, you can further improve on this:
Table: Status
QStatus Sequence
'Sold' 3
'Inreview' 2
'Assigned' 1
And the code becomes:
select t.quotID, MAX(t.itration) itration, t.QStatus
from
(
select t.quotID, MAX(s.Sequence) As Sequence
from table_NAME_A t
inner join Status s on s.QStatus = t.QStatus
group by t.quotID
) seq
inner join Status s on s.Sequence = seq.Sequence
inner join table_NAME_A t on t.quotID = seq.quotID and t.QStatus = s.QStatus
group by t.quotID, t.QStatus
The above may look complicated at first, but it can be faster, and it scales easily beyond three statuses without changing the code.
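The sequence-table version can be tried with a small sketch that uses Python's sqlite3 module. The table names quotes and status_rank are placeholders for table_NAME_A and Status. The status columns are declared COLLATE NOCASE because SQLite compares text case-sensitively by default, while the sample data mixes 'Assigned' and 'assigned'; that declaration is an assumption made for this sketch and is not needed on a case-insensitive SQL Server collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE quotes (
        quotid INTEGER, itration INTEGER, QStatus TEXT COLLATE NOCASE
    );
    CREATE TABLE status_rank (QStatus TEXT COLLATE NOCASE, Sequence INTEGER);
    INSERT INTO quotes VALUES
        (5329, 1, 'Assigned'), (5329, 2, 'Inreview'), (5329, 3, 'sold'),
        (4329, 1, 'sold'), (4329, 2, 'sold'),
        (3214, 1, 'assigned'), (3214, 2, 'Inreview');
    INSERT INTO status_rank VALUES ('Sold', 3), ('Inreview', 2), ('Assigned', 1);
    """
)

best_rows = conn.execute(
    """
    SELECT t.quotid, MAX(t.itration) AS itration, LOWER(t.QStatus) AS QStatus
    FROM (
        -- Step 1: highest-ranked status reached by each quote.
        SELECT q.quotid, MAX(s.Sequence) AS Sequence
        FROM quotes AS q
        JOIN status_rank AS s ON s.QStatus = q.QStatus
        GROUP BY q.quotid
    ) AS seq
    -- Step 2: translate the rank back to a status name.
    JOIN status_rank AS s ON s.Sequence = seq.Sequence
    -- Step 3: keep only the rows at that status and take the top iteration.
    JOIN quotes AS t ON t.quotid = seq.quotid AND t.QStatus = s.QStatus
    GROUP BY t.quotid, t.QStatus
    ORDER BY t.quotid DESC
    """
).fetchall()

print(best_rows)
```

Adding a fourth status only requires a new row in the ranking table; the query itself stays the same.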

SQL merging two tables and updating referenced IDs

I have two tables that I want to join into one table and use a TypeID to differentiate them. Let's say the types are A and B. The Tables are A_Level and B_Level
A's Table looks like
Level
Level_ID Description
B's Table looks like
Level
Level_ID Level_Desc
A's Level_ID is referenced from Table C as Level_ID
B's Level_ID is referenced from Table D as Level_ID
I am looking for a script that would merge the two tables into one table (Level_Code) and update the referenced Tables ID's accordingly.
Any help is greatly appreciated.
-- Build a mapping of old IDs to new IDs, tagged with the source type
SELECT a.Level_ID AS ALevelId, b.Level_ID AS BLevelId,
       CASE ISNULL(a.Level_ID, 0) WHEN 0 THEN 'B' ELSE 'A' END AS Type,
       CASE ISNULL(a.Level_ID, 0) WHEN 0 THEN b.Level_ID ELSE a.Level_ID END AS NewLevel_Id
INTO Dummy
FROM a
FULL JOIN b ON (a.Level_ID = b.Level_ID);
-- Re-point the referencing tables at the new IDs
UPDATE c
SET c.Level_ID = Dummy.NewLevel_Id
FROM Dummy, c
WHERE c.Level_ID = Dummy.ALevelId
AND Dummy.Type = 'A';
UPDATE d
SET d.Level_ID = Dummy.NewLevel_Id
FROM Dummy, d
WHERE d.Level_ID = Dummy.BLevelId
AND Dummy.Type = 'B';
-- Create the merged table; B's Level_Desc is renamed to Description
SELECT Dummy.NewLevel_Id, a.Level, a.Description
INTO YourNewTable
FROM Dummy JOIN a ON (Dummy.ALevelId = a.Level_ID)
WHERE Dummy.Type = 'A'
UNION
SELECT Dummy.NewLevel_Id, b.Level, b.Level_Desc AS Description
FROM Dummy JOIN b ON (Dummy.BLevelId = b.Level_ID)
WHERE Dummy.Type = 'B';
DROP TABLE Dummy;
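If the two tables can reuse the same Level_ID values, another option is to give every merged row a freshly generated ID and keep the old ID next to the type so that the referencing tables can be re-pointed. The sketch below uses Python's sqlite3 module and is a different technique from the script above, not a corrected version of it. The sample rows, the lowercase table names, and the Old_Level_ID column are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a_level (Level_ID INTEGER, Level TEXT, Description TEXT);
    CREATE TABLE b_level (Level_ID INTEGER, Level TEXT, Level_Desc TEXT);
    CREATE TABLE c (id INTEGER, Level_ID INTEGER);  -- references a_level
    CREATE TABLE d (id INTEGER, Level_ID INTEGER);  -- references b_level
    INSERT INTO a_level VALUES (5, 'L5', 'A five'), (7, 'L7', 'A seven');
    INSERT INTO b_level VALUES (5, 'L5', 'B five'), (6, 'L6', 'B six');
    INSERT INTO c VALUES (100, 5), (101, 7);
    INSERT INTO d VALUES (200, 5), (201, 6), (202, 5);

    -- Merged table: new generated key, plus the type and the old key.
    CREATE TABLE level_code (
        Level_ID INTEGER PRIMARY KEY AUTOINCREMENT,
        TypeID TEXT NOT NULL,
        Old_Level_ID INTEGER NOT NULL,
        Level TEXT,
        Description TEXT
    );
    INSERT INTO level_code (TypeID, Old_Level_ID, Level, Description)
        SELECT 'A', Level_ID, Level, Description FROM a_level;
    INSERT INTO level_code (TypeID, Old_Level_ID, Level, Description)
        SELECT 'B', Level_ID, Level, Level_Desc FROM b_level;

    -- Re-point each referencing table using (type, old ID) as the lookup key.
    UPDATE c SET Level_ID = (
        SELECT lc.Level_ID FROM level_code AS lc
        WHERE lc.TypeID = 'A' AND lc.Old_Level_ID = c.Level_ID
    );
    UPDATE d SET Level_ID = (
        SELECT lc.Level_ID FROM level_code AS lc
        WHERE lc.TypeID = 'B' AND lc.Old_Level_ID = d.Level_ID
    );
    """
)

CHECK = """
SELECT r.id, lc.TypeID, lc.Description
FROM {table} AS r
JOIN level_code AS lc ON lc.Level_ID = r.Level_ID
ORDER BY r.id
"""
c_check = conn.execute(CHECK.format(table="c")).fetchall()
d_check = conn.execute(CHECK.format(table="d")).fetchall()
print(c_check)
print(d_check)
```

On SQL Server the same idea would typically use an IDENTITY column for the new key; the Old_Level_ID column can be dropped once the references have been updated.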