proc sql join in SAS that is closest to a date - sql

How can I do a one-to-many join between two datasets with proc sql in SAS to obtain the record in Dataset B is closest to a value in Dataset A?
Dataset A
#Patient #Date of Dose
001 2020-02-01
Dataset B
# Patient # Lab Test #Date of Test # Value
001 Test 1 2020-01-17 6
001 Test 1 2020-01-29 10
I want to do the join to select the second record in Dataset B, the record with a "Date of Test" that is the closest (less than or equal to) to the "Date of Dose" in the first dataset.

I want to do the join to select the [..] record in Dataset B [...] with a "Date of Test" that is the closest (less than or equal to) to the "Date of Dose" in the first dataset.
You could use outer appy - if sas supports that:
select a.*, b.*
from a
outer apply(
select top 1 b.*
from b
where b.patient = a.patient and b.date_of_test <= a.date_of_dose
order by b.date_of_test desc
) b
Another solution is to join with a not exists condition:
select a.*, b.*
from a
left join b
on b.patient = a.patient
and b.date_of_test <= a.date_of_dose
and not exists (
select 1
from b b1
where
b1.patient = a.patient
and b1.date_of_test <= a.date_of_dose
and b1.date_of_test > b.date_of_test
)

Calculate the absolute difference between both dates and select the minimum date with a having clause. You'll need to do additional logic, such as distinct, to remove any duplicates.
proc sql noprint;
select t1.patient
, t1.date_of_dose
, abs(t1.date - t2.date) as date_dif
from dataset_A as t1
LEFT JOIN
dataset_B as t2
ON t1.patient = t2.patient
where t1.date <= t2.date
group by t1.patient
having calculated date_dif = min(calculated date_dif)
;
quit;

try this:
SELECT TOP 1 *
FROM (
SELECT DSA.Patient
,DSA.Date_Of_Dose
,DSB.Date_Of_Test
,DATEDIFF(Day, DSA.Date_Of_Dose, DSA.Date_Of_Dose) Diff
FROM DataSetA DSA
JOIN DataSetB DSB ON DSA.Patient = DSB.Patient
) Data
WHERE ABS(Diff) = MIN(ABS(Diff));
Sorry, I got no way to know if this is working 'cause I'm not at home. I hope it helps you.

I want to do the join to select the second record in Dataset B, the record with a "Date of Test" that is the closest (less than or equal to) to the "Date of Dose" in the first dataset.
I would recommend cross-joining Date of Test and Date of Dose, and calculate absolute difference between the dates using intck() function, and leave the minimal value.

Related

How to join 100 random rows from table 1 multiple other tables in oracle

I have scrapped my previous question as I did not do a good job explaining. Maybe this will be simpler.
I have the following query.
Select * from comp_eval_hdr, comp_eval_pi_xref, core_pi, comp_eval_dtl
where comp_eval_hdr.START_DATE between TO_DATE('01-JAN-16' , 'DD-MON-YY')
and TO_DATE('12-DEC-17' , 'DD-MON-YY')
and comp_eval_hdr.COMP_EVAL_ID = comp_eval_dtl.COMP_EVAL_ID
and comp_eval_hdr.COMP_EVAL_ID = comp_eval_pi_xref.COMP_EVAL_ID
and core_pi.PI_ID = comp_eval_pi_xref.PI_ID
and core_pi.PROGRAM_CODE = 'PS'
Now if I only want a random 100 rows from the comp_eval_hdr table to join with the other tables how would I go about it? If it makes it easier you can disregard the comp_eval_dtl table.
I think you are pretty much there. You just need subqueries, table aliases, and JOIN conditions:
SELECT . . .
FROM (SELECT a.*
FROM (SELECT a.*
FROM a
WHERE a.START_DATE BEWTWEEN DATE '2016-01-01' AND DATE '2017-12-12'
ORDER BY DBMS_RANDOM.VALUE
) a
WHERE ROWNUM <= 100
) a JOIN
mapping m
ON a.? = m.? JOIN
b
ON m.? = b.?;
The ? is just a placeholder for the join columns.
It's a bit of a stretch to know what you want with the question as written but here's my attempt.
WITH rand_list AS
(SELECT * FROM comp_eval_hdr
WHERE comp_eval_hdr.START_DATE BEWTWEEN TO_DATE('01-JAN-16' , 'DD-MON-YY') AND TO_DATE('12-DEC-17' , 'DD-MON-YY')
ORDER BY DBMS_RANDOM.VALUE)
first_100 AS
(SELECT *
FROM rand_list
WHERE ROWNUM <=100)
SELECT md.col_1, t3.col_a
FROM first_100 md
INNER JOIN
table2 t2 ON md.id_column = t2.fk_comp_eval_hdr_id
INNER JOIN
table3 t3 ON t3.id_column = t2.fk_table3_id
You haven't given any indication how they join or the table names and obviously I haven't run this against any mock tables.
You've got a list of randomised records with RAND_LIST which you could, if you wanted, combine with the FIRST_100 query (your choice).
The main query then just joins that through your mapping table (T2) to your 'multiples' table (T3).
how does table 2 look like?...Let me put one example as person table and order table?
select * from (
select * from person ps , order order where ps.city = 'mumbai' and ps.id = order.purchasedby ) porder where porder.rownum <= 100
I did not tested it but it will look something like this.

SQL self-join optimization

I am writing a query that requires a self-join of a large table (> 1 million rows)
I'm only interested in the rows that were created today which I can filter using a recording_time column that contains the epoch time.
However, I'm not certain that the below query is actually limiting the tables BEFORE doing the join.
SELECT B.ani
FROM [app].[dbo].[recordings] B
INNER JOIN [app].[dbo].[recordings] A
ON B.callid = A.callid AND B.dnis = A.ani
where A.filename LIKE '%680627.wav'
AND B.recording_time > 1485340000
Filter rows that were created today and use that new table to join.
SELECT B.ani
FROM ( SELECT * FROM [app].[dbo].[recordings] where recording_time > 1485340000 ) B
INNER JOIN ( SELECT * FROM [app].[dbo].[recordings] where recording_time > 1485340000 ) A
ON B.callid = A.callid AND B.dnis = A.ani
where A.filename LIKE '%680627.wav'

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.
SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.
Something like:
join (
select "EsnId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and rn = 1
this will select the latest "EsnId" from "EsnsSalesOrderItems" based on the column creation_date. As you didn't post the structure of your tables, I had to "invent" a column name. You can use any column that allows you to define an order on the rows that suits you.
But remember the concept of the "last row" is only valid if you specifiy an order or the rows. A table as such is not ordered, nor is the result of a query unless you specify an order by
Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | right | inner JOIN LATERAL
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL-server, you would use cross-apply (or rather OUTER APPLY since we need a left join), which will invoke a table-valued function on each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(#in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(#in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE
Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...

Compare last to second last record for each contract

To keep it simple, my question is similar to THIS QUESTION, PART 2, only problem is, I am not running Oracle and thus can not use the rownumbers.
For those who need more information and examples:
I have a table
contractId date value
1 09/02/2011 A
1 13/02/2011 C
2 02/02/2011 D
2 08/02/2011 A
2 12/02/2011 C
3 22/01/2011 C
3 30/01/2011 B
3 12/02/2011 D
3 21/01/2011 A
EDIT: added another line for ContractID. Since I had some code myself, but that would display the following:
contractId date value value_old
1 09/02/2011 A
2 08/02/2011 A D
3 30/01/2011 B C
3 30/01/2011 B A
But that is not what I want ! The result should still be as below!
Now I want to select the last record before a given date and compare that with the previous value.
Suppose the 'given date' is 11/02/2011 in this example, the output should be like this:
contractId date value value_old
1 09/02/2011 A
2 08/02/2011 A D
3 30/01/2011 B C
I do have the query to select the last record before the given date. That is the easy part. But to select the last record before that, I am lost...
I really hope I can get some help here, have been breaking my head over this for days and looking for answers on the web and stackoverflow.
One possibility:
SELECT a.contractId, a.Date, a.Value, (SELECT Top 1 b.[Value]
FROM tbl b
WHERE b.[Date] < a.[Date] And b.ContractID=a.ContractID
ORDER BY b.[Date] Desc) AS Old_Value
FROM tbl AS a
WHERE a.Date IN
(SELECT TOP 1 b.Date
FROM tbl b
WHERE b.ContractID=a.ContractID
AND b.Date < #2011/02/11#
ORDER BY b.date DESC)
As promised, I would also post my answer. Although at this point, I still think Remou's answer is better, since the code is shorter and it seems more efficient (calls the same table fewer times). But here goes:
Query1:
SELECT c.contractID, c.firstofdates, a.value, d.value, d.date
FROM (table1 AS A RIGHT JOIN (SELECT b.cid,max(b.date) AS FirstOfdates
FROM table1 as B
where b.date < #02/11/2011#
GROUP BY b.contractID ) AS c ON (a.date = c.firstofdates) AND (a.contractID = c.contractID))
LEFT JOIN (select e.contractID, e.date, e.value
from table1 as e
) AS d ON (d.date < c.firstofdates) AND (d.contractID = c.contractID);
This query actually gives the result with the extra row for the 3rd contractID.
Query2:
SELECT b.contractID, max(a.date) AS olddate
FROM table1 AS a RIGHT JOIN (select contractID, firstofdates
from Query1) AS b ON (a.contractID= b.contractID) AND (a.date < b.firstofdates)
GROUP BY b.contractID;
And then to combine both:
Query3:
SELECT Query1.contractID, Query1.firstofdates AS [date], Query1.A.value AS [value], Query1.d.value AS [old value]
FROM Query1 RIGHT JOIN Query2
ON (Query1.date=Query2.olddate or Query2.olddate is null) AND (Query1.cid = Query2.cid);

Aggregate function in comparison of 2 rows in the same table (SQL)

Given the table definition:
create table mytable (
id integer,
mydate datetime,
myvalue integer )
I want to get the following answer by a single SQL query:
id date_actual value_actual date_previous value_previous
where:
date_previous is the maximum of all the dates preceeding date_actual
for each id and values correspond with the two dates
{max(date_previous) < date_actual ?}
How can I achieve that?
Thanks for your hints
This is a variation of the common "greatest N per group" query which comes up every week on StackOverflow.
SELECT m1.id, m1.mydate AS date_actual, m1.myvalue AS value_actual,
m2.mydate AS date_previous, m2.myvalue AS value_previous
FROM mytable m1
LEFT OUTER JOIN mytable m2
ON (m1.id = m2.id AND m1.mydate > m2.mydate)
LEFT OUTER JOIN mytable m3
ON (m1.id = m3.id AND m1.mydate > m3.mydate AND m3.mydate > m2.mydate)
WHERE m3.id IS NULL;
In other words, m2 is all rows with the same id and a lesser mydate, but we want only the one such that there is no row m3 with a date between m1 and m2. Assuming the dates are unique, there will only be one row in m2 where this is true.
Assuming I understood your requirements correctly, here's something you can try.
select a.id,
a.mydate as date_actual,
a.value as value_actual,
b.date as date_previous,
b.value as value_previous
from mytable a, mytable b
where a.id = b.id and
a.mydate > b.mydate and
b.mydate = (select max(mydate) from mytable c where c.id = a.id and c.mydate < a.mydate)
Apologies for the ugly SQL. I am sure there are better ways of doing this.