SQL date calculations - sql

I need some help doing SQL date calculations.
In one table I have patients who are older than 18 and died from a certain disease (table A). In another table I have patients with the same disease and the earliest date they were diagnosed with it (table B).
What I need to know is whether 12 months passed between when they were diagnosed and when they died.
Can someone assist me in performing this date calculation?
The date column in table A is deathdate (when they died) and the date column in table B is indexdate (when they were diagnosed).
Appreciate any help
Table A:
patientid--age--deathdate
1 20 11/05/2016
2 19 10/09/2015
Table B:
PatientID--indexdate
1 01/02/2015
2 08/03/2014
So essentially all I want to check is whether 12 months have passed between indexdate and deathdate.

This gives the list of patients for whom at least 12 months passed between when they were diagnosed and when they died.
SELECT A.patientID, A.deathdate, B.EarliestDate
FROM tableA A
INNER JOIN
(
    SELECT patientID, MIN(indexdate) AS EarliestDate
    FROM tableB
    GROUP BY patientID
) AS B
    ON A.patientID = B.patientID
-- age() returns an interval; count whole months as years * 12 + months
WHERE date_part('year', age(A.deathdate, B.EarliestDate)) * 12
    + date_part('month', age(A.deathdate, B.EarliestDate)) >= 12

You should be able to do that by writing a query that links the 2 tables by the patient id, then using the dateadd function in the where clause, which would be something like this example:
WHERE TableA.deathdate > (DATEADD(month, 12, TableB.indexdate))
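Put together, a minimal sketch of the whole query (SQL Server style, reusing the tableA/tableB names from the first answer; adjust to your real table and column names) might look like this:
SELECT A.patientid, A.deathdate, B.indexdate
FROM tableA A
INNER JOIN tableB B
    ON A.patientid = B.patientid
-- keep only patients who died more than 12 months after diagnosis
WHERE A.deathdate > DATEADD(month, 12, B.indexdate)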

Related

The nearest row in the other table

One table is a sample of users and their purchases.
Structure:
Email | NAME | TRAN_DATETIME (Varchar)
So we have customer email + FirstName&LastName + Date of transaction
The second table, which comes from a second system, contains all users, their sensitive data, and when they got registered in our system.
Simplified Structure:
Email | InstertDate (varchar)
My task is to count the minutes difference between the rows inserted from sales (first table) and the rows with the users and their sensitive data.
The issue is that the second table contains many rows per user, and I want to find the nearest-in-time row inserted in the 2nd table, because sometimes the difference is only a few minutes (a delay in either direction) and sometimes it is a few days.
So for email x I have this row in the 1st table:
E_MAIL NAME TRAN_DATETIME
p****#****.eu xxx xxx 2021-10-04 00:03:09.0000000
But then I have 3 rows in the second table, and the latest one is the row I want to use for the difference:
Email InstertDate
p****#****.eu 2021-05-20 19:12:07
p****#****.eu 2021-05-20 19:18:48
p****#****.eu 2021-10-03 18:32:30 <--
I wrote this query, but I have no idea how to match the nearest row in the 2nd table:
SELECT DISTINCT TOP (100)
     a.[E_MAIL]
    ,a.[NAME]
    ,a.[TRAN_DATETIME]
    ,CASE WHEN b.EMAIL IS NOT NULL THEN 'YES' ELSE 'NO' END AS 'EXISTS'
    ,ABS(CONVERT(INT, CONVERT(datetime, LEFT(a.[TRAN_DATETIME],10), 120))
       - CONVERT(INT, CONVERT(datetime, LEFT(b.[INSERTDATE],10), 120))) as 'DateAccuracy'
FROM [crm].[SalesSampleTable] a
left join [crm].[SensitiveTable] b on a.[E_MAIL] = b.[EMAIL]
Totally untested: I'd need sample data and a database to check this. The suspect area is the casting of dates and the date math, and since I don't know what RDBMS and version this is, consider the following pseudo code.
We assign a row number ordered by the absolute difference in seconds between the two dates; the row with RN = 1 (the smallest difference, i.e. the nearest insert) wins.
WITH CTE AS (
    SELECT A.*, B.*,
           -- datetime2 handles the 7-digit fractional seconds in TRAN_DATETIME
           row_number() OVER (PARTITION BY A.[E_MAIL]
                              ORDER BY abs(datediff(second,
                                           cast(A.[TRAN_DATETIME] as datetime2),
                                           cast(B.[INSERTDATE] as datetime2)))) AS RN
    FROM [crm].[SalesSampleTable] A
    LEFT JOIN [crm].[SensitiveTable] B
        ON A.[E_MAIL] = B.[EMAIL]
)
SELECT * FROM CTE WHERE RN = 1
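As a hedged follow-up to the sketch above: once RN = 1 has picked the nearest insert, the minutes difference the question actually asks for can be taken by replacing the final SELECT of the WITH query (column names as assumed in the sketch):
SELECT [E_MAIL], [NAME], [TRAN_DATETIME], [INSERTDATE],
       -- positive when the sale row comes after the registration row
       datediff(minute,
                cast([INSERTDATE] as datetime2),
                cast([TRAN_DATETIME] as datetime2)) AS minutes_diff
FROM CTE
WHERE RN = 1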

Finding a min() date for one column and then using this to join with other tables that have a date LESS than this date

In short, I have two tables:
(1) pharmacy_claims (columns: user_id, date_service, claim_id, record_id, prescription)
(2) medical_claims (columns: user_id, date_service, provider, npi, cost)
I want to find user_id's in (1) that have a certain prescription value, find their earliest date_service (e.g. min(date_service)) and then use these user_id's with their earliest date of service as a cohort to pull all of their associated data from (2). Basically I want to find all of their medical_claims data PRIOR to the first time they were prescribed a given prescription in pharmacy_claims.
pharmacy_claims looks something like this:
user_id | prescription | date_service
1 a 2018-05-01
1 a 2018-02-11
1 a 2019-10-11
1 b 2018-07-12
2 a 2019-01-02
2 a 2019-03-10
2 c 2018-04-11
3 c 2019-05-26
So for instance, if I was interested in prescription = 'a', I would only want user_id 1 and 2 returned, with dates 2018-02-11 and 2019-01-02, respectively. Then I would want to pull user_id 1 and 2 from the medical_claims, and get all of their data PRIOR to these respective dates.
The way I tried to go about this was to build a temp table from the pharmacy_claims table to query the user_id's that have a given medication, and then left join this back to the table to create a cohort of user_id's with a date_service.
Here's what I did:
(1) Pulled all of the relevant data from the main pharmacy claims table:
CREATE TABLE user.temp_pharmacy_claims AS
SELECT user_id, claim_id, record_id, date_service
FROM dw.pharmacyclaims
WHERE date_service between '2018-01-01' and '2019-08-31'
This results in ~50,000 user_id's
(2) Created a table with just the user_id's and min(date_service):
CREATE TABLE user.temp_pharmacy_claims_index AS
SELECT distinct user_id, min(date_service) AS Min_Date
FROM user.temp_pharmacy_claims
GROUP BY 1
(3) Created a final table (to get the desired cohort):
CREATE TABLE user.temp_pharmacy_claims_final_index AS
SELECT a.user_id
FROM user.temp_pharmacy_claims a
LEFT JOIN user.temp_pharmacy_claims_index b
ON a.user_id = b.user_id
WHERE a.date_service < b.Min_Date
However, this gets me 0 results when there should be a few thousand. Is this set up correctly? It's probably not the most efficient approach, but it looks sound to me, so not sure what's going on.
I think you just want a correlated subquery:
select mc.*
from medical_claims mc
where mc.date_service < (select min(pc.date_service)
                         from pharmacy_claims pc
                         where pc.user_id = mc.user_id and
                               pc.prescription = ?
                        );
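If you also want each user's earliest date returned with the rows (as in the temp-table approach above), a hedged alternative is to join against a derived table of per-user minimum dates; the ? placeholder stands for the prescription of interest, as in the answer:
select pc.user_id, pc.min_date, mc.*
from medical_claims mc
join (select user_id, min(date_service) as min_date
      from pharmacy_claims
      where prescription = ?
      group by user_id) pc
  on pc.user_id = mc.user_id
 and mc.date_service < pc.min_date;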

Aggregate column text where dates in table a are between dates in table b

Sample data
CREATE TEMP TABLE a AS
SELECT id, adate::date, name
FROM ( VALUES
(1,'1/1/1900','test'),
(1,'3/1/1900','testing'),
(1,'4/1/1900','testinganother'),
(1,'6/1/1900','superbtest'),
(2,'1/1/1900','thebesttest'),
(2,'3/1/1900','suchtest'),
(2,'4/1/1900','test2'),
(2,'6/1/1900','test3'),
(2,'7/1/1900','test4')
) AS t(id,adate,name);
CREATE TEMP TABLE b AS
SELECT id, bdate::date, score
FROM ( VALUES
(1,'12/31/1899', 7 ),
(1,'4/1/1900' , 45),
(2,'12/31/1899', 19),
(2,'5/1/1900' , 29),
(2,'8/1/1900' , 14)
) AS t(id,bdate,score);
What I want
What I need to do is aggregate column text from table a where the id matches table b and the date from table a is between the two closest dates from table b. Desired output:
id date score textagg
1 12/31/1899 7 test, testing
1 4/1/1900 45 testinganother, superbtest
2 12/31/1899 19 thebesttest, suchtest, test2
2 5/1/1900 29 test3, test4
2 8/1/1900 14
My thoughts are to do something like this:
create table date_join
select a.id, string_agg(a.text, ','), b.*
from tablea a
left join tableb b
on a.id = b.id
*having a.date between b.date and b.date*;
but I am really struggling with the last line, figuring out how to aggregate only where the date in table a is between the closest two dates in table b. Any guidance is much appreciated.
I can't promise it's the best way to do it, but this is a way to do it.
with b_values as (
    select
        id, bdate as from_date, score,
        lead (bdate, 1, '3000-01-01')
            over (partition by id order by bdate) - 1 as thru_date
    from b
)
select
    bv.id, bv.from_date, bv.score,
    string_agg (a.name, ',')
from
    b_values as bv
    left join a on
        a.id = bv.id and
        a.adate between bv.from_date and bv.thru_date
group by
    bv.id, bv.from_date, bv.score
order by
    bv.id, bv.from_date
I'm presupposing you will never have a date in your table greater than 12/31/2999, so if you're still running this query after that date, please accept my apologies.
Here is the output I got when I ran this:
id from_date score string_agg
1 0 7 test,testing
1 92 45 testinganother,superbtest
2 0 19 thebesttest,suchtest,test2
2 122 29 test3,test4
2 214 14
I might also note that between in a join is a performance killer. If you have large data volumes, there might be better ideas on how to approach this, but that depends largely on what your actual data looks like.
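For what it's worth, one hedged way around the between-join (PostgreSQL, using the column names from the sample data) is to first assign each row of a to the latest b date on or before it, then aggregate once per bucket; whether this is actually faster depends on your data and indexes:
with bucketed as (
    select a.id, a.name,
           -- latest b date on or before this a row's date
           (select max(b2.bdate)
              from b b2
             where b2.id = a.id
               and b2.bdate <= a.adate) as bucket_date
    from a
)
select b.id, b.bdate, b.score,
       string_agg(bk.name, ',') as textagg
from b
left join bucketed bk
    on bk.id = b.id
   and bk.bucket_date = b.bdate
group by b.id, b.bdate, b.score
order by b.id, b.bdate;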

how to perform date calculations from different tables?

Please forgive me if this is a basic question, I'm a beginner in SQL and need some help performing date calculations from 2 tables in SQL.
I have two tables (patient and chd); they look like this:
Patient:
ID|Age|date |Alive
--------------------------
1 50 01/09/2013 Y
2 52 11/05/2015 N
3 19 20/07/2016 N
CHD:
ID|Age|indexdate
--------------------
1 50 01/08/2012
2 52 11/11/2013
3 19 10/07/2015
The patient table contains about 500,000 records from 2010-2016 and the CHD table contains about 350,000 records from 2012-2013. What I want to do is see how many CHD patients died from 2012-2016 and, if they died, whether 12 months had passed since their indexdate.
I'm not sure how to do this, but I know a join is needed on the ID and we set the where condition with alive as NOT 'Y'.
The final output should look like this based on the sample above:
ID|Age|indexdate| deathdate
---------------------------
2 52 11/11/2013 11/05/2015
3 19 10/07/2015 20/07/2016
Any questions let me know!
EDIT: just to make it clear, patients can appear multiple times in the patient table until they die.
Thanks
Let me assume that this query gets the date of death from the patient table:
select p.id, min(p.date) as deathdate
from patient p
where p.Alive = 'N'
group by p.id;
Then, you can get what you want with a join:
select count(*)
from chd c join
(select p.id, min(p.date) as deathdate
from patient p
where p.Alive = 'N'
group by p.id
) pd
on c.id = pd.id;
You can then address your questions with a where clause in the outer query. For instance:
where deathdate >= current_date - interval '1 year'
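To produce something like the sample output in the question, a hedged sketch (PostgreSQL-style date arithmetic; the date columns are assumed to be real date types, not text) that joins the death dates back to chd and flags whether 12 months passed could be:
select c.id, c.age, c.indexdate, pd.deathdate,
       -- true when at least 12 months passed between diagnosis and death
       (pd.deathdate >= c.indexdate + interval '12 months') as twelve_months_passed
from chd c
join (select p.id, min(p.date) as deathdate
      from patient p
      where p.Alive = 'N'
      group by p.id) pd
  on c.id = pd.id;
On SQL Server the interval expression would become DATEADD(month, 12, c.indexdate).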

An aggregation is affecting results in a major way

I seem to be getting duplicates as a result of this query. The only analysis I want to do is the sum of calls over the total orders, and to see how many support tickets were generated from orders within an order range, up to a call_date. Very simple, but surprisingly complex to code up. Here is my attempt. I have also tried to change the below into a union, but I still get wrong aggregate results.
The query:
SELECT marketing_code,
count(order_code) order_code_count,
order_date,
sum(support_ticket_call) call_count,
call_date
FROM
(select distinct marketing_code, order_code, order_date from table1) a
left join
(select count(call_ids) as support_ticket_Call, call_date
FROM table2 group by call_date) b
on b.order_ID_code = a.order_id_code
group by marketing_code, order_date, call_date
Please note, the call can happen at a much later date than the order. The order date is in table 1, but not in table 2; the call_date is in table 2, but not in table 1. Also, in the data, the marketing code is either AB16 or AB17.
Sample data:
Marketing code order_code_count call_count call_date order_date
AB16 30 45 2016-01-01 2015-12-27
AB17 13 17 2016-01-02 2015-12-29
AB16 24 29 2016-01-02 2016-01-01
The sum of support ticket calls should be lower than the order count.
You join your tables by order_id_code, but in the right part of your join you count all calls from one day. This doesn't seem right. Try something like this:
select
    marketing_code,
    count(order_code) order_code_count,
    order_date,
    count(call_ids) call_count,
    call_date
from
    table1 a left join table2 b on b.order_ID_code = a.order_id_code
group by
    marketing_code, order_date, call_date
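One hedged refinement, since the question notes the call count should not exceed the order count: if a single order matches several calls, the join fans out and count(order_code) counts that order once per call, so counting distinct order codes in the select list may be what you want:
count(distinct order_code) order_code_count,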