How to do self join on min/max - sql

I am new to sql queries.
Table is defined as
( symbol varchar,
high int,
low int,
today date,
Primary key (symbol, today)
)
I need to find for each symbol in a given date range, max(high) and min(low) and corresponding dates for max(high) and min(low).
Okay to get first max date and min date in given table.
In a given date range some dates may be missing. If start date is not present then next date should be used and if last date is not present then earlier available date should be used
Data is for one year and around 5000 symbols.
I tried something like this
SELECT a.symbol,
a.maxValue,
a.maxdate,
b.minValue,
b.mindate
FROM (
SELECT table1.symbol, max_a.maxValue, max_a.maxdate
FROM table1
INNER JOIN (
SELECT table1.symbol,
max(table1.high) AS maxValue,
table1.TODAY AS maxdate
FROM table1
GROUP BY table1.symbol
) AS max_a
ON max_a.symbol = table1.symbol
AND table1.today = max_a.maxdate
) AS a
INNER JOIN (
SELECT symbol,
min_b.minValue,
min_b.mindate
FROM table1
INNER JOIN (
SELECT symbol,
min(low) AS minValue,
table1.TODAY AS mindate
FROM table1
GROUP BY testnsebav.symbol
) AS min_b
ON min_b.symbol = table1.symbol
AND table1.today = min_b.mindate
) AS b
ON a.symbol = b.symbol

The first INNER query pre-qualifies for each symbol what the low and high values are within the date range provided. After that, it joins back to the original table again (for same date range criteria), but also adds the qualifier that EITHER the low OR the high matches the MIN() or MAX() from the PreQuery. If so, allows it in the result set.
Now, the result columns. Not knowing which version SQL you were using, I have the first 3 columns as the "Final" values. The following 3 columns after that come from the record that qualified by EITHER of the qualifiers. As stocks go up and down all the time, its possible for the high and/or low values to occur more than once within the same time period. This will include ALL those entries that qualify the MIN() / MAX() criteria.
select
PreQuery.Symbol,
PreQuery.LowForSymbol,
PreQuery.HighForSymbol,
tFinal.Today as DateOfMatch,
tFinal.Low as DateMatchLow,
tFinal.High as DateMatchHigh
from
( select
t1.symbol,
min( t1.low ) as LowForSymbol,
max( t1.high ) as HighForSymbol
from
table1 t1
where
t1.today between YourFromDateParameter and YourToDateParameter
group by
t1.symbol ) PreQuery
JOIN table1 tFinal
on PreQuery.Symbol = tFinal.Symbol
AND tFinal.today between YourFromDateParameter and YourToDateParameter
AND ( tFinal.Low = LowForSymbol
OR tFinal.High = HighForSymbol )

Related

Recursive subtraction from two separate tables to fill in historical data

I have two datasets hosted in Snowflake with social media follower counts by day. The main table we will be using going forward (follower_counts) shows follower counts by day:
This table is live as of 4/4/2020 and will be updated daily. Unfortunately, I am unable to get historical data in this format. Instead, I have a table with historical data (follower_gains) that shows net follower gains by day for several accounts:
Ideally - I want to take the follower_count value from the minimum date in the current table (follower_counts) and subtract the sum of gains (organic + paid gains) for each day, until the minimum date of the follower_gains table, to fill in the follower_count historically. In addition, there are several accounts with data in these tables, so it would need to be grouped by account. It should look like this:
I've only gotten as far as unioning these two tables together, but don't even know where to start with looping through these rows:
WITH a AS (
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
total_followers_count,
null AS paid_follower_gain,
null AS organic_follower_gain,
account_name,
last_update
FROM follower_counts
UNION ALL
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
null AS total_followers_count,
organic_follower_gain,
paid_follower_gain,
account_name,
last_update
FROM follower_gains)
SELECT
a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.total_followers_count,
a.organic_follower_gain,
a.paid_follower_gain,
a.account_name,
a.last_update
FROM a
ORDER BY date desc LIMIT 100
UPDATE: Changed union to union all and added not exists to remove duplicates. Made changes per the comments.
NOTE: Please make sure you don't post images of the tables. It's difficult to recreate your scenario to write a correct query. Test this solution and update so that I can make modifications if necessary.
You don't loop through in SQL because its not a procedural language. The operation you define in the query is performed for all the rows in a table.
with cte as (SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
(a.follower_count - (b.organic_gain+b.paid_gain)) AS follower_count,
a.account_name,
a.last_update,
b.organic_gain,
b.paid_gain
FROM follower_counts a
JOIN follower_gains b ON a.account_id = b.account_id
AND b.date < (select min(date) from
follower_counts c where a.account.id = c.account_id)
)
SELECT b.account_id,
b.date,
b.organizational_entity,
b.organizational_entity_type,
b.vanity_name,
b.localized_name,
b.localized_website,
b.organization_type,
b.follower_count,
b.account_name,
b.last_update,
b.organic_gain,
b.paid_gain
FROM cte b
UNION ALL
SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.follower_count,
a.account_name,
a.last_update,
NULL as organic_gain,
NULL as paid_gain
FROM follower_counts a where not exists (select 1 from
follower_gains c where a.account_id = c.account_id AND a.date = c.date)
You could do something like this, instead of using the variable you can just wrap it another bracket and write at end ) AS FollowerGrowth
DECLARE #FollowerGrowth INT =
( SELECT total_followers_count
FROM follower_gains
WHERE AccountID = xx )
-
( SELECT TOP 1 follower_count
FROM follower_counts
WHERE AccountID = xx
ORDER BY date ASCENDING )

Unpivot date columns to a single column of a complex query in Oracle

Hi guys, I am stuck with a stubborn problem which I am unable to solve. Am trying to compile a report wherein all the dates coming from different tables would need to come into a single date field in the report. Ofcourse, the max or the most recent date from all these date columns needs to be added to the single date column for the report. I have multiple users of multiple branches/courses for whom the report would be generated.
There are multiple blogs and the latest date w.r.t to the blogtitle needs to be grouped, i.e. max(date_value) from the six date columns should give the greatest or latest date for that blogtitle.
Expected Result:
select u.batch_uid as ext_person_key, u.user_id, cm.batch_uid as ext_crs_key, cm.crs_id, ir.role_id as
insti_role, (CASE when b.JOURNAL_IND = 'N' then
'BLOG' else 'JOURNAL' end) as item_type, gm.title as item_name, gm.disp_title as ITEM_DISP_NAME, be.blog_pk1 as be_blogPk1, bc.blog_entry_pk1 as bc_blog_entry_pk1,bc.pk1,
b.ENTRY_mod_DATE as b_ENTRY_mod_DATE ,b.CMT_mod_DATE as BlogCmtModDate, be.CMT_mod_DATE as be_cmnt_mod_Date,
b.UPDATE_DATE as BlogUpDate, be.UPDATE_DATE as be_UPDATE_DATE,
bc.creation_date as bc_creation_date,
be.CREATOR_USER_ID as be_CREATOR_USER_ID , bc.creator_user_id as bc_creator_user_id,
b.TITLE as BlogTitle, be.TITLE as be_TITLE,
be.DESCRIPTION as be_DESCRIPTION, bc.DESCRIPTION as bc_DESCRIPTION
FROM users u
INNER JOIN insti_roles ir on u.insti_roles_pk1 = ir.pk1
INNER JOIN crs_users cu ON u.pk1 = cu.users_pk1
INNER JOIN crs_mast cm on cu.crsmast_pk1 = cm.pk1
INNER JOIN blogs b on b.crsmast_pk1 = cm.pk1
INNER JOIN blog_entry be on b.pk1=be.blog_pk1 AND be.creator_user_id = cu.pk1
LEFT JOIN blog_CMT bc on be.pk1=bc.blog_entry_pk1 and bc.CREATOR_USER_ID=cu.pk1
JOIN gradeledger_mast gm ON gm.crsmast_pk1 = cm.pk1 and b.grade_handler = gm.linkId
WHERE cu.ROLE='S' AND BE.STATUS='2' AND B.ALLOW_GRADING='Y' AND u.row_status='0'
AND u.available_ind ='Y' and cm.row_status='0' and and u.batch_uid='userA_157'
I am getting a resultset for the above query with multiple date columns which I want > > to input into a single columnn. The dates have to be the most recent, i.e. max of the dates in the date columns.
I have successfully done the Unpivot by using a view to store the above
resultset and put all the dates in one column. However, I do not
want to use a view or a table to store the resultset and then do
Unipivot simply because I cannot keep creating views for every user
one would query for.
The max(date_value) from the date columns need to be put in one single column. They are as follows:
* 1) b.entry_mod_date, 2) b.cmt_mod_date ,3) be.cmt_mod_date , 4) b.update_Date ,5) be.update_date, 6) bc.creation_date *
Apologies that I could not provide the desc of all the tables and the
fields being used.
Any help to get the above mentioned max of the dates from these
multiple date columns into a single column without using a view or a
table would be greatly appreciated.*
It is not clear what results you want, but the easiest solution is to use greatest().
with t as (
YOURQUERYHERE
)
select t.*,
greatest(entry_mod_date, cmt_mod_date, cmt_mod_date, update_Date,
update_date, bc.creation_date
) as greatestdate
from t;
select <columns>,
case
when greatest (b_ENTRY_mod_DATE) >= greatest (BlogCmtModDate) and greatest(b_ENTRY_mod_DATE) >= greatest(BlogUpDate)
then greatest( b_ENTRY_mod_DATE )
--<same implementation to compare each time BlogCmtModDate and BlogUpDate separately to get the greatest then 'date'>
,<columns>
FROM table
<rest of the query>
UNION ALL
Select <columns>,
case
when greatest (be_cmnt_mod_Date) >= greatest (be_UPDATE_DATE)
then greatest( be_cmnt_mod_Date )
when greatest (be_UPDATE_DATE) >= greatest (be_cmnt_mod_Date)
then greatest( be_UPDATE_DATE )
,<columns>
FROM table
<rest of the query>
UNION ALL
Select <columns>,
GREATEST(bc_creation_date)
,<columns>
FROM table
<rest of the query>

How to pass value to and join correlated query having WITH clause?

I need to pass from one select value to another one to get the calculation done and join the results but having problem with this "with" clause sub query.
This one calculates balance from supplied number loan_id from the second query
WITH my_select (balance, type) As
(
select sum(amount_o-amount_h) as balance, type
from decret d
where d.idnumber = **loan_id**
and d.DATE_D <= '2013-10-31'
and type in (1,2,3,4,11,12,13,18,20,25)
group by type
having sum(amount_o-amount_h) <> 0
)
select sum(balance) FROM my_select //just returns client balance
And this one selects clients data and I want for every client add the balance also to the result calculated in first query.
SELECT *, /*balance*/ from clients
where **loan_id** in (select LoanNum
from NumsFromEx)
How to join them together? (I have simplified a little bit queries to show it cleaner)
Given the sample queries, the following should work:
WITH cteBalances AS
(
SELECT loan_id, SUM(amount_o-amount_h) AS balance
FROM decret d
WHERE d.DATE_D <= '2013-10-31'
AND type IN (1,2,3,4,11,12,13,18,20,25)
GROUP BY loan_id
)
SELECT c.*, b.balance
FROM clients c
LEFT JOIN cteBalances b ON b.loan_id = c.loan_id
WHERE loan_id IN (SELECT LoanNum
FROM NumsFromEx)

SQL Server adjust each value in a column by another table

I have two tables, TblVal and TblAdj.
In TblVal I have a bunch of values that I need adjusted according to TblAdj for a given TblVal.PersonID and TblVal.Date and then returned in some ViewAdjustedValues. I must apply only those adjustments where TblAdj.Date >= TblVal.Date.
The trouble is that since all the adjustments are either a subtraction or a division, they need to be made in order. Here is the table structure:
TblVal: PersonID, Date, Value
TblAdj: PersonID, Date, SubtractAmount, DivideAmount
I want to return ViewAdjustedValues: PersonID, Date, AdjValue
Can I do this without iterating through TblAdj using a WHILE loop and an IF block to either subtract or divide as necessary? Is there some nested SELECT table magic I can perform that would be faster?
I think you can do it without a loop, but whether you want to or not is another question. A query that I think works is below (SQL Fiddle here). The key ideas are as follows:
Each SubtractAmount has the ultimate effect of subtracting SubtractAmount divided by the product of all later DivideAmounts for the same PersonID. The Date associated with the PersonID isn't relevant to this adjustment (fortunately). The CTE AdjustedAdjustments contains these adjusted SubtractAmount values.
The initial Value for a PersonID gets divided by the product of all DivideAmount values on or after that persons Date.
EXP(SUM(LOG(x))) works as an aggregate product if all values of x are positive. You should constrain your DivideAmount values to assure this, or adjust the code accordingly.
If there are no DivideAmounts, the associated product is NULL and changed to 1. Similarly, NULL sums of adjusted SubtractAmount values are changed to zero. A left join is used to preserve an values that are not subject to any adjustments.
SQL Server 2012 supports an OVER clause for aggregates, which was helpful here to aggregate "all later DivideAmounts."
WITH AdjustedAdjustments AS (
select
PersonID,
Date,
SubtractAmount/
EXP(
SUM(LOG(COALESCE(DivideAmount,1)))
OVER (
PARTITION BY PersonID
ORDER BY Date
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
)
) AS AdjustedSubtract,
DivideAmount
FROM TblAdj
)
SELECT
p.PersonID,
p.Value/COALESCE(EXP(SUM(LOG(COALESCE(DivideAmount,1)))),1)
-COALESCE(SUM(a.AdjustedSubtract),0) AS AmountAdjusted
FROM TblVal AS p
LEFT OUTER JOIN AdjustedAdjustments AS a
ON a.PersonID = p.PersonID
AND a.Date >= p.Date
GROUP BY p.PersonID, p.Value, p.Date;
Try something like following:
with CTE_TblVal (PersonID,Date,Value)
as
(
select A.PersonID, A.Date, A.Value
from TblVal A
inner join TblAdj B
on A.PersonID = B.PersonID
where B.Date >= A.Date
)
update CTE_TblVal
set Date = TblAdj.Date,
Value = TblAdj.Value
from CTE_TblVal
inner join TblAdj
on CTE_Tblval.PersonID = TblAdj.PersonID
output inserted.* into ViewAdjustedValues
select * from ViewAdjustedValues

SQL Server 2008: Using Multiple dts Ranges to Build a Set of Dates

I'm trying to build a query for a medical database that counts the number of patients that were on at least one medication from a class of medications (the medications listed below in the FAST_MEDS CTE) and had either:
1) A diagnosis of myopathy (the list of diagnoses in the FAST_DX CTE)
2) A CPK lab value above 1000 (the lab value in the FAST_LABS CTE)
and this diagnosis or lab happened AFTER a patient was on a statin.
The query I've included below does that under the assumption that once a patient is on a statin, they're on a statin forever. The first CTE collects the ids of patients that were on a statin along with the first date of their diagnosis, the second those with a diagnosis, and the third those with a high lab value. After this I count those that match the above criteria.
What I would like to do is drop the assumption that once a patient is on a statin, they're on it for life. The table edw_dm.patient_medications has a column called start_dts and end_dts. This table has one row for each prescription written, with start_dts and end_dts denoting the start and end date of the prescription. End_dts could be null, which I'll take to assume that the patient is currently on this medication (it could be a missing record, but I can't do anything about this). If a patient is on two different statins, the start and ends dates can overlap, and there may be multiple records of the same medication for a patient, as in a record showing 3-11-2000 to 4-5-2003 and another for the same patient showing 5-6-2007 to 7-8-2009.
I would like to use these two columns to build a query where I'm only counting the patients that had a lab value or diagnosis done during a time when they were already on a statin, or in the first n (say 3) months after they stopped taking a statin. I'm really not sure how to go about rewriting the first CTE to get this information and how to do the comparison after the CTEs are built. I know this is a vague question, but I'm really stumped. Any ideas?
As always, thank you in advance.
Here's the current query:
WITH FAST_MEDS AS
(
select distinct
statins.mrd_pt_id, min(year(statins.order_dts)) as statin_yr
from
edw_dm.patient_medications as statins
inner join mrd.medications as mrd
on statins.mrd_med_id = mrd.mrd_med_id
WHERE mrd.generic_nm in (
'Lovastatin (9664708500)',
'lovastatin-niacin',
'Lovastatin/Niacin',
'Lovastatin',
'Simvastatin (9678583966)',
'ezetimibe-simvastatin',
'niacin-simvastatin',
'ezetimibe/Simvastatin',
'Niacin/Simvastatin',
'Simvastatin',
'Aspirin Buffered-Pravastatin',
'aspirin-pravastatin',
'Aspirin/Pravastatin',
'Pravastatin',
'amlodipine-atorvastatin',
'Amlodipine/atorvastatin',
'atorvastatin',
'fluvastatin',
'rosuvastatin'
)
and YEAR(statins.order_dts) IS NOT NULL
and statins.mrd_pt_id IS NOT NULL
group by statins.mrd_pt_id
)
select *
into #meds
from FAST_MEDS
;
--return patients who had a diagnosis in the list and the year that
--diagnosis was given
with
FAST_DX AS
(
SELECT pd.mrd_pt_id, YEAR(pd.init_noted_dts) as init_yr
FROM edw_dm.patient_diagnoses as pd
inner join mrd.diagnoses as mrd
on pd.mrd_dx_id = mrd.mrd_dx_id
and mrd.icd9_cd in
('728.89','729.1','710.4','728.3','729.0','728.81','781.0','791.3')
)
select *
into #dx
from FAST_DX;
--return patients who had a high cpk value along with the year the cpk
--value was taken
with
FAST_LABS AS
(
SELECT
pl.mrd_pt_id, YEAR(pl.order_dts) as lab_yr
FROM
edw_dm.patient_labs as pl
inner join mrd.labs as mrd
on pl.mrd_lab_id = mrd.mrd_lab_id
and mrd.lab_nm = 'CK (CPK)'
WHERE
pl.lab_val between 1000 AND 999998
)
select *
into #labs
from FAST_LABS;
-- count the number of patients who had a lab value or a medication
-- value taken sometime AFTER their initial statin diagnosis
select
count(distinct p.mrd_pt_id) as ct
from
mrd.patient_demographics as p
join #meds as m
on p.mrd_pt_id = m.mrd_pt_id
AND
(
EXISTS (
SELECT 'A' FROM #labs l WHERE p.mrd_pt_id = l.mrd_pt_id
and l.lab_yr >= m.statin_yr
)
OR
EXISTS(
SELECT 'A' FROM #dx d WHERE p.mrd_pt_id = d.mrd_pt_id
AND d.init_yr >= m.statin_yr
)
)
You probably don't need to select all of your CTE defined queries into temp tables.
I think that the query you're after has the form:
WITH FAST_MEDS(PatientID, StartDate, EndDate) AS
(
--your query for patients on statins, projecting the patient ID and the start/end date for the medication
),
FAST_DX(PatientID, Date) AS
(
--your query for patients with certain diagnosis, projecting the patient ID and the date
),
FAST_LABS(PatientID, Date) AS
(
--your query for patients with certain labs, projecting the patient ID and the date
)
SELECT PatientID
FROM FAST_MEDS
WHERE PatientID IN (SELECT PatientID FROM FAST_DX WHERE Date BETWEEN StartDate AND EndDate OR EndDate IS NULL AND StartDate < Date)
OR PatientID IN (SELECT PatientID FROM FAST_LABS WHERE Date BETWEEN StartDate AND EndDate OR EndDate IS NULL AND StartDate < Date)