How to select lowest value for each subject? - sql

subject_ID Date Test_id value
------- --------- ----- -----
1 1/1/2000 A 50
1 1/1/2000 B 10
1 1/2/2000 A 55
1 1/2/2000 B 09
2 1/1/2000 A 51
2 1/1/2000 B 13
2 1/2/2000 A 48
2 1/2/2000 B 08
Hi All,
I have a question about the scenario above. As you can see, I have test results that come in daily for each subject. I'm trying to find a way to select the lowest value for each test within a defined period of time, so the final table looks like this:
subject_ID Date Test_id value
------- --------- ----- -----
1 1/1/2000 A 50
1 1/2/2000 B 09
2 1/2/2000 A 48
2 1/2/2000 B 08

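I'm not sure what technology you are using, but assuming SQL, a GROUP BY will work. Group only by the subject and the test; including Date in the GROUP BY would return one row per day instead of one row per test. Note that this returns the minimum value but not the date it occurred on.
SELECT subject_ID
, Test_id
, MIN(value) AS value
FROM YourTable
GROUP BY subject_ID
, Test_id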

ANSI standard SQL supports the row_number() function. With this function, you can do:
select t.*
from (select t.*,
row_number() over (partition by subject_id, test_id order by value asc) as seqnum
from t
) t
where seqnum = 1;
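To check the row_number() approach against the question's sample data, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. That is an assumption: the question does not name a DBMS, and window functions need SQLite 3.25 or later. The dates are stored as ISO strings, reading 1/2/2000 as the day after 1/1/2000, and the BETWEEN filter stands in for the "defined period of time".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (subject_id INTEGER, date TEXT, test_id TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2000-01-01", "A", 50), (1, "2000-01-01", "B", 10),
        (1, "2000-01-02", "A", 55), (1, "2000-01-02", "B", 9),
        (2, "2000-01-01", "A", 51), (2, "2000-01-01", "B", 13),
        (2, "2000-01-02", "A", 48), (2, "2000-01-02", "B", 8),
    ],
)

# Rank rows within each (subject, test) by value and keep the lowest one.
lowest = conn.execute(
    """
    SELECT subject_id, date, test_id, value
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY subject_id, test_id
                                  ORDER BY value ASC) AS seqnum
        FROM t
        WHERE date BETWEEN '2000-01-01' AND '2000-01-02'
    ) AS ranked
    WHERE seqnum = 1
    ORDER BY subject_id, test_id
    """
).fetchall()

for row in lowest:
    print(row)
```

If two rows tie for the lowest value, row_number() keeps just one of them, chosen arbitrarily unless you add a tie-breaker such as the date to the ORDER BY.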

Related

analytical functions

Good afternoon. A question: how can I optimize the code below, perhaps by using Oracle analytic functions?
-- tabledeuda : this table contains 2 months 202212 and 202211
SELECT B.*,
NVL(B.DEUDAPRESTAMO_PAGPER,0)-NVL(A.DEUDAPRESTAMO_PAGPER,0) AS SALE_CT -- current month - previous month
FROM tabledeuda B
LEFT JOIN tabledeuda A ON (A.CODLLAVE = B.CODLLAVE
AND A.CODMES = TO_NUMBER(TO_CHAR(ADD_MONTHS(TO_DATE(B.CODMES,'YYYYMM'),-1),'YYYYMM'))
AND A.financial_company = B.financial_company
AND A.CODMONEY=B.CODMONEY)
WHERE NVL(B.DEUDAPRESTAMO_PAGPER,0)>NVL(A.DEUDAPRESTAMO_PAGPER,0)
AND B.CODMES = &CODMES; ---> &CODMES 202212
This looks like a candidate for the LAG analytic function.
The sample data is rather sparse, so it is unclear what happens when there is more of it, but this is the general idea.
Sample data:
with test (codmes, customer, deudaprestamo_pagper) as
  (select 202212, 'T1009', 200 from dual union all
   select 202211, 'T1009', 150 from dual
  )
Query:
select codmes, customer,
       deudaprestamo_pagper,
       deudaprestamo_pagper -
         lag(deudaprestamo_pagper) over (partition by customer order by codmes) sale_ct
from test;
    CODMES CUSTO DEUDAPRESTAMO_PAGPER    SALE_CT
---------- ----- -------------------- ----------
    202211 T1009                  150
    202212 T1009                  200         50
If you want to fetch only the last row per customer (sorted by codmes), you could, for example, number the rows in descending order and keep the first one:
with temp as
  (select codmes, customer,
          deudaprestamo_pagper,
          deudaprestamo_pagper -
            lag(deudaprestamo_pagper) over (partition by customer order by codmes) sale_ct,
          --
          row_number() over (partition by customer order by codmes desc) rn
   from test
  )
select codmes, customer, deudaprestamo_pagper, sale_ct
from temp
where rn = 1;
    CODMES CUSTO DEUDAPRESTAMO_PAGPER    SALE_CT
---------- ----- -------------------- ----------
    202212 T1009                  200         50
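To tie the LAG idea back to the filters in the question (only the requested month, only rows where the debt went up), here is a rough sketch run through Python's sqlite3 module instead of Oracle. That swap is an assumption made so the example is self-contained: COALESCE plays the role of NVL, and the bind parameter replaces &CODMES. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (codmes INTEGER, customer TEXT, deudaprestamo_pagper INTEGER)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(202212, "T1009", 200), (202211, "T1009", 150)],
)

# Compute the month-over-month difference first, then filter in the outer
# query, because a window function cannot be referenced in the same WHERE.
increases = conn.execute(
    """
    SELECT codmes, customer, sale_ct
    FROM (
        SELECT codmes, customer,
               COALESCE(deudaprestamo_pagper, 0)
                 - COALESCE(LAG(deudaprestamo_pagper)
                              OVER (PARTITION BY customer ORDER BY codmes), 0) AS sale_ct
        FROM test
    ) AS diffs
    WHERE codmes = ? AND sale_ct > 0
    """,
    (202212,),
).fetchall()

print(increases)
```

One difference from the original self-join is worth keeping in mind: LAG returns the previous row for the customer, which is the previous month only when no month is missing from the table.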

Row_Number Sybase SQL Anywhere change on multiple condition

I have a selection that returns
EMP DOC DATE
1 78 01/01
1 96 02/01
1 96 02/01
1 105 07/01
2 4 04/01
2 7 04/01
3 45 07/01
3 45 07/01
3 67 09/01
I want to add a row number (I'll use it as a primary ID), but it should restart whenever EMP changes and should not change when DOC is the same as in the previous row, like this:
EMP DOC DATE ID
1 78 01/01 1
1 96 02/01 2
1 96 02/01 2
1 105 07/01 3
2 4 04/01 1
2 7 04/01 2
3 45 07/01 1
3 45 07/01 1
3 67 09/01 2
In SQL Server I could use LAG to compare against the previous DOC, but I can't find a way to do this in Sybase SQL Anywhere. I'm using ROW_NUMBER partitioned by EMP, but that's not what I need:
SELECT EMP, DOC, DATE, ROW_NUMBER() OVER (PARTITION BY EMP ORDER BY EMP, DOC, DATE) ID -- <== this changes the row number for a repeated DOC within the same EMP, so it does not work
Does anyone have a direction for this?
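You seem to want dense_rank():
select
emp,
doc,
date,
dense_rank() over(partition by emp order by date, doc) id
from mytable
This numbers rows within groups having the same emp and increments only when the (date, doc) pair changes, without gaps. Ordering by date alone would give DOC 4 and DOC 7 of EMP 2 the same ID, because they share a date.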
If performance is not an issue in your case, you can try something like this: number the distinct (EMP, DOC) pairs, then join the numbers back to the original rows.
SELECT tx.EMP, tx.DOC, tx.DATE, y.ID
FROM table_xxx tx
JOIN (SELECT EMP, DOC, ROW_NUMBER() OVER (PARTITION BY EMP ORDER BY DOC) ID
      FROM (SELECT EMP, DOC FROM table_xxx GROUP BY EMP, DOC) x) y
  ON tx.EMP = y.EMP AND tx.DOC = y.DOC
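As a quick check of the dense_rank() idea against the sample rows, here is a small sketch using Python's sqlite3 module rather than SQL Anywhere. That is an assumption for the sake of a runnable example, and it needs SQLite 3.25 or later. It orders by date and then doc, so two different documents on the same date still get different IDs, and it stores the dates as the plain strings shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (emp INTEGER, doc INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, 78, "01/01"), (1, 96, "02/01"), (1, 96, "02/01"), (1, 105, "07/01"),
        (2, 4, "04/01"), (2, 7, "04/01"),
        (3, 45, "07/01"), (3, 45, "07/01"), (3, 67, "09/01"),
    ],
)

# DENSE_RANK gives equal rows the same number and leaves no gaps afterwards.
numbered = conn.execute(
    """
    SELECT emp, doc, date,
           DENSE_RANK() OVER (PARTITION BY emp ORDER BY date, doc) AS id
    FROM mytable
    ORDER BY emp, date, doc
    """
).fetchall()

ids = [row[3] for row in numbered]
print(ids)
```

The string dates happen to sort correctly for this sample; with real data a proper date column would be the safer choice.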

how to get the unique max data

I have the table below:
userid  comp_dd  coursecode  qualification    course_id  course      passyear  totalmarks  func  stream
------  -------  ----------  ---------------  ---------  ----------  --------  ----------  ----  ------------
60      26       1           High School      15         Class 10    26        67          2     All Subject
60      26       2           Senior Secondry  15         Class 12    26        85          2     Commerce
60      2010     3           Graduates        4          B.Tech/B.E  2010      54          1     IT/Computers
60      2013     4           Post Graduates   9          M.com       2013      98.5        2     Commerce
I want to get the single record with the maximum coursecode. The output should be:
userid  comp_dd  coursecode  qualification    course_id  course      passyear  totalmarks  func  stream
------  -------  ----------  ---------------  ---------  ----------  --------  ----------  ----  ------------
60      2013     4           Post Graduates   9          M.com       2013      98.5        2     Commerce
There will be many records of the different userids
I think you want:
select t.*
from t
where t.coursecode = (select max(t2.coursecode)
from t t2
where t2.userid = t.userid
);
You can also do this with window functions, but the correlated subquery is often faster with the right index, which would be on (userid, coursecode).
For a single user, sort the table by coursecode descending and take the first row:
select top 1 *
from tablename
order by coursecode desc
This returns exactly one row for the whole table, so with several userids you would need to filter on userid first. It also works only if there are no duplicates in the column coursecode, but I guess this is the case because you say:
"I wanted to get the unique record of max coursecode"
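Since the question says there will be many userids, here is a small sketch of the correlated-subquery approach returning one row per user. It runs on Python's sqlite3 module, which is an assumption made for a self-contained example. The second user (61) is hypothetical and not part of the question's data, and only three of the columns are kept for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (userid INTEGER, coursecode INTEGER, qualification TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (60, 1, "High School"), (60, 2, "Senior Secondry"),
        (60, 3, "Graduates"), (60, 4, "Post Graduates"),
        # Hypothetical second user, not in the question's sample data.
        (61, 1, "High School"), (61, 2, "Senior Secondry"),
    ],
)
# The index suggested in the answer; it lets the subquery seek per user.
conn.execute("CREATE INDEX idx_t_userid_coursecode ON t (userid, coursecode)")

top_per_user = conn.execute(
    """
    SELECT t.userid, t.coursecode, t.qualification
    FROM t
    WHERE t.coursecode = (SELECT MAX(t2.coursecode)
                          FROM t AS t2
                          WHERE t2.userid = t.userid)
    ORDER BY t.userid
    """
).fetchall()

print(top_per_user)
```

A plain "top 1 ... order by coursecode desc" on the same data would return only the row for user 60.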

How to join multiple rows by continue from and to id columns in oracle

I have a scenario where I need to find the start date and end date from multiple rows which are tied by continued_from and continued_to date fields in Oracle.
The result should look like this:
ID STARTDATE  ENDDATE
-- ---------- ----------
 3 01/01/1000 12/31/9999
for source rows like these:
ID STARTDATE  ENDDATE    CONT_FROM_ID CONT_TO_ID
-- ---------- ---------- ------------ -----------
 1 01/01/1000 10/10/1999 NULL         2
 2 10/10/1999 11/11/2000 1            3
 3 11/11/2000 12/31/9999 2            NULL
Oracle's hierarchical query syntax makes it easy to walk the tree from parent to child. The analytical lead() and lag() functions track the next and previous IDs.
select c23.id
, c23.startdate
, c23.enddate
, lag(c23.id) over (partition by p23.id order by c23.id) as cont_from_id
, lead(c23.id) over (partition by p23.id order by c23.id) as cont_to_id
from p23
join c23 on p23.startdate <= c23.startdate
and p23.enddate >= c23.enddate
order by c23.id
/
Running it against your sample data gives:
        ID STARTDATE ENDDATE   CONT_FROM_ID CONT_TO_ID
---------- --------- --------- ------------ ----------
         1 01-JAN-00 10-OCT-99                       2
         2 10-OCT-99 11-NOV-00            1          3
         3 11-NOV-00 31-DEC-99            2
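For the collapsed result the question asks for (the start date of the first row and the end date of the last row in a chain), one possible sketch is a recursive common table expression that follows cont_to_id from each chain's first row. This is a different technique from the lag()/lead() query above, and it runs on Python's sqlite3 module rather than Oracle, purely so it is self-contained. The table name periods and the ISO date strings are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE periods (
           id INTEGER PRIMARY KEY, startdate TEXT, enddate TEXT,
           cont_from_id INTEGER, cont_to_id INTEGER)"""
)
conn.executemany(
    "INSERT INTO periods VALUES (?, ?, ?, ?, ?)",
    [
        (1, "1000-01-01", "1999-10-10", None, 2),
        (2, "1999-10-10", "2000-11-11", 1, 3),
        (3, "2000-11-11", "9999-12-31", 2, None),
    ],
)

# Start at rows that continue from nothing, carry their start date along
# the chain, and report the row that continues to nothing.
merged = conn.execute(
    """
    WITH RECURSIVE chain (root_start, id, enddate, cont_to_id) AS (
        SELECT startdate, id, enddate, cont_to_id
        FROM periods
        WHERE cont_from_id IS NULL
        UNION ALL
        SELECT chain.root_start, p.id, p.enddate, p.cont_to_id
        FROM chain
        JOIN periods AS p ON p.id = chain.cont_to_id
    )
    SELECT id, root_start AS startdate, enddate
    FROM chain
    WHERE cont_to_id IS NULL
    """
).fetchall()

print(merged)
```

The sketch assumes the chains contain no cycles; a cycle in cont_to_id would make the recursion run forever.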

SQL: Identify distinct blocks of treatment over multiple start and end date ranges for each member

Objective: Identify distinct episodes of continuous treatment for each member in a table. Each member has a diagnosis and a service date, and an episode is defined as all services where the time between each consecutive service is less than some number (let's say 90 days for this example). The query will need to loop through each row and calculate the difference between dates, and return the first and last date associated with each episode. The goal is to group results by member and episode start/end date.
A very similar question has been asked before, and was somewhat helpful. The problem is that in customizing the code, the returned tables are excluding first and last records. I'm not sure how to proceed.
My data currently looks like this:
MemberCode Diagnosis ServiceDate
1001 ----- ABC ----- 2010-02-04
1001 ----- ABC ----- 2010-03-20
1001 ----- ABC ----- 2010-04-18
1001 ----- ABC ----- 2010-05-22
1001 ----- ABC ----- 2010-09-26
1001 ----- ABC ----- 2010-10-11
1001 ----- ABC ----- 2010-10-19
2002 ----- XYZ ----- 2010-07-10
2002 ----- XYZ ----- 2010-07-21
2002 ----- XYZ ----- 2010-11-08
2002 ----- ABC ----- 2010-06-03
2002 ----- ABC ----- 2010-08-13
In the above data, the first record for Member 1001 is 2010-02-04, and there is not a difference of more than 90 days between consecutive services until 2010-09-26 (the date at which a new episode starts). So Member 1001 has two distinct episodes: (1) Diagnosis ABC, which goes from 2010-02-04 to 2010-05-22, and (2) Diagnosis ABC, which goes from 2010-09-26 to 2010-10-19.
Similarly, Member 2002 has three distinct episodes: (1) Diagnosis XYZ, which goes from 2010-07-10 to 2010-07-21, (2) Diagnosis XYZ, which begins and ends on 2010-11-08, and (3) Diagnosis ABC, which goes from 2010-06-03 to 2010-08-13.
Desired output:
MemberCode Diagnosis EpisodeStartDate EpisodeEndDate
1001 ----- ABC ----- 2010-02-04 ----- 2010-05-22
1001 ----- ABC ----- 2010-09-26 ----- 2010-10-19
2002 ----- XYZ ----- 2010-07-10 ----- 2010-07-21
2002 ----- XYZ ----- 2010-11-08 ----- 2010-11-08
2002 ----- ABC ----- 2010-06-03 ----- 2010-08-13
I've been working on this query for too long, and still can't get exactly what I need. Any help would be appreciated. Thanks in advance!
SQL Server 2012 has the lag() and cumulative sum functions, which make it easier to write such a query. The idea is to flag the first service in each episode, then take the cumulative sum of that flag to number the episodes. Here is the code:
select MemberCode, Diagnosis, min(ServiceDate) as EpisodeStartDate,
       max(ServiceDate) as EpisodeEndDate
from (select t.*, sum(ServiceStartFlag) over (partition by MemberCode, Diagnosis order by ServiceDate) as grp
      from (select t.*,
                   (case when datediff(day,
                                       lag(ServiceDate) over (partition by MemberCode, Diagnosis
                                                              order by ServiceDate),
                                       ServiceDate) < 90
                         then 0
                         else 1 -- handles both NULL and >= 90
                    end) as ServiceStartFlag
            from Treatments t
           ) t
     ) t
group by grp, MemberCode, Diagnosis;
You can do this in earlier versions of SQL Server but the code is more cumbersome.
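To see the flag-then-cumulative-sum idea run against the question's data, here is a sketch using Python's sqlite3 module in place of SQL Server. That is an assumption made so the example is self-contained: julianday() differences stand in for datediff(day, ...), the table name Treatments is a placeholder, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Treatments (MemberCode INTEGER, Diagnosis TEXT, ServiceDate TEXT)"
)
conn.executemany(
    "INSERT INTO Treatments VALUES (?, ?, ?)",
    [
        (1001, "ABC", "2010-02-04"), (1001, "ABC", "2010-03-20"),
        (1001, "ABC", "2010-04-18"), (1001, "ABC", "2010-05-22"),
        (1001, "ABC", "2010-09-26"), (1001, "ABC", "2010-10-11"),
        (1001, "ABC", "2010-10-19"),
        (2002, "XYZ", "2010-07-10"), (2002, "XYZ", "2010-07-21"),
        (2002, "XYZ", "2010-11-08"),
        (2002, "ABC", "2010-06-03"), (2002, "ABC", "2010-08-13"),
    ],
)

episodes = conn.execute(
    """
    WITH flagged AS (
        -- 1 marks the first service of an episode (no predecessor within 90 days).
        SELECT MemberCode, Diagnosis, ServiceDate,
               CASE WHEN julianday(ServiceDate)
                         - julianday(LAG(ServiceDate) OVER (
                               PARTITION BY MemberCode, Diagnosis
                               ORDER BY ServiceDate)) < 90
                    THEN 0 ELSE 1 END AS StartFlag
        FROM Treatments
    ),
    grouped AS (
        -- The running total of the flags numbers the episodes.
        SELECT *, SUM(StartFlag) OVER (
                      PARTITION BY MemberCode, Diagnosis
                      ORDER BY ServiceDate) AS grp
        FROM flagged
    )
    SELECT MemberCode, Diagnosis,
           MIN(ServiceDate) AS EpisodeStartDate,
           MAX(ServiceDate) AS EpisodeEndDate
    FROM grouped
    GROUP BY MemberCode, Diagnosis, grp
    ORDER BY MemberCode, Diagnosis, MIN(ServiceDate)
    """
).fetchall()

for row in episodes:
    print(row)
```

The first row of each member and diagnosis has no previous date, so the comparison yields NULL and the CASE falls through to 1, which is what keeps the first record from being dropped.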
For versions of SQL Server prior to 2012, here are some code snippets that should work.
First, you'll need a temp table (as opposed to a CTE, because looking up the edge event from a CTE would fire the newid() function again rather than retrieving the value stored for that row).
CREATE TABLE #Edges (MemberCode INT, Diagnosis VARCHAR(3), ServiceDate DATE, GroupID VARCHAR(40))
INSERT INTO #Edges
SELECT *
FROM Treatments E
CROSS APPLY (
SELECT
CASE
WHEN EXISTS (
SELECT TOP 1 E2.ServiceDate
FROM Treatments E2
WHERE E.MemberCode = E2.MemberCode
AND E.Diagnosis = E2.Diagnosis
AND E.ServiceDate > E2.ServiceDate
AND DATEDIFF(dd,E2.ServiceDate,E.ServiceDate) BETWEEN 1 AND 90
ORDER BY E2.ServiceDate DESC
) THEN 'Group'
ELSE CAST(NEWID() AS VARCHAR(40))
END AS GroupID
) z
The EXISTS operator contains a query that looks into the past for a service between 1 and 90 days earlier. Rows with no such predecessor are the edges that start an episode, and each gets a fresh GroupID. Once the edges are gathered, the following query returns the desired results for the test data you posted.
SELECT MemberCode, Diagnosis, MIN(ServiceDate) AS StartDate, MAX(ServiceDate) AS EndDate
FROM (
SELECT
MemberCode
, Diagnosis
, ServiceDate
, CASE GroupID
WHEN 'Group' THEN (
SELECT TOP 1 GroupID
FROM #Edges E2
WHERE E.MemberCode = E2.MemberCode
AND E.Diagnosis = E2.Diagnosis
AND E.ServiceDate > E2.ServiceDate
AND GroupID != 'Group'
ORDER BY ServiceDate DESC
)
ELSE GroupID END AS GroupID
FROM #Edges E
) Z
GROUP BY MemberCode, Diagnosis, GroupID
ORDER BY MemberCode, Diagnosis, MIN(ServiceDate)
Like Gordon said, more cumbersome, but it can be done if your server is not SQL 2012 or greater.