I have the following dataset:
individual | clinic_1 | clinic_2 | month | address_recorded | address_code
1 | A | B | 01-01-2016 | 01-02-1999 | C01
1 | A | A | 01-01-2016 | 01-02-2003 | C02
1 | A | A | 01-01-2016 | 01-02-2001 | C06
1 | A | X | 01-01-2016 | 01-02-2000 | C03
2 | C | B | 01-04-2016 | 01-02-1999 | D04
2 | C | A | 01-04-2016 | 01-02-2001 | D05
2 | C | X | 01-04-2016 | 01-02-2000 | D06
I would like to get:
individual | clinic_1 | month | address_code
1 | A | 01-01-2016 | C02
2 | C | 01-04-2016 | D05
Criteria:
For each unique individual-clinic_1-month set that has rows where clinic_1 = clinic_2, select the address with the most recent address_recorded date among those rows (i.e. within clinic_1).
For each unique individual-clinic_1-month set with NO rows where clinic_1 = clinic_2, select the address with the most recent address_recorded date across all clinics.
I thought about doing:
with cte_1
as
(
select * from table
where clinic_1 = clinic_2
)
,cte_2
as
(
select row_number () over (Partition by clinic_1, individual, month order by clinic_1, individual, month, address_recorded desc) as number, *
from cte_1
)
select individual, clinic_1, month, address_code from cte_2 where number = 1
But I don't know how to get those individual-clinic_1-month for which there are no instances where clinic_1=clinic_2, any ideas?
You can UNION two SELECT queries: one that selects the records where clinic_1 = clinic_2, and another that selects the records where clinic_1 <> clinic_2 for the individual-month groups that do not appear in the result set of the first query.
Both queries are grouped by [individual], [clinic_1], [mnth] (the first one also by [clinic_2]) to find the latest address_recorded for each group. Note that in the second query [clinic_2] is selected as ''.
Check the following:
with cte as
(SELECT [individual] ,[clinic_1],[clinic_2],[mnth],max([address_recorded]) as m
FROM [MyData] where [clinic_1]=[clinic_2]
group by [individual],[clinic_1],[clinic_2] ,[mnth]
),
cte2 as
(SELECT [MyData].[individual] ,[MyData].[clinic_1],'' as [clinic_2],[MyData].[mnth],max([MyData].[address_recorded]) as m
FROM [MyData]
Left Join cte on cte.individual=MyData.individual
and cte.mnth=MyData.mnth
where [MyData].[clinic_1]<>[MyData].[clinic_2] and cte.individual IS NULL
group by [MyData].[individual],[MyData].[clinic_1], [MyData].[mnth]
),
D as
(SELECT * FROM cte
UNION
SELECT * FROM cte2)
,
LastQr as(
select [MyData].individual, [MyData].clinic_1,[MyData].mnth,[MyData].address_code,
row_number() OVER(PARTITION BY [MyData].individual, [MyData].clinic_1,[MyData].mnth ORDER BY [MyData].individual, [MyData].clinic_1,[MyData].mnth)
as rn from D
INNER JOIN [MyData]
ON D.individual=MyData.individual and D.clinic_1=MyData.clinic_1 and D.mnth=MyData.mnth and D.m=MyData.address_recorded
and (D.clinic_2=MyData.clinic_2 or D.clinic_2='')
)
select * from LastQr where rn=1
See the results from dbfiddle.uk.
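As an alternative to the UNION approach above (not the same query), the two criteria can also be folded into a single ROW_NUMBER() ordering that ranks rows with clinic_1 = clinic_2 first and then sorts by the most recent address_recorded. The sketch below is only an illustration: it runs the SQL in an in-memory SQLite database through Python's sqlite3 module (window functions need SQLite 3.25 or later), and it assumes the sample dates are day-month-year and stores them as ISO strings so that text ordering matches date ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MyData (
        individual INTEGER, clinic_1 TEXT, clinic_2 TEXT,
        mnth TEXT, address_recorded TEXT, address_code TEXT
    )
""")
# Sample rows from the question; dates rewritten as ISO strings (assumption).
rows = [
    (1, "A", "B", "2016-01-01", "1999-02-01", "C01"),
    (1, "A", "A", "2016-01-01", "2003-02-01", "C02"),
    (1, "A", "A", "2016-01-01", "2001-02-01", "C06"),
    (1, "A", "X", "2016-01-01", "2000-02-01", "C03"),
    (2, "C", "B", "2016-04-01", "1999-02-01", "D04"),
    (2, "C", "A", "2016-04-01", "2001-02-01", "D05"),
    (2, "C", "X", "2016-04-01", "2000-02-01", "D06"),
]
conn.executemany("INSERT INTO MyData VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
    SELECT individual, clinic_1, mnth, address_code
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY individual, clinic_1, mnth
                   -- rows recorded within clinic_1 sort first, then newest first
                   ORDER BY CASE WHEN clinic_1 = clinic_2 THEN 0 ELSE 1 END,
                            address_recorded DESC
               ) AS rn
        FROM MyData
    ) AS ranked
    WHERE rn = 1
    ORDER BY individual
"""
result = conn.execute(query).fetchall()
print(result)
```

For the sample data this should pick C02 for individual 1 and D05 for individual 2, which matches the expected output in the question.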
This question already has answers here:
Join to only the "latest" record with t-sql
(7 answers)
Fetch the rows which have the Max value for a column for each distinct value of another column
(35 answers)
Closed 4 months ago.
I want to list all customers with their latest phone number and most recent customer type.
The phone number and type of a customer change periodically, so I want only the latest record, based on the latestUpdate column, without the old values.
Customer:
+------------+--------+-------+--------+
|latestUpdate| CustID | AddID | TypeID |
+------------+--------+-------+--------+
| 2020-03-01 | 1      | 1     | 1      |
| 2020-04-07 | 2      | 2     | 2      |
| 2020-06-13 | 3      | 3     | 3      |
| 2020-03-29 | 4      | 4     | 4      |
| 2020-02-06 | 5      | 5     | 5      |
+------------+--------+-------+--------+
CustomerAddress:
+------------+--------+-----------+
|latestUpdate| AddID | Mobile |
+------------+--------+-----------+
| 2020-03-01 | 1 | 66666 |
| 2020-04-07 | 1 | 55555 |
| 2020-06-13 | 2 | 99999 |
| 2020-03-29 | 3 | 11111 |
| 2020-02-06 | 3 | 22222 |
+------------+--------+-----------+
CustomerType:
+------------+--------+-----------+
|latestUpdate| TypeId | TypeName |
+------------+--------+-----------+
| 2020-03-01 | 1 | First |
| 2020-04-07 | 1 | Second |
| 2020-06-13 | 3 | Third |
| 2020-03-29 | 4 | Fourth |
| 2020-02-06 | 5 | Fifth |
+------------+--------+-----------+
When I try to join the tables, I always get duplicated CustID rows instead of only the latest record.
I want to display Customer.CustID, CustomerType.TypeName and CustomerAddress.Mobile.
You need to make sub-queries for most recent customer type and latest phone number like this:
SELECT *
FROM (
SELECT latestUpdate, CustID, AddID, TypeID,
ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY latestUpdate DESC) AS RowNumber
FROM Customer
) AS c
INNER JOIN (
SELECT latestUpdate, AddID, Mobile,
ROW_NUMBER() OVER (PARTITION BY AddId ORDER BY latestUpdate DESC) AS RowNumber
FROM CustomerAddress
) AS t
ON c.AddId = t.AddId
INNER JOIN CustomerType ct
ON ct.TypeId = c.TypeId
WHERE c.RowNumber = 1
AND t.RowNumber = 1
A simpler way than using row_number would be using cross apply together with top 1 in an ordered subquery:
select c.CustId, p.Mobile
from Customer c
cross apply (
select top 1 Mobile
from CustomerAddress a
where c.AddID = a.AddID
order by a.latestUpdate desc
) p
You need to use some subqueries:
SELECT *
FROM Customer AS C
LEFT OUTER JOIN (SELECT *, ROW_NUMBER() OVER(PARTITION BY AddID ORDER BY latestUpdate DESC) AS N
FROM CustomerAddress) AS A
ON C.AddID = A.AddID AND A.N = 1
LEFT OUTER JOIN (SELECT *, ROW_NUMBER() OVER(PARTITION BY TypeId ORDER BY latestUpdate DESC) AS N
FROM CustomerType) AS T
ON C.TypeID = T.TypeId AND T.N = 1
If you had used a temporal table, which is an ISO SQL standard feature for keeping the history of a table's data, the main table would always hold the latest rows; old rows stay in the history table and can be queried with a point-in-time or date-interval restriction.
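To make the latest-row-per-key join concrete, here is a minimal sketch run against the sample tables from the question. It uses Python's sqlite3 module with an in-memory database purely as a test bed (an assumption; the question appears to target SQL Server, and window functions need SQLite 3.25 or later). Each lookup table is numbered per key with ROW_NUMBER(), and only row 1 is joined, so each customer appears once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (latestUpdate TEXT, CustID INTEGER, AddID INTEGER, TypeID INTEGER);
    CREATE TABLE CustomerAddress (latestUpdate TEXT, AddID INTEGER, Mobile TEXT);
    CREATE TABLE CustomerType (latestUpdate TEXT, TypeId INTEGER, TypeName TEXT);

    INSERT INTO Customer VALUES
        ('2020-03-01', 1, 1, 1), ('2020-04-07', 2, 2, 2), ('2020-06-13', 3, 3, 3),
        ('2020-03-29', 4, 4, 4), ('2020-02-06', 5, 5, 5);
    INSERT INTO CustomerAddress VALUES
        ('2020-03-01', 1, '66666'), ('2020-04-07', 1, '55555'), ('2020-06-13', 2, '99999'),
        ('2020-03-29', 3, '11111'), ('2020-02-06', 3, '22222');
    INSERT INTO CustomerType VALUES
        ('2020-03-01', 1, 'First'), ('2020-04-07', 1, 'Second'), ('2020-06-13', 3, 'Third'),
        ('2020-03-29', 4, 'Fourth'), ('2020-02-06', 5, 'Fifth');
""")

query = """
    SELECT c.CustID, t.TypeName, a.Mobile
    FROM Customer AS c
    LEFT JOIN (
        -- number the addresses per AddID, newest first
        SELECT AddID, Mobile,
               ROW_NUMBER() OVER (PARTITION BY AddID ORDER BY latestUpdate DESC) AS rn
        FROM CustomerAddress
    ) AS a ON a.AddID = c.AddID AND a.rn = 1
    LEFT JOIN (
        -- number the types per TypeId, newest first
        SELECT TypeId, TypeName,
               ROW_NUMBER() OVER (PARTITION BY TypeId ORDER BY latestUpdate DESC) AS rn
        FROM CustomerType
    ) AS t ON t.TypeId = c.TypeID AND t.rn = 1
    ORDER BY c.CustID
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Because LEFT JOIN is used, customers without any matching address or type row are kept with NULL (None in Python) in that column; switching to INNER JOIN would drop them instead.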
This is it:
select * from (select a.CustID, b.Mobile, c.TypeName, RANK() OVER (
PARTITION BY b.AddID
ORDER BY b.latestUpdate DESC
) as rank1
from
Customer a
left join
CustomerAddress b
on
a.AddID=b.AddID
left join
CustomerType c
on
a.TypeId =c.TypeId
) ranked where rank1=1;
You should join the tables using the "APPLY" operator.
See: Link
I have a table:
| id | Number |Address
| -----| ------------|-----------
| 1 | 0 | NULL
| 1 | 1 | NULL
| 1 | 2 | 50
| 1 | 3 | NULL
| 2 | 0 | 10
| 3 | 1 | 30
| 3 | 2 | 20
| 3 | 3 | 20
| 4 | 0 | 75
| 4 | 1 | 22
| 4 | 2 | 30
| 5 | 0 | NULL
I need to get: the NUMBER of the last ADDRESS change for each ID.
I wrote this select:
select dh.id, dh.number from table dh where dh =
(select max(min(t.history)) from table t where t.id = dh.id group by t.address)
But this select does not correctly handle the case where the address first changes and then changes back to a previous value. For example, for id=1 the group by returns:
| Number |
| -------- |
| NULL |
| 50 |
I have been thinking about this select for several days, and I will be happy to receive any help.
You can do this using row_number() -- twice:
select t.id, min(number)
from (select t.*,
row_number() over (partition by id order by number desc) as seqnum1,
row_number() over (partition by id, address order by number desc) as seqnum2
from t
) t
where seqnum1 = seqnum2
group by id;
What this does is enumerate the rows by number in descending order:
Once per id.
Once per id and address.
These two values are the same only for the trailing run of rows that carry the most recent address in the data. Aggregation then pulls back the earliest row in this group.
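To check the double row_number() idea against the sample data from the question, here is a small sketch that runs the same kind of query in an in-memory SQLite database through Python's sqlite3 module. The table name address_history is a hypothetical stand-in (the question just calls it "table"), and SQLite 3.25 or later is assumed for window function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# address_history is a hypothetical name for the table in the question.
conn.execute("CREATE TABLE address_history (id INTEGER, number INTEGER, address INTEGER)")
rows = [
    (1, 0, None), (1, 1, None), (1, 2, 50), (1, 3, None),
    (2, 0, 10),
    (3, 1, 30), (3, 2, 20), (3, 3, 20),
    (4, 0, 75), (4, 1, 22), (4, 2, 30),
    (5, 0, None),
]
conn.executemany("INSERT INTO address_history VALUES (?, ?, ?)", rows)

query = """
    SELECT id, MIN(number) AS last_change
    FROM (
        SELECT id, number,
               -- position counted from the newest row of the id
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY number DESC) AS seqnum1,
               -- position counted from the newest row of the same id and address
               ROW_NUMBER() OVER (PARTITION BY id, address ORDER BY number DESC) AS seqnum2
        FROM address_history
    ) AS numbered
    WHERE seqnum1 = seqnum2   -- true only for the trailing run of the latest address
    GROUP BY id
    ORDER BY id
"""
last_changes = conn.execute(query).fetchall()
print(last_changes)
```

For id=1 the two counters agree only on the row with number 3, so the earlier NULL rows are not mistaken for part of the latest run; for id=3 they agree on numbers 3 and 2, and MIN() returns 2.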
I answered my own question; in case anyone needs it, here is my solution:
select * from table dh1 where dh1.number = (
select max(x.number)
from (
select
dh2.id, dh2.number, dh2.address, lag(dh2.address) over(order by dh2.number asc) as prev
from table dh2 where dh1.id=dh2.id
) x
where NVL(x.address, 0) <> NVL(x.prev, 0)
);
Table data
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 2 | 7 August | cat | Y |
| 3 | 10 August | cat | Z |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
What I want to do is group by the name, then for each group choose one of the rows with the earliest required by date.
For this data set, I would like to end up with either rows 1 and 4, or rows 2 and 4.
Expected result:
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
OR
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 2 | 7 August | cat | Y |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
I have something that returns 1,2 and 4 but I'm not sure how to only pick one from the first group to get the desired result. I'm joining the grouping with the data table so that I can get the ID and another_field back after the grouping.
SELECT d.id, d.name, d.required_by, d.another_field
FROM
(
SELECT min(required_by) as min_date, name
FROM data
GROUP BY name
) agg
INNER JOIN
data d
on d.required_by = agg.min_date AND d.name = agg.name
This is typically solved using window functions:
select d.id, d.name, d.required_by, d.another_field
from (
select id, name, required_by, another_field,
row_number() over (partition by name order by required_by) as rn
from data
) d
where d.rn = 1;
In Postgres using distinct on() is typically faster:
select distinct on (name) *
from data
order by name, required_by
Online example
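A runnable sketch of the row_number() approach on the sample data, using Python's sqlite3 module with an in-memory database (an assumption made for testing; SQLite 3.25 or later is needed for window functions). The year in the dates is invented so they can be stored as sortable ISO strings, and id is added to the ORDER BY as a tie-breaker, which is my addition so that the choice between rows 1 and 2 is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER, required_by TEXT, name TEXT, another_field TEXT)"
)
# The year 2023 is an assumption; the question only gives day and month.
rows = [
    (1, "2023-08-07", "cat", "X"),
    (2, "2023-08-07", "cat", "Y"),
    (3, "2023-08-10", "cat", "Z"),
    (4, "2023-08-11", "dog", "A"),
]
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)

query = """
    SELECT d.id, d.name, d.required_by, d.another_field
    FROM (
        SELECT id, name, required_by, another_field,
               -- earliest date first; id breaks ties between equal dates
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY required_by, id) AS rn
        FROM data
    ) AS d
    WHERE d.rn = 1
    ORDER BY d.id
"""
earliest = conn.execute(query).fetchall()
print(earliest)
```

Without the tie-breaker the query is still valid, but which of the two 7 August rows gets rn = 1 is up to the database.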
SELECT [id]
,[date]
,[name]
FROM [test].[dbo].[data]
WHERE date IN (SELECT min(date) FROM data GROUP BY name)
I have an issue with an SQL query that I am trying to write. I am trying to retrieve the row that has the minimal create_dt for each inst (see table) and amount (which isn't unique).
Unfortunately I can't use group by as the amount column isn't unique.
+--------------+--------+------+-------------+
| Company_Name | Amount | inst | Create Date |
+--------------+--------+------+-------------+
| Company A | 1000 | 4545 | 01/10/2018 |
| Company A | 400 | 4545 | 01/11/2018 |
| Company A | 200 | 4545 | 31/10/2018 |
| Company B | 2000 | 4893 | 01/10/2016 |
| Company B | 212 | 4893 | 04/10/2016 |
| Company B | 100 | 4893 | 10/10/2017 |
| Company B | 20 | 4893 | 04/10/2018 |
+--------------+--------+------+-------------+
In the above example I expect to see:
+--------------+--------+------+-------------+
| Company_Name | Amount | inst | Create Date |
+--------------+--------+------+-------------+
| Company A | 1000 | 4545 | 01/10/2018 |
| Company B | 2000 | 4893 | 01/10/2016 |
+--------------+--------+------+-------------+
Code:
SELECT
bill_company, bill_name, account_no
FROM
dbo.customer_information;
SELECT
balance_id, balance_id2, minus_balance,new_balance,
create_date, account_no
FROM
dbo.btr
SELECT
balance_id, balance_id2, expired_Date, amount, balance_type, account_no
FROM
dbo.btr_balance
SELECT
balance_ist, expired_date, account_no, balance_type
FROM
dbo.BALANCE_inst
Retrieve the minimal create date for a balance instance with the lowest balance for a balance inst.
(SELECT
bill_company,
bill_name,
account_no,
balance_ist,
amount,
MIN(create_date)
FROM
dbo.mtr btr
LEFT JOIN
btr_balance btrb ON btr.balance_id = btrb.balance_id
AND btr.balance_id2 = btrb.balance_id2
LEFT JOIN
balance_inst bali ON btr.account_no = bali.account_no
AND btrb.expired_date = bali.expired_date
GROUP BY
bill_company, bill_name, account_no,amount, balance_ist)
I have seen some solutions that use a correlated query, but I can't seem to get my head around them.
Common Table Expression (CTE) will help you.
;with cte as (
select *, row_number() over(partition by company_name order by create_date) rn
from dbo.myTable
)
select * from cte
where rn = 1;
Use row_number(). I assumed bill_company is your company name:
select * from
( SELECT bill_company,
bill_name,
account_no,
balance_ist,
amount,
create_date,
row_number() over(partition by bill_company order by create_date) rn
FROM dbo.mtr btr left join btr_balance btrb
on btr.balance_id = btrb.balance_id and btr.balance_id2 = btrb.balance_id2
left join balance_inst bali
on btr.account_no = bali.account_no and btrb.expired_date = bali.expired_date
) t where t.rn=1
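Since the question mentions correlated queries, here is a hedged sketch of that variant next to the row_number() answers above. It works on a flattened copy of the sample table rather than the real joins: the table name balances and its column names are hypothetical, the dates are assumed to be day/month/year and are stored as ISO strings, and the SQL is run in an in-memory SQLite database through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# balances is a hypothetical flattened version of the sample table in the question.
conn.execute(
    "CREATE TABLE balances (company_name TEXT, amount INTEGER, inst INTEGER, create_date TEXT)"
)
rows = [
    ("Company A", 1000, 4545, "2018-10-01"),
    ("Company A", 400, 4545, "2018-11-01"),
    ("Company A", 200, 4545, "2018-10-31"),
    ("Company B", 2000, 4893, "2016-10-01"),
    ("Company B", 212, 4893, "2016-10-04"),
    ("Company B", 100, 4893, "2017-10-10"),
    ("Company B", 20, 4893, "2018-10-04"),
]
conn.executemany("INSERT INTO balances VALUES (?, ?, ?, ?)", rows)

query = """
    SELECT b.company_name, b.amount, b.inst, b.create_date
    FROM balances AS b
    WHERE b.create_date = (
        -- the inner query is re-evaluated for the inst of each outer row
        SELECT MIN(b2.create_date)
        FROM balances AS b2
        WHERE b2.inst = b.inst
    )
    ORDER BY b.company_name
"""
first_rows = conn.execute(query).fetchall()
print(first_rows)
```

The inner SELECT is "correlated" because it refers to b.inst from the outer row. Unlike row_number(), this form returns every row that ties for the earliest date, so it yields one row per inst only when the minimum date is unique.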
I composed a monster query. I'm certain that it can be optimized, and I would more than appreciate any comments/guidance on the query itself; however, I have a specific question:
The data I am returning is sometimes duplicated on multiple columns:
+-------+------+----------+------+-------+--------+----------+-------+------+
| first | last | deaID | cert | count | npi | clientid | month | year |
+-------+------+----------+------+-------+--------+----------+-------+------+
| Alex | Jue | UNKNOWN | MD | 11 | 123123 | 102889 | 7 | 2012 |
| Alex | Jue | BJ123123 | MD | 11 | 123123 | 102889 | 7 | 2012 |
+-------+------+----------+------+-------+--------+----------+-------+------+
as you can see all of the fields are equal except for deaID
in this case, I would like to only return:
+-------+------+----------+------+-------+--------+----------+-------+------+
| first | last | deaID | cert | count | npi | clientid | month | year |
+-------+------+----------+------+-------+--------+----------+-------+------+
| Alex | Jue | BJ123123 | MD | 11 | 123123 | 102889 | 7 | 2012 |
+-------+------+----------+------+-------+--------+----------+-------+------+
however, if there are no duplicates:
+-------+------+---------+------+-------+--------+----------+-------+------+
| first | last | deaID | cert | count | npi | clientid | month | year |
+-------+------+---------+------+-------+--------+----------+-------+------+
| Alex | Jue | UNKNOWN | MD | 11 | 123123 | 102889 | 7 | 2012 |
+-------+------+---------+------+-------+--------+----------+-------+------+
then i would like to keep it!
summary
if there are duplicates remove all records with 'deaID=unknown'; however, if there is only 1 match then return that match
question
how do i return unknown records IFF there is 1 match?
here is the monster query in case anybody is interested :)
with ctebiggie as (
select distinct
p.[IMS_PRESCRIBER_ID],
p.PHYSICIAN_NPI as MLISNPI,
a.CLIENT_ID,
p.MLIS_FIRSTNAME,
p.MLIS_LASTNAME,
p_address.IMS_DEA_NBR,
p.IMS_PROFESSIONAL_ID_NBR,
p.IMS_PROFESSIONAL_ID_NBR_src,
p.IMS_CERTIFICATION_CODE,
datepart(mm,a.RECEIVED_DATE) as [Month],
datepart(yyyy,a.RECEIVED_DATE) as [Year]
from
MILLENNIUM_DW_dev..D_PHYSICIAN p
left outer join
MILLENNIUM_DW_dev..F_ACCESSION_DAILY a
on a.REQUESTOR_NPI=p.PHYSICIAN_NPI
left outer join MILLENNIUM_DW_dev..D_PHYSICIAN_ADDRESS p_address
on p.PHYSICIAN_NPI=p_address.PHYSICIAN_NPI
where
a.RECEIVED_DATE is not null
--and p.IMS_PRESCRIBER_ID is not null
--and p_address.IMS_DEA_NBR !='UNKNOWN'
and p.REC_ACTIVE_FLG=1
and p_address.REC_ACTIVE_FLG=1
and DATEPART(yyyy,received_date)=2012
and DATEPART(mm,received_date)=7
group by
p.[IMS_PRESCRIBER_ID],
p.PHYSICIAN_NPI,
p.IMS_PROFESSIONAL_ID_NBR,
p.MLIS_FIRSTNAME,
p.MLIS_LASTNAME,
p_address.IMS_DEA_NBR,
p.IMS_PROFESSIONAL_ID_NBR,
p.IMS_PROFESSIONAL_ID_NBR_src,
p.IMS_CERTIFICATION_CODE,
datepart(mm,a.RECEIVED_DATE),
datepart(yyyy,a.RECEIVED_DATE),
a.CLIENT_ID
)
,
ctecount as
(select
COUNT (Distinct f.ACCESSION_ID) [count],
f.REQUESTOR_NPI,f.CLIENT_ID,
datepart(mm,f.RECEIVED_DATE) mm,
datepart(yyyy,f.RECEIVED_DATE)yyyy
from MILLENNIUM_DW_dev..F_ACCESSION_DAILY f
where
f.CLIENT_ID not in (select * from SalesDWH..TestPractices)
and DATEPART(yyyy,f.received_date)=2012
and DATEPART(mm,f.received_date)=7
group by f.REQUESTOR_NPI,
f.CLIENT_ID,
datepart(mm,f.RECEIVED_DATE),
datepart(yyyy,f.RECEIVED_DATE)
)
select ctebiggie.*,c.* from
ctebiggie
full outer join
ctecount c
on c.REQUESTOR_NPI=ctebiggie.MLISNPI
and c.mm=ctebiggie.[Month]
and c.yyyy=ctebiggie.[Year]
and c.CLIENT_ID=ctebiggie.CLIENT_ID
Assuming you have the base query, I would compute a row number and counts with window (partition) functions over its result set. Then, in the outer select, an UNKNOWN row is returned only if its group contains nothing but UNKNOWN rows; otherwise it is filtered out.
SELECT first,
last,
deaID,
cert,
count,
npi,
clientid,
month,
year
FROM (
SELECT first,
last,
deaID,
cert,
count,
npi,
clientid,
month,
year,
ROW_NUMBER() OVER (PARTITION BY
first,last,cert,count,npi,clientid,month,year
ORDER BY CASE WHEN deaID = 'UNKNOWN' THEN 0 ELSE 1 END,
deaID) AS RowNumberInGroup,
COUNT(*) OVER (PARTITION BY first,last,cert,count,npi,clientid,month,year)
AS CountPerGroup,
SUM(CASE WHEN deaID = 'UNKNOWN' THEN 1 ELSE 0 END)
OVER (PARTITION BY first,last,cert,count,npi,clientid,month,year)
AS UnknownCountPerGroup
FROM BaseQuery
) T
WHERE (T.CountPerGroup = T.UnknownCountPerGroup AND T.RowNumberInGroup = 1) OR T.RowNumberInGroup > T.UnknownCountPerGroup
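A reduced sketch of the same idea, keeping an UNKNOWN row only when its group has no known deaID, can be tried out as follows. It is a simplification rather than the query above: it uses a single windowed SUM instead of three window columns, the table base_query and the renamed columns (first_name, last_name, cnt, mon, yr) are hypothetical, the second physician row is made up to cover the "no duplicates" case, and the SQL runs in an in-memory SQLite database (3.25 or later) via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# base_query stands in for the result of the big query; names are hypothetical.
conn.execute("""
    CREATE TABLE base_query (
        first_name TEXT, last_name TEXT, deaID TEXT, cert TEXT,
        cnt INTEGER, npi INTEGER, clientid INTEGER, mon INTEGER, yr INTEGER
    )
""")
rows = [
    ("Alex", "Jue", "UNKNOWN", "MD", 11, 123123, 102889, 7, 2012),
    ("Alex", "Jue", "BJ123123", "MD", 11, 123123, 102889, 7, 2012),
    # made-up row: a group whose only deaID is UNKNOWN
    ("Sam", "Roe", "UNKNOWN", "MD", 4, 456456, 102889, 7, 2012),
]
conn.executemany("INSERT INTO base_query VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)

query = """
    SELECT first_name, last_name, deaID, npi
    FROM (
        SELECT *,
               -- how many rows in the group carry a real deaID
               SUM(CASE WHEN deaID <> 'UNKNOWN' THEN 1 ELSE 0 END) OVER (
                   PARTITION BY first_name, last_name, cert, cnt, npi, clientid, mon, yr
               ) AS known_in_group
        FROM base_query
    ) AS t
    WHERE deaID <> 'UNKNOWN' OR known_in_group = 0
    ORDER BY npi
"""
kept = conn.execute(query).fetchall()
print(kept)
```

With this data the UNKNOWN row for Alex Jue is dropped in favour of BJ123123, while the lone UNKNOWN row of the made-up second physician is kept.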
See whether this helps:
select distinct main.col1,main.col2 ,
isnull(( select col3 from table1 where table1.col1=main.col1
and table1.col2=main.col2 and col3 <>'UNKNOWN'),'UNKNOWN')
from table1 main
Sample in Sql fiddle
Or, adapted to your columns, it would be:
SELECT distinct first,
last,
cert,
count,
npi,
clientid,
month,
year,
isnull((
select top 1 deaID from table1 intable where
intable.first=maintable.first and
intable.last=maintable.last and
intable.cert=maintable.cert and
intable.npi=maintable.npi and
intable.clientid=maintable.clientid and
intable.month=maintable.month and
intable.year=maintable.year
and deaID<>'UNKNOWN'),'UNKNOWN') as deaID
FROM table1 maintable