if more than 1 match, do not return 'unknown' - sql

I composed a monster query. I'm certain that it can be optimized, and I would more than appreciate any comments/guidance on the query itself; however, I have a specific question:
The data I am returning is sometimes duplicated on multiple columns:
+-------+------+----------+------+-------+--------+----------+-------+------+
| first | last | deaID | cert | count | npi | clientid | month | year |
+-------+------+----------+------+-------+--------+----------+-------+------+
| Alex | Jue | UNKNOWN | MD | 11 | 123123 | 102889 | 7 | 2012 |
| Alex | Jue | BJ123123 | MD | 11 | 123123 | 102889 | 7 | 2012 |
+-------+------+----------+------+-------+--------+----------+-------+------+
As you can see, all of the fields are equal except for deaID.
In this case, I would like to return only:
+-------+------+----------+------+-------+--------+----------+-------+------+
| first | last | deaID | cert | count | npi | clientid | month | year |
+-------+------+----------+------+-------+--------+----------+-------+------+
| Alex | Jue | BJ123123 | MD | 11 | 123123 | 102889 | 7 | 2012 |
+-------+------+----------+------+-------+--------+----------+-------+------+
however, if there are no duplicates:
+-------+------+---------+------+-------+--------+----------+-------+------+
| first | last | deaID | cert | count | npi | clientid | month | year |
+-------+------+---------+------+-------+--------+----------+-------+------+
| Alex | Jue | UNKNOWN | MD | 11 | 123123 | 102889 | 7 | 2012 |
+-------+------+---------+------+-------+--------+----------+-------+------+
then I would like to keep it!
Summary
If there are duplicates, remove all records with deaID = 'UNKNOWN'; however, if there is only one match, return that match.
Question
How do I return UNKNOWN records if and only if there is exactly one match?
here is the monster query in case anybody is interested :)
with ctebiggie as (
select distinct
p.[IMS_PRESCRIBER_ID],
p.PHYSICIAN_NPI as MLISNPI,
a.CLIENT_ID,
p.MLIS_FIRSTNAME,
p.MLIS_LASTNAME,
p_address.IMS_DEA_NBR,
p.IMS_PROFESSIONAL_ID_NBR,
p.IMS_PROFESSIONAL_ID_NBR_src,
p.IMS_CERTIFICATION_CODE,
datepart(mm,a.RECEIVED_DATE) as [Month],
datepart(yyyy,a.RECEIVED_DATE) as [Year]
from
MILLENNIUM_DW_dev..D_PHYSICIAN p
left outer join
MILLENNIUM_DW_dev..F_ACCESSION_DAILY a
on a.REQUESTOR_NPI=p.PHYSICIAN_NPI
left outer join MILLENNIUM_DW_dev..D_PHYSICIAN_ADDRESS p_address
on p.PHYSICIAN_NPI=p_address.PHYSICIAN_NPI
where
a.RECEIVED_DATE is not null
--and p.IMS_PRESCRIBER_ID is not null
--and p_address.IMS_DEA_NBR !='UNKNOWN'
and p.REC_ACTIVE_FLG=1
and p_address.REC_ACTIVE_FLG=1
and DATEPART(yyyy,received_date)=2012
and DATEPART(mm,received_date)=7
group by
p.[IMS_PRESCRIBER_ID],
p.PHYSICIAN_NPI,
p.IMS_PROFESSIONAL_ID_NBR,
p.MLIS_FIRSTNAME,
p.MLIS_LASTNAME,
p_address.IMS_DEA_NBR,
p.IMS_PROFESSIONAL_ID_NBR,
p.IMS_PROFESSIONAL_ID_NBR_src,
p.IMS_CERTIFICATION_CODE,
datepart(mm,a.RECEIVED_DATE),
datepart(yyyy,a.RECEIVED_DATE),
a.CLIENT_ID
)
,
ctecount as
(select
COUNT (Distinct f.ACCESSION_ID) [count],
f.REQUESTOR_NPI,f.CLIENT_ID,
datepart(mm,f.RECEIVED_DATE) mm,
datepart(yyyy,f.RECEIVED_DATE)yyyy
from MILLENNIUM_DW_dev..F_ACCESSION_DAILY f
where
f.CLIENT_ID not in (select * from SalesDWH..TestPractices)
and DATEPART(yyyy,f.received_date)=2012
and DATEPART(mm,f.received_date)=7
group by f.REQUESTOR_NPI,
f.CLIENT_ID,
datepart(mm,f.RECEIVED_DATE),
datepart(yyyy,f.RECEIVED_DATE)
)
select ctebiggie.*,c.* from
ctebiggie
full outer join
ctecount c
on c.REQUESTOR_NPI=ctebiggie.MLISNPI
and c.mm=ctebiggie.[Month]
and c.yyyy=ctebiggie.[Year]
and c.CLIENT_ID=ctebiggie.CLIENT_ID

Assuming you have the base query, I would compute ROW_NUMBER and COUNT with PARTITION BY over its result set. Then, in the outer select, an UNKNOWN row is returned only if its group contains nothing but UNKNOWN rows; otherwise only the non-UNKNOWN rows of the group are returned.
SELECT first,
last,
deaID,
cert,
count,
npi,
clientid,
month,
year
FROM (
SELECT first,
last,
deaID,
cert,
count,
npi,
clientid,
month,
year,
-- UNKNOWN rows sort first, so they take row numbers 1..UnknownCountPerGroup
ROW_NUMBER() OVER (PARTITION BY
first,last,cert,count,npi,clientid,month,year
ORDER BY CASE WHEN deaID = 'UNKNOWN' THEN 0 ELSE 1 END,
deaID) AS RowNumberInGroup,
COUNT(*) OVER (PARTITION BY first,last,cert,count,npi,clientid,month,year)
AS CountPerGroup,
SUM(CASE WHEN deaID = 'UNKNOWN' THEN 1 ELSE 0 END)
OVER (PARTITION BY first,last,cert,count,npi,clientid,month,year)
AS UnknownCountPerGroup
FROM BaseQuery
) T
WHERE (T.CountPerGroup = T.UnknownCountPerGroup AND T.RowNumberInGroup = 1) OR T.RowNumberInGroup > T.UnknownCountPerGroup
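To see the filter in action, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite (3.25 or later, for window functions), the table name base, and the reduced column set are assumptions made only so the example is self-contained; the second person ("Bea") is invented test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced stand-in for the result of the base query.
conn.execute(
    "CREATE TABLE base (first_name TEXT, last_name TEXT, deaID TEXT, npi INTEGER)"
)
conn.executemany(
    "INSERT INTO base VALUES (?, ?, ?, ?)",
    [
        ("Alex", "Jue", "UNKNOWN", 123123),
        ("Alex", "Jue", "BJ123123", 123123),
        ("Bea", "Lin", "UNKNOWN", 456456),  # invented row: UNKNOWN is the only match
    ],
)

query = """
SELECT first_name, last_name, deaID, npi
FROM (
    SELECT first_name, last_name, deaID, npi,
           -- UNKNOWN rows sort first within each group
           ROW_NUMBER() OVER (
               PARTITION BY first_name, last_name, npi
               ORDER BY CASE WHEN deaID = 'UNKNOWN' THEN 0 ELSE 1 END, deaID
           ) AS rn,
           COUNT(*) OVER (PARTITION BY first_name, last_name, npi) AS cnt,
           SUM(CASE WHEN deaID = 'UNKNOWN' THEN 1 ELSE 0 END)
               OVER (PARTITION BY first_name, last_name, npi) AS unknown_cnt
    FROM base
) t
-- keep one UNKNOWN row if the group has nothing else, otherwise skip the UNKNOWN rows
WHERE (cnt = unknown_cnt AND rn = 1) OR rn > unknown_cnt
ORDER BY first_name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Alex keeps only the BJ123123 row, while Bea keeps the UNKNOWN row because it is the only one in that group.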

See whether this helps:
select distinct main.col1,main.col2 ,
isnull(( select col3 from table1 where table1.col1=main.col1
and table1.col2=main.col2 and col3 <>'UNKNOWN'),'UNKNOWN')
from table1 main
Sample in Sql fiddle
Or, adapted to your columns, it would be:
SELECT distinct first,
last,
cert,
count,
npi,
clientid,
month,
year,
isnull((
select top 1 deaID from table1 intable where
intable.first=maintable.first and
intable.last=maintable.last and
intable.cert=maintable.cert and
intable.npi=maintable.npi and
intable.clientid=maintable.clientid and
intable.month=maintable.month and
intable.year=maintable.year and
intable.deaID<>'UNKNOWN'),'UNKNOWN') as deaID
FROM table1 maintable

Related

how to get individual-clinic-month that are excluded from SQL query

I have the following dataset:
individual | clinic_1 | clinic_2 | month | address_recorded | address_code
1 | A | B | 01-01-2016 | 01-02-1999 | C01
1 | A | A | 01-01-2016 | 01-02-2003 | C02
1 | A | A | 01-01-2016 | 01-02-2001 | C06
1 | A | X | 01-01-2016 | 01-02-2000 | C03
2 | C | B | 01-04-2016 | 01-02-1999 | D04
2 | C | A | 01-04-2016 | 01-02-2001 | D05
2 | C | X | 01-04-2016 | 01-02-2000 | D06
I would like to get:
individual | clinic_1 | month | address_code
1 | A | 01-01-2016 | C02
2 | C | 01-04-2016 | D05
Criteria:
For each unique set of individual-clinic_1-month that has rows with clinic_1 = clinic_2, select the most recent date on which an address was recorded within clinic_1.
For each unique set of individual-clinic_1-month with NO instances where clinic_1 = clinic_2, select the most recent date on which an address was recorded across clinics.
I thought about doing:
with cte_1
as
(
select * from table
where clinic_1 = clinic_2
)
,cte_2
as
(
select row_number () over (Partition by clinic_1, individual, month order by clinic_1, individual, month, address_recorded desc) as number, *
from cte_1
)
select individual, clinic_1, month, address_code from cte_2 where number = 1
But I don't know how to get those individual-clinic_1-month for which there are no instances where clinic_1=clinic_2, any ideas?
You can UNION two SELECT queries: one that selects the records where clinic_1 = clinic_2, and another that selects the records where clinic_1 <> clinic_2 and that do not appear in the result set of the first query.
Both queries are grouped to find the most recent [address_recorded] for each [individual] - [clinic_1] - [mnth] entry. Note that the second query selects '' as [clinic_2].
Check the following:
with cte as
(SELECT [individual] ,[clinic_1],[clinic_2],[mnth],max([address_recorded]) as m
FROM [MyData] where [clinic_1]=[clinic_2]
group by [individual],[clinic_1],[clinic_2] ,[mnth]
),
cte2 as
(SELECT [MyData].[individual] ,[MyData].[clinic_1],'' as [clinic_2],[MyData].[mnth],max([MyData].[address_recorded]) as m
FROM [MyData]
Left Join cte on cte.individual=MyData.individual
and cte.mnth=MyData.mnth
where [MyData].[clinic_1]<>[MyData].[clinic_2] and cte.individual IS NULL
group by [MyData].[individual],[MyData].[clinic_1], [MyData].[mnth]
),
D as
(SELECT * FROM cte
UNION
SELECT * FROM cte2)
,
LastQr as(
select [MyData].individual, [MyData].clinic_1,[MyData].mnth,[MyData].address_code,
row_number() OVER(PARTITION BY [MyData].individual, [MyData].clinic_1,[MyData].mnth ORDER BY [MyData].individual, [MyData].clinic_1,[MyData].mnth)
as rn from D
INNER JOIN [MyData]
ON D.individual=MyData.individual and D.clinic_1=MyData.clinic_1 and D.mnth=MyData.mnth and D.m=MyData.address_recorded
and (D.clinic_2=MyData.clinic_2 or D.clinic_2='')
)
select * from LastQr where rn=1
See the results from dbfiddle.uk.
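As an alternative to the UNION approach (this is a different technique, not the one in the answer above), both criteria can be folded into a single ROW_NUMBER ordering that prefers rows with clinic_1 = clinic_2 and then the most recent address_recorded. The sketch below runs it through Python's sqlite3 module; SQLite 3.25+, the table name visits, and the ISO date format are assumptions for the sake of a runnable example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the sample data from the question (dates rewritten as ISO strings).
conn.execute(
    "CREATE TABLE visits (individual INTEGER, clinic_1 TEXT, clinic_2 TEXT,"
    " month TEXT, address_recorded TEXT, address_code TEXT)"
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "A", "B", "2016-01-01", "1999-02-01", "C01"),
        (1, "A", "A", "2016-01-01", "2003-02-01", "C02"),
        (1, "A", "A", "2016-01-01", "2001-02-01", "C06"),
        (1, "A", "X", "2016-01-01", "2000-02-01", "C03"),
        (2, "C", "B", "2016-04-01", "1999-02-01", "D04"),
        (2, "C", "A", "2016-04-01", "2001-02-01", "D05"),
        (2, "C", "X", "2016-04-01", "2000-02-01", "D06"),
    ],
)

query = """
SELECT individual, clinic_1, month, address_code
FROM (
    SELECT v.*,
           ROW_NUMBER() OVER (
               PARTITION BY individual, clinic_1, month
               -- rows with matching clinics win; ties broken by latest address
               ORDER BY CASE WHEN clinic_1 = clinic_2 THEN 0 ELSE 1 END,
                        address_recorded DESC
           ) AS rn
    FROM visits v
) ranked
WHERE rn = 1
ORDER BY individual
"""
rows = conn.execute(query).fetchall()
print(rows)
```

When a group has no row with clinic_1 = clinic_2, every row gets the same priority, so the ordering falls back to the most recent address across clinics.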

Get some values from the table by selecting

I have a table:
| id | Number |Address
| -----| ------------|-----------
| 1 | 0 | NULL
| 1 | 1 | NULL
| 1 | 2 | 50
| 1 | 3 | NULL
| 2 | 0 | 10
| 3 | 1 | 30
| 3 | 2 | 20
| 3 | 3 | 20
| 4 | 0 | 75
| 4 | 1 | 22
| 4 | 2 | 30
| 5 | 0 | NULL
I need to get the NUMBER of the last ADDRESS change for each ID.
I wrote this select:
select dh.id, dh.number from table dh where dh =
(select max(min(t.history)) from table t where t.id = dh.id group by t.address)
But this select does not correctly handle the case where the address first changes and then changes back to a previous value. For example, for id=1 the group by returns:
| Number |
| -------- |
| NULL |
| 50 |
I have been thinking about this select for several days, and I will be happy to receive any help.
You can do this using row_number() -- twice:
select t.id, min(number)
from (select t.*,
row_number() over (partition by id order by number desc) as seqnum1,
row_number() over (partition by id, address order by number desc) as seqnum2
from t
) t
where seqnum1 = seqnum2
group by id;
What this does is enumerate the rows by number in descending order:
Once per id.
Once per id and address.
The two values are equal only for the trailing run of rows that share the most recent address for the id, because every later row for that id has the same address. Aggregation then pulls back the earliest row in this group.
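A quick way to check this behaviour is to run the double row_number() query on the sample data. The sketch below does so with Python's sqlite3 module; using SQLite (3.25+) instead of the asker's database and naming the table t are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, number INTEGER, address INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 0, None), (1, 1, None), (1, 2, 50), (1, 3, None),
        (2, 0, 10),
        (3, 1, 30), (3, 2, 20), (3, 3, 20),
        (4, 0, 75), (4, 1, 22), (4, 2, 30),
        (5, 0, None),
    ],
)

query = """
SELECT id, MIN(number) AS last_change
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY number DESC) AS seqnum1,
           ROW_NUMBER() OVER (PARTITION BY id, address ORDER BY number DESC) AS seqnum2
    FROM t
) s
WHERE seqnum1 = seqnum2   -- only the trailing run with the latest address survives
GROUP BY id
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

For id=1 the address returns to NULL at number 3 after being 50, and the query reports 3 rather than 0, which is the case the original attempt handled incorrectly.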
I answered my own question; in case anyone needs it, here is my solution:
select * from table dh1 where dh1.number = (
select max(x.number)
from (
select
dh2.id, dh2.number, dh2.address, lag(dh2.address) over(order by dh2.number asc) as prev
from table dh2 where dh1.id=dh2.id
) x
where NVL(x.address, 0) <> NVL(x.prev, 0)
);

SQL group by a field and only return one joined row for each grouping

Table data
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 2 | 7 August | cat | Y |
| 3 | 10 August | cat | Z |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
What I want to do is group by the name, then for each group choose one of the rows with the earliest required by date.
For this data set, I would like to end up with either rows 1 and 4, or rows 2 and 4.
Expected result:
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
OR
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 2 | 7 August | cat | Y |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
I have something that returns 1,2 and 4 but I'm not sure how to only pick one from the first group to get the desired result. I'm joining the grouping with the data table so that I can get the ID and another_field back after the grouping.
SELECT d.id, d.name, d.required_by, d.another_field
FROM
(
SELECT min(required_by) as min_date, name
FROM data
GROUP BY name
) agg
INNER JOIN
data d
on d.required_by = agg.min_date AND d.name = agg.name
This is typically solved using window functions:
select d.id, d.name, d.required_by, d.another_field
from (
select id, name, required_by, another_field,
row_number() over (partition by name order by required_by) as rn
from data
) d
where d.rn = 1;
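Here is a small runnable sketch of the window-function version, executed through Python's sqlite3 module. SQLite 3.25+, the year in the dates, and the extra id tie-breaker in the ORDER BY are assumptions; the tie-breaker is added only so the example always picks row 1 rather than row 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER, required_by TEXT, name TEXT, another_field TEXT)"
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        (1, "2023-08-07", "cat", "X"),  # year is assumed; the question gives only day and month
        (2, "2023-08-07", "cat", "Y"),
        (3, "2023-08-10", "cat", "Z"),
        (4, "2023-08-11", "dog", "A"),
    ],
)

query = """
SELECT d.id, d.name, d.required_by, d.another_field
FROM (
    SELECT id, name, required_by, another_field,
           -- id breaks ties between rows sharing the earliest date
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY required_by, id) AS rn
    FROM data
) d
WHERE d.rn = 1
ORDER BY d.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Without the tie-breaker, either row 1 or row 2 may be returned for cat, which matches what the question allows.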
In Postgres using distinct on() is typically faster:
select distinct on (name) *
from data
order by name, required_by
Online example
SELECT [id]
,[date]
,[name]
FROM [test].[dbo].[data]
WHERE date IN (SELECT min(date) FROM data GROUP BY name)

get the value from the previous row if row is NULL

I have this pivoted table
+---------+----------+----------+-----+----------+
| Date | Product1 | Product2 | ... | ProductN |
+---------+----------+----------+-----+----------+
| 7/1/15 | 5 | 2 | ... | 7 |
| 8/1/15 | 7 | 1 | ... | 9 |
| 9/1/15 | NULL | 7 | ... | NULL |
| 10/1/15 | 8 | NULL | ... | NULL |
| 11/1/15 | NULL | NULL | ... | NULL |
+---------+----------+----------+-----+----------+
I wanted to fill in the NULL column with the values above them. So, the output should be something like this.
+---------+----------+----------+-----+----------+
| Date | Product1 | Product2 | ... | ProductN |
+---------+----------+----------+-----+----------+
| 7/1/15 | 5 | 2 | ... | 7 |
| 8/1/15 | 7 | 1 | ... | 9 |
| 9/1/15 | 7 | 7 | ... | 9 |
| 10/1/15 | 8 | 7 | ... | 9 |
| 11/1/15 | 8 | 7 | ... | 9 |
+---------+----------+----------+-----+----------+
I've found this article that might help me, but it only manipulates one column. How do I apply this to all my columns, or how else can I achieve this result, given that my columns are dynamic?
Any help would be much appreciated. Thanks!
The ANSI standard has the IGNORE NULLS option on LAG(). This is exactly what you want. Alas, SQL Server has not (yet?) implemented this feature.
So, you can do this in several ways. One is using multiple outer applys. Another uses correlated subqueries:
select p.date,
(case when p.product1 is not null then p.product1
else (select top 1 p2.product1 from pivoted p2 where p2.date < p.date and p2.product1 is not null order by p2.date desc)
end) as product1,
(case when p.product2 is not null then p.product2
else (select top 1 p2.product2 from pivoted p2 where p2.date < p.date and p2.product2 is not null order by p2.date desc)
end) as product2,
. . .
from pivoted p ;
I would recommend an index on date for this query.
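The correlated-subquery approach can be tried out with the sketch below, which runs on SQLite through Python's sqlite3 module. This is an assumption made for runnability: SQLite uses LIMIT 1 where SQL Server uses TOP 1, COALESCE stands in for the CASE expression, and the column is named day with ISO dates so that string comparison orders the rows correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pivoted (day TEXT, product1 INTEGER, product2 INTEGER)")
conn.executemany(
    "INSERT INTO pivoted VALUES (?, ?, ?)",
    [
        ("2015-07-01", 5, 2),
        ("2015-08-01", 7, 1),
        ("2015-09-01", None, 7),
        ("2015-10-01", 8, None),
        ("2015-11-01", None, None),
    ],
)

query = """
SELECT p.day,
       COALESCE(p.product1,
                (SELECT p2.product1 FROM pivoted p2
                 WHERE p2.day < p.day AND p2.product1 IS NOT NULL
                 ORDER BY p2.day DESC LIMIT 1)) AS product1,
       COALESCE(p.product2,
                (SELECT p2.product2 FROM pivoted p2
                 WHERE p2.day < p.day AND p2.product2 IS NOT NULL
                 ORDER BY p2.day DESC LIMIT 1)) AS product2
FROM pivoted p
ORDER BY p.day
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The IS NOT NULL filter inside each subquery matters: without it, a row whose immediate predecessor is also NULL would stay NULL.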
I would like to suggest a solution. It works if the table consists of just two columns.
+---------+----------+
| Date | Product |
+---------+----------+
| 7/1/15 | 5 |
| 8/1/15 | 7 |
| 9/1/15 | NULL |
| 10/1/15 | 8 |
| 11/1/15 | NULL |
+---------+----------+
select x.[Date],
case
when x.[Product] is null
then min(c.[Product])
else
x.[Product]
end as Product
from
(
-- this subquery evaluates a minimum distance to the rows where Product column contains a value
select [Date],
[Product],
min(case when delta >= 0 then delta else null end) delta_min,
max(case when delta < 0 then delta else null end) delta_max
from
(
-- this subquery maps Product table to itself and evaluates the difference between the dates
select p.[Date],
p.[Product],
DATEDIFF(dd, p.[Date], pnn.[Date]) delta
from #products p
cross join (select * from #products where [Product] is not null) pnn
) x
group by [Date], [Product]
) x
left join #products c on x.[Date] =
case
when abs(delta_min) < abs(delta_max) then DATEADD(dd, -delta_min, c.[Date])
else DATEADD(dd, -delta_max, c.[Date])
end
group by x.[Date], x.[Product]
order by x.[Date]
In this query I joined the table to its own rows that contain values, using CROSS JOIN. Then I calculated the differences between the dates in order to pick the closest ones, and used those to fill the empty cells.
Result:
+---------+----------+
| Date | Product |
+---------+----------+
| 7/1/15 | 5 |
| 8/1/15 | 7 |
| 9/1/15 | 7 |
| 10/1/15 | 8 |
| 11/1/15 | 8 |
+---------+----------+
Note that the suggested query doesn't choose the previous value; it selects the closest value instead. In other words, the code can be adapted to a number of different purposes.
First, you need to add an identity column to a temporary or permanent table; the problem can then be solved as follows.
--- Solution ----
Create Table #Test (ID Int Identity (1,1),[Date] Date , Product_1 INT )
Insert Into #Test ([Date], Product_1)
Values
('7/1/15',5)
,('8/1/15',7)
,('9/1/15',Null)
,('10/1/15',8)
,('11/1/15',Null)
Select ID , DATE ,
IIF ( Product_1 is null ,
(Select Product_1 from #TEST
Where ID = (Select Top 1 a.ID From #TEST a where a.Product_1 is not null and a.ID<b.ID
Order By a.ID desc)
),Product_1) Product_1
from #Test b
-- Solution End ---

Rolling up remaining rows into one called "Other"

I have written a query which selects, let's say, 10 rows for this example.
+-----------+------------+
| STORENAME | COMPLAINTS |
+-----------+------------+
| Store1 | 4 |
| Store7 | 2 |
| Store8 | 1 |
| Store9 | 1 |
| Store2 | 1 |
| Store3 | 1 |
| Store4 | 1 |
| Store5 | 0 |
| Store6 | 0 |
| Store10 | 0 |
+-----------+------------+
How would I go about displaying the top 3 rows, but having the remaining rows roll up into a row called "Other" that adds all of their complaints together?
So like this for example:
+-----------+------------+
| STORENAME | COMPLAINTS |
+-----------+------------+
| Store1 | 4 |
| Store7 | 2 |
| Store8 | 1 |
| Other | 4 |
+-----------+------------+
So what has happened above is that it displays the top 3, then adds the complaints of the remaining rows into a row called Other.
I have exhausted all my resources and cannot find a solution. Please let me know if this makes sense.
I have created a SQLfiddle of the above tables that you can edit if it is possible :)
Here's hoping this is possible :)
Thanks,
Mike
Something like this may work
select *, row_number() over (order by complaints desc) as sno
into #temp
from
(
SELECT
a.StoreName
,COUNT(b.StoreID) AS [Complaints]
FROM Stores a
LEFT JOIN
(
SELECT
StoreName
,Complaint
,StoreID
FROM Complaints
WHERE Complaint = 'yes') b on b.StoreID = a.StoreID
GROUP BY a.StoreName
) as t ORDER BY [Complaints] DESC
select storename,complaints from #temp where sno<4
union all
select 'other',sum(complaints) as complaints from #temp where sno>=4
I do this with double aggregation and row_number():
select (case when seqnum <= 3 then storename else 'Other' end) as StoreName,
sum(numcomplaints) as numcomplaints
from (select c.storename, count(*) as numcomplaints,
row_number() over (order by count(*) desc) as seqnum
from complaints c
where c.complaint = 'Yes'
group by c.storename
) s
group by (case when seqnum <= 3 then storename else 'Other' end) ;
From what I can see, you don't really need any additional information from stores, so this version just leaves that table out.
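The roll-up can be checked with the following sketch, run through Python's sqlite3 module. Several things here are assumptions for the sake of a self-contained example: SQLite 3.25+, a hypothetical store_complaints table that already holds the per-store counts from the question, and a storename tie-breaker so the result is deterministic (with it, Store2 rather than Store8 wins the tie for third place).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with the already-aggregated counts shown in the question.
conn.execute("CREATE TABLE store_complaints (storename TEXT, complaints INTEGER)")
conn.executemany(
    "INSERT INTO store_complaints VALUES (?, ?)",
    [
        ("Store1", 4), ("Store7", 2), ("Store8", 1), ("Store9", 1), ("Store2", 1),
        ("Store3", 1), ("Store4", 1), ("Store5", 0), ("Store6", 0), ("Store10", 0),
    ],
)

query = """
SELECT CASE WHEN seqnum <= 3 THEN storename ELSE 'Other' END AS storename,
       SUM(complaints) AS complaints
FROM (
    SELECT storename, complaints,
           -- storename breaks ties so the top 3 is reproducible
           ROW_NUMBER() OVER (ORDER BY complaints DESC, storename) AS seqnum
    FROM store_complaints
) s
GROUP BY CASE WHEN seqnum <= 3 THEN storename ELSE 'Other' END
ORDER BY MIN(seqnum)   -- top 3 first, 'Other' last
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Ordering by MIN(seqnum) keeps the 'Other' row at the bottom, since its smallest sequence number is 4.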