Rolling up remaining rows into one called "Other" - sql

I have written a query which selects lets say 10 rows for this example.
+-----------+------------+
| STORENAME | COMPLAINTS |
+-----------+------------+
| Store1 | 4 |
| Store7 | 2 |
| Store8 | 1 |
| Store9 | 1 |
| Store2 | 1 |
| Store3 | 1 |
| Store4 | 1 |
| Store5 | 0 |
| Store6 | 0 |
| Store10 | 0 |
+-----------+------------+
How would I go about displaying the TOP 3 rows BUT Having the remaining rows roll up into a row called "other", and it adds all of their Complaints together?
So like this for example:
+-----------+------------+
| STORENAME | COMPLAINTS |
+-----------+------------+
| Store1 | 4 |
| Store7 | 2 |
| Store8 | 1 |
| Other | 4 |
+-----------+------------+
So what has happened above, is it displays the top3 then adds the complaints of the remaining rows into a row called other
I have exhausted all my resources and cannot find a solution. Please let me know if this makes sense.
I have created a SQLfiddle of the above tables that you can edit if it is possible :)
Here's hoping this is possible :)
Thanks,
Mike

Something like this may work
select *, row_number() over (order by complaints desc) as sno
into #temp
from
(
SELECT
a.StoreName
,COUNT(b.StoreID) AS [Complaints]
FROM Stores a
LEFT JOIN
(
SELECT
StoreName
,Complaint
,StoreID
FROM Complaints
WHERE Complaint = 'yes') b on b.StoreID = a.StoreID
GROUP BY a.StoreName
) as t ORDER BY [Complaints] DESC
select storename,complaints from #temp where sno<4
union all
select 'other',sum(complaints) as complaints from #temp where sno>=4

I do this with double aggregation and row_number():
select (case when seqnum <= 3 then storename else 'Other' end) as StoreName,
sum(numcomplaints) as numcomplaints
from (select c.storename, count(*) as numcomplaints,
row_number() over (order by count(*) desc) as seqnum
from complaints c
where c.complaint = 'Yes'
group by c.storename
) s
group by (case when seqnum <= 3 then storename else 'Other' end) ;
From what I can see, you don't really need any additional information from stores, so this version just leaves that table out.

Related

Get some values from the table by selecting

I have a table:
| id | Number |Address
| -----| ------------|-----------
| 1 | 0 | NULL
| 1 | 1 | NULL
| 1 | 2 | 50
| 1 | 3 | NULL
| 2 | 0 | 10
| 3 | 1 | 30
| 3 | 2 | 20
| 3 | 3 | 20
| 4 | 0 | 75
| 4 | 1 | 22
| 4 | 2 | 30
| 5 | 0 | NULL
I need to get: the NUMBER of the last ADDRESS change for each ID.
I wrote this select:
select dh.id, dh.number from table dh where dh =
(select max(min(t.history)) from table t where t.id = dh.id group by t.address)
But this select not correctly handling the case when the address first changed, and then changed to the previous value. For example id=1: group by return:
| Number |
| -------- |
| NULL |
| 50 |
I have been thinking about this select for several days, and I will be happy to receive any help.
You can do this using row_number() -- twice:
select t.id, min(number)
from (select t.*,
row_number() over (partition by id order by number desc) as seqnum1,
row_number() over (partition by id, address order by number desc) as seqnum2
from t
) t
where seqnum1 = seqnum2
group by id;
What this does is enumerate the rows by number in descending order:
Once per id.
Once per id and address.
These values are the same only when the value is 1, which is the most recent address in the data. Then aggregation pulls back the earliest row in this group.
I answered my question myself, if anyone needs it, my solution:
select * from table dh1 where dh1.number = (
select max(x.number)
from (
select
dh2.id, dh2.number, dh2.address, lag(dh2.address) over(order by dh2.number asc) as prev
from table dh2 where dh1.id=dh2.id
) x
where NVL(x.address, 0) <> NVL(x.prev, 0)
);

Each rows to column values

I'm trying to create a view that shows first table's columns plus second table's first 3 records sorted by date in 1 row.
I tried to select specific rows using offset from sub table and join to main table, but when joining query result is ordered by date, without
WHERE tblMain_id = ..
clause in joining SQL it returns wrong record.
Here is sqlfiddle example: sqlfiddle demo
tblMain
| id | fname | lname | salary |
+----+-------+-------+--------+
| 1 | John | Doe | 1000 |
| 2 | Bob | Ross | 5000 |
| 3 | Carl | Sagan | 2000 |
| 4 | Daryl | Dixon | 3000 |
tblSub
| id | email | emaildate | tblmain_id |
+----+-----------------+------------+------------+
| 1 | John#Doe1.com | 2019-01-01 | 1 |
| 2 | John#Doe2.com | 2019-01-02 | 1 |
| 3 | John#Doe3.com | 2019-01-03 | 1 |
| 4 | Bob#Ross1.com | 2019-02-01 | 2 |
| 5 | Bob#Ross2.com | 2018-12-01 | 2 |
| 6 | Carl#Sagan.com | 2019-10-01 | 3 |
| 7 | Daryl#Dixon.com | 2019-11-01 | 4 |
View I am trying to achieve:
| id | fname | lname | salary | email_1 | emaildate_1 | email_2 | emaildate_2 | email_3 | emaildate_3 |
+----+-------+-------+--------+---------------+-------------+---------------+-------------+---------------+-------------+
| 1 | John | Doe | 1000 | John#Doe1.com | 2019-01-01 | John#Doe2.com | 2019-01-02 | John#Doe3.com | 2019-01-03 |
View I have created
| id | fname | lname | salary | email_1 | emaildate_1 | email_2 | emaildate_2 | email_3 | emaildate_3 |
+----+-------+-------+--------+---------+-------------+---------------+-------------+---------------+-------------+
| 1 | John | Doe | 1000 | (null) | (null) | John#Doe1.com | 2019-01-01 | John#Doe2.com | 2019-01-02 |
You can use conditional aggregation:
select m.id, m.fname, m.lname, m.salary,
max(s.email) filter (where seqnum = 1) as email_1,
max(s.emailDate) filter (where seqnum = 1) as emailDate_1,
max(s.email) filter (where seqnum = 2) as email_2,
max(s.emailDate) filter (where seqnum = 3) as emailDate_2,
max(s.email) filter (where seqnum = 3) as email_3,
max(s.emailDate) filter (where seqnum = 3) as emailDate_3
from tblMain m left join
(select s.*,
row_number() over (partition by tblMain_id order by emailDate desc) as seqnum
from tblsub s
) s
on s.tblMain_id = m.id
where m.id = 1
group by m.id, m.fname, m.lname, m.salary;
Here is a SQL Fiddle.
Here is a solution that should get you what you expect.
This works by first ranking records within each table and joining them together. Then, the outer query uses aggregation to generate the expected output.
This solution will work even if the first record in the main table does not have id 1. Also filtering takes occurs within the JOINs, so this should be quite efficient.
SELECT
m.id,
m.fname,
m.lname,
m.salary,
MAX(CASE WHEN s.rn = 1 THEN s.email END) email_1,
MAX(CASE WHEN s.rn = 1 THEN s.emaildate END) email_date1,
MAX(CASE WHEN s.rn = 2 THEN s.email END) email_2,
MAX(CASE WHEN s.rn = 2 THEN s.emaildate END) email_date2,
MAX(CASE WHEN s.rn = 3 THEN s.email END) email_3,
MAX(CASE WHEN s.rn = 3 THEN s.emaildate END) email_date3
FROM
(
SELECT m.*, ROW_NUMBER() OVER(ORDER BY id) rn
FROM tblMain
) m
INNER JOIN (
SELECT
email,
emaildate,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY emaildate) rn
FROM tblSub
) s
ON m.id = s.tblmain_id
AND m.rn = 1
AND s.rn <= 3
GROUP BY
m.id,
m.fname,
m.lname,
m.salary

get the value from the previous row if row is NULL

I have this pivoted table
+---------+----------+----------+-----+----------+
| Date | Product1 | Product2 | ... | ProductN |
+---------+----------+----------+-----+----------+
| 7/1/15 | 5 | 2 | ... | 7 |
| 8/1/15 | 7 | 1 | ... | 9 |
| 9/1/15 | NULL | 7 | ... | NULL |
| 10/1/15 | 8 | NULL | ... | NULL |
| 11/1/15 | NULL | NULL | ... | NULL |
+---------+----------+----------+-----+----------+
I wanted to fill in the NULL column with the values above them. So, the output should be something like this.
+---------+----------+----------+-----+----------+
| Date | Product1 | Product2 | ... | ProductN |
+---------+----------+----------+-----+----------+
| 7/1/15 | 5 | 2 | ... | 7 |
| 8/1/15 | 7 | 1 | ... | 9 |
| 9/1/15 | 7 | 7 | ... | 9 |
| 10/1/15 | 8 | 7 | ... | 9 |
| 11/1/15 | 8 | 7 | ... | 9 |
+---------+----------+----------+-----+----------+
I've found this article that might help me but this only manipulate one column. How do I apply this to all my column or how can I achieve such result since my columns are dynamic.
Any help would be much appreciated. Thanks!
The ANSI standard has the IGNORE NULLS option on LAG(). This is exactly what you want. Alas, SQL Server has not (yet?) implemented this feature.
So, you can do this in several ways. One is using multiple outer applys. Another uses correlated subqueries:
select p.date,
(case when p.product1 is not null else p.product1
else (select top 1 p2.product1 from pivoted p2 where p2.date < p.date order by p2.date desc)
end) as product1,
(case when p.product1 is not null else p.product1
else (select top 1 p2.product1 from pivoted p2 where p2.date < p.date order by p2.date desc)
end) as product1,
(case when p.product2 is not null else p.product2
else (select top 1 p2.product2 from pivoted p2 where p2.date < p.date order by p2.date desc)
end) as product2,
. . .
from pivoted p ;
I would recommend an index on date for this query.
I would like to suggest you a solution. If you have a table which consists of merely two columns my solution will work perfectly.
+---------+----------+
| Date | Product |
+---------+----------+
| 7/1/15 | 5 |
| 8/1/15 | 7 |
| 9/1/15 | NULL |
| 10/1/15 | 8 |
| 11/1/15 | NULL |
+---------+----------+
select x.[Date],
case
when x.[Product] is null
then min(c.[Product])
else
x.[Product]
end as Product
from
(
-- this subquery evaluates a minimum distance to the rows where Product column contains a value
select [Date],
[Product],
min(case when delta >= 0 then delta else null end) delta_min,
max(case when delta < 0 then delta else null end) delta_max
from
(
-- this subquery maps Product table to itself and evaluates the difference between the dates
select p.[Date],
p.[Product],
DATEDIFF(dd, p.[Date], pnn.[Date]) delta
from #products p
cross join (select * from #products where [Product] is not null) pnn
) x
group by [Date], [Product]
) x
left join #products c on x.[Date] =
case
when abs(delta_min) < abs(delta_max) then DATEADD(dd, -delta_min, c.[Date])
else DATEADD(dd, -delta_max, c.[Date])
end
group by x.[Date], x.[Product]
order by x.[Date]
In this query I mapped the table to itself rows which contain values by CROSS JOIN statement. Then I calculated differences between dates in order to pick the closest ones and thereafter fill empty cells with values.
Result:
+---------+----------+
| Date | Product |
+---------+----------+
| 7/1/15 | 5 |
| 8/1/15 | 7 |
| 9/1/15 | 7 |
| 10/1/15 | 8 |
| 11/1/15 | 8 |
+---------+----------+
Actually, the suggested query doesn't choose the previous value. Instead of this, it selects the closest value. In other words, my code can be used for a number of different purposes.
First You need to add identity column in temporary or hard table then resolved by following method.
--- Solution ----
Create Table #Test (ID Int Identity (1,1),[Date] Date , Product_1 INT )
Insert Into #Test ([Date], Product_1)
Values
('7/1/15',5)
,('8/1/15',7)
,('9/1/15',Null)
,('10/1/15',8)
,('11/1/15',Null)
Select ID , DATE ,
IIF ( Product_1 is null ,
(Select Product_1 from #TEST
Where ID = (Select Top 1 a.ID From #TEST a where a.Product_1 is not null and a.ID<b.ID
Order By a.ID desc)
),Product_1) Product_1
from #Test b
-- Solution End ---

Inconsistent Transpose

Given a table A has the following data:
+----------+-------+
| Supplier | buyer |
+----------+-------+
| A | 1 |
| A | 2 |
| B | 3 |
| B | 4 |
| B | 5 |
+----------+-------+
My question is, can I transpose the second column so the resultant table will be like:
+----------+--------+--------+--------+
| Supplier | buyer1 | buyer2 | buyer3 |
+----------+--------+--------+--------+
| A | 1 | 2 | |
| B | 3 | 4 | 5 |
+----------+--------+--------+--------+
Assuming the maximum number of buyers is known as three.
You could use a common table expression to give each buyer an order within the supplier, and then just do a regular case to put them in columns;
WITH cte AS (
SELECT supplier, buyer,
ROW_NUMBER() OVER (PARTITION BY supplier ORDER BY buyer) rn
FROM Table1
)
SELECT supplier,
MAX(CASE WHEN rn=1 THEN buyer END) buyer1,
MAX(CASE WHEN rn=2 THEN buyer END) buyer2,
MAX(CASE WHEN rn=3 THEN buyer END) buyer3
FROM cte
GROUP BY supplier;
An SQLfiddle to test with.
You may consider using PIVOT clause:
select *
from (
select supplier, buyer, row_number() over (partition by supplier order by buyer) as seq
from a
)
pivot (max(buyer) for seq in (1 as buyer1, 2 as buyer2, 3 as buyer3));
SQLFiddle here.

if more than 1 match, do not return 'unknown'

I composed a monster query. I'm certain that it can be optimized, and I would more than appreciate any comments/guidance on the query itself; however, I have a specific question:
The data I am returning is sometimes duplicated on multiple columns:
+-------+------+----------+------+-------+--------+----------+-------+------+
| first | last | deaID | cert | count | npi | clientid | month | year |
+-------+------+----------+------+-------+--------+----------+-------+------+
| Alex | Jue | UNKNOWN | MD | 11 | 123123 | 102889 | 7 | 2012 |
| Alex | Jue | BJ123123 | MD | 11 | 123123 | 102889 | 7 | 2012 |
+-------+------+----------+------+-------+--------+----------+-------+------+
as you can see all of the fields are equal except for deaID
in this case, I would like to only return:
+------+-----+----------+----+----+--------+--------+---+------+
| | | | | | | | | |
+------+-----+----------+----+----+--------+--------+---+------+
| Alex | Jue | BJ123123 | MD | 11 | 123123 | 102889 | 7 | 2012 |
+------+-----+----------+----+----+--------+--------+---+------+
however, if there are no duplicates:
+-------+------+---------+------+-------+--------+----------+-------+------+
| first | last | deaID | cert | count | npi | clientid | month | year |
+-------+------+---------+------+-------+--------+----------+-------+------+
| Alex | Jue | UNKNOWN | MD | 11 | 123123 | 102889 | 7 | 2012 |
+-------+------+---------+------+-------+--------+----------+-------+------+
then i would like to keep it!
summary
if there are duplicates remove all records with 'deaID=unknown'; however, if there is only 1 match then return that match
question
how do i return unknown records IFF there is 1 match?
here is the monster query in case anybody is interested :)
with ctebiggie as (
select distinct
p.[IMS_PRESCRIBER_ID],
p.PHYSICIAN_NPI as MLISNPI,
a.CLIENT_ID,
p.MLIS_FIRSTNAME,
p.MLIS_LASTNAME,
p_address.IMS_DEA_NBR,
p.IMS_PROFESSIONAL_ID_NBR,
p.IMS_PROFESSIONAL_ID_NBR_src,
p.IMS_CERTIFICATION_CODE,
datepart(mm,a.RECEIVED_DATE) as [Month],
datepart(yyyy,a.RECEIVED_DATE) as [Year]
from
MILLENNIUM_DW_dev..D_PHYSICIAN p
left outer join
MILLENNIUM_DW_dev..F_ACCESSION_DAILY a
on a.REQUESTOR_NPI=p.PHYSICIAN_NPI
left outer join MILLENNIUM_DW_dev..D_PHYSICIAN_ADDRESS p_address
on p.PHYSICIAN_NPI=p_address.PHYSICIAN_NPI
where
a.RECEIVED_DATE is not null
--and p.IMS_PRESCRIBER_ID is not null
--and p_address.IMS_DEA_NBR !='UNKNOWN'
and p.REC_ACTIVE_FLG=1
and p_address.REC_ACTIVE_FLG=1
and DATEPART(yyyy,received_date)=2012
and DATEPART(mm,received_date)=7
group by
p.[IMS_PRESCRIBER_ID],
p.PHYSICIAN_NPI,
p.IMS_PROFESSIONAL_ID_NBR,
p.MLIS_FIRSTNAME,
p.MLIS_LASTNAME,
p_address.IMS_DEA_NBR,
p.IMS_PROFESSIONAL_ID_NBR,
p.IMS_PROFESSIONAL_ID_NBR_src,
p.IMS_CERTIFICATION_CODE,
datepart(mm,a.RECEIVED_DATE),
datepart(yyyy,a.RECEIVED_DATE),
a.CLIENT_ID
)
,
ctecount as
(select
COUNT (Distinct f.ACCESSION_ID) [count],
f.REQUESTOR_NPI,f.CLIENT_ID,
datepart(mm,f.RECEIVED_DATE) mm,
datepart(yyyy,f.RECEIVED_DATE)yyyy
from MILLENNIUM_DW_dev..F_ACCESSION_DAILY f
where
f.CLIENT_ID not in (select * from SalesDWH..TestPractices)
and DATEPART(yyyy,f.received_date)=2012
and DATEPART(mm,f.received_date)=7
group by f.REQUESTOR_NPI,
f.CLIENT_ID,
datepart(mm,f.RECEIVED_DATE),
datepart(yyyy,f.RECEIVED_DATE)
)
select ctebiggie.*,c.* from
ctebiggie
full outer join
ctecount c
on c.REQUESTOR_NPI=ctebiggie.MLISNPI
and c.mm=ctebiggie.[Month]
and c.yyyy=ctebiggie.[Year]
and c.CLIENT_ID=ctebiggie.CLIENT_ID
Assuming you have the base query, I will assign row_number and count by partition function over this resultset. Then on the outer select, if count is 1 then unknown is selected, else it is not selected.
SELECT first,
last,
deaID,
cert,
count,
npi,
clientid,
month,
year
FROM (
SELECT first,
last,
deaID,
cert,
count,
npi,
clientid,
month,
year,
ROW_NUMBER() OVER (PARTITION BY
first,last,cert,count,npi,clientid,month,year
ORDER BY CASE WHEN deaID = 'Unkown' THEN 0 ELSE 1 END,
deaID) AS RowNumberInGroup,
COUNT() OVER (PARTITION BY first,last,cert,count,npi,clientid,month,year)
AS CountPerGroup,
SUM(CASE WHEN deaID = 'Unkown' THEN 1 ELSE 0 END)
OVER (PARTITION BY first,last,cert,count,npi,clientid,month,year)
AS UnknownCountPerGroup
FROM BaseQuery
) T
WHERE (T.CountPerGroup = T.UnknownCountPerGroup AND T.RowNumberInGroup = 1) OR T.RowNumberInGroup > T.UnknownCountPerGroup
see this helps or not
select distinct main.col1,main.col2 ,
isnull(( select col3 from table1 where table1.col1=main.col1
and table1.col2=main.col2 and col3 <>'UNKNOWN'),'UNKNOWN')
from table1 main
Sample in Sql fiddle
or fair version of yours will be
SELECT distinct first,
last,
cert,
count,
npi,
clientid,
month,
year,
isnull(
select top 1 dealid from table1 intable where
intable.first=maintable.first and
intable.last=maintable.last and
intable.cert=maintable.cert and
intable.npi=maintable.npi and
intable.clientid=outtable.clientid and
intable.month=outtable.month and
intable.year=outtable.year
where dealid<>'UNKNOWN'),'UNKNOWN') as dealId
FROM table1 maintable