Get DISTINCT COUNT in one pass in SQL Server - sql

I have a table like below:
Region Country Manufacturer Brand Period Spend
R1 C1 M1 B1 2016 5
R1 C1 M1 B1 2017 10
R1 C1 M1 B1 2017 20
R1 C1 M1 B2 2016 15
R1 C1 M1 B3 2017 20
R1 C2 M1 B1 2017 5
R1 C2 M2 B4 2017 25
R1 C2 M2 B5 2017 30
R2 C3 M1 B1 2017 35
R2 C3 M2 B4 2017 40
R2 C3 M2 B5 2017 45
...
I wrote the query below to aggregate them:
SELECT [Region]
,[Country]
,[Manufacturer]
,[Brand]
,Period
,SUM([Spend]) AS [Spend]
FROM myTable
GROUP BY [Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]
ORDER BY 1,2,3,4
which yields something like below:
Region Country Manufacturer Brand Period Spend
R1 C1 M1 B1 2016 5
R1 C1 M1 B1 2017 30 -- this row is an aggregate from raw table above
R1 C1 M1 B2 2016 15
R1 C1 M1 B3 2017 20
R1 C2 M1 B1 2017 4 -- aggregated result
R1 C2 M2 B4 2017 25
R1 C2 M2 B5 2017 30
R2 C3 M2 B4 2017 40
R2 C3 M2 B5 2017 45
I'd like to add another column to the above table that shows the DISTINCT COUNT of Brand grouped by Region, Country, Manufacturer and Period. So the final table would become as follows:
Region Country Manufacturer Brand Period Spend UniqBrandCount
R1 C1 M1 B1 2016 5 2 -- two brands by R1, C1, M1 in 2016
R1 C1 M1 B1 2017 30 2 -- two brands (B1, B3) by R1, C1, M1 in 2017
R1 C1 M1 B2 2016 15 2 -- same as first row's result
R1 C1 M1 B3 2017 20 2
R1 C2 M1 B1 2017 4 1
R1 C2 M2 B4 2017 25 2
R1 C2 M2 B5 2017 30 2
R2 C3 M2 B4 2017 40 2
R2 C3 M2 B5 2017 45 2
I know how to get to final result in three steps.
Run this query (Query #1):
SELECT [Region]
,[Country]
,[Manufacturer]
,[Period]
,COUNT(DISTINCT [Brand]) AS [BrandCount]
INTO Temp1
FROM myTable
GROUP BY [Region]
,[Country]
,[Manufacturer]
,[Period]
Run this query (Query #2)
SELECT [Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]
,SUM([Spend]) AS [Spend]
INTO Temp2
FROM myTable
GROUP BY [Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]
Then LEFT JOIN Temp2 and Temp1 to bring in [BrandCount] from the latter like below:
SELECT a.*
,b.[BrandCount]
FROM Temp2 AS a
LEFT JOIN Temp1 AS b ON a.[Region] = b.[Region]
AND a.[Country] = b.[Country]
AND a.[Manufacturer] = b.[Manufacturer]
AND a.[Period] = b.[Period]
I'm pretty sure there is a more efficient way to do this. Is there? Thank you in advance for your suggestions/answers!

Borrowing heavily from this question: https://dba.stackexchange.com/questions/89031/using-distinct-in-window-function-with-over
COUNT(DISTINCT ...) is not supported with an OVER clause in SQL Server, so dense_rank is needed instead. Ranking the brands in ascending and then descending order, adding the two ranks, and subtracting 1 gives the distinct count.
Your sum function can also be rewritten using PARTITION BY logic. This way you can use different grouping levels for each aggregation:
SELECT
[Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]
,dense_rank() OVER
(PARTITION BY
[Region]
,[Country]
,[Manufacturer]
,[Period] Order by Brand)
+ dense_rank() OVER
(PARTITION BY
[Region]
,[Country]
,[Manufacturer]
,[Period] Order by Brand Desc)
- 1
AS [BrandCount]
,SUM([Spend]) OVER
(PARTITION BY
[Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]) as [Spend]
from
myTable
ORDER BY 1,2,3,4
You may then need to reduce the number of rows in your output, as this syntax gives the same number of rows as myTable, but with the aggregation totals appearing on each row they apply to:
R1 C1 M1 B1 2016 2 5
R1 C1 M1 B1 2017 2 30 --dup1
R1 C1 M1 B1 2017 2 30 --dup1
R1 C1 M1 B2 2016 2 15
R1 C1 M1 B3 2017 2 20
R1 C2 M1 B1 2017 1 5
R1 C2 M2 B4 2017 2 25
R1 C2 M2 B5 2017 2 30
R2 C3 M1 B1 2017 1 35
R2 C3 M2 B4 2017 2 40
R2 C3 M2 B5 2017 2 45
Selecting distinct rows from this output gives you what you need.
How the dense_rank trick works
Consider this data:
Col1 Col2
B 1
B 1
B 3
B 5
B 7
B 9
dense_rank() ranks data according to the number of distinct items before the current one, plus 1. So:
1->1, 3->2, 5->3, 7->4, 9->5.
In reverse order (using desc) this yields the reverse pattern:
1->5, 3->4, 5->3, 7->2, 9->1.
Adding these ranks together gives the same value:
1+5 = 2+4 = 3+3 = 4+2 = 5+1 = 6
The wording of the dense_rank definition is helpful here:
(number of distinct items before + 1) + (number of distinct items after + 1)
= number of distinct OTHER items before AND after + 2
= Total number of distinct items + 1
So to get the total number of distinct items, add the ascending and descending dense_ranks together and subtract 1.
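The arithmetic above can be checked outside the database. The following is a minimal Python sketch, not T-SQL; the helper name dense_rank is made up for illustration and only mimics what DENSE_RANK() does within a single partition.

```python
def dense_rank(values, reverse=False):
    """Return the dense rank of each value: 1 + number of distinct values sorted before it."""
    ordered = sorted(set(values), reverse=reverse)
    rank_of = {v: i + 1 for i, v in enumerate(ordered)}
    return [rank_of[v] for v in values]

# Col2 from the example above (one partition, Col1 = 'B')
col2 = [1, 1, 3, 5, 7, 9]

asc = dense_rank(col2)                  # ranks in ascending order
desc = dense_rank(col2, reverse=True)   # ranks in descending order

# ascending rank + descending rank - 1 is the same on every row
distinct_counts = [a + d - 1 for a, d in zip(asc, desc)]

print(asc)              # [1, 1, 2, 3, 4, 5]
print(desc)             # [5, 5, 4, 3, 2, 1]
print(distinct_counts)  # [5, 5, 5, 5, 5, 5]
```

Every row ends up with 5, which is the number of distinct values in Col2, even though the value 1 appears twice.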

The window-functions tag on your question suggests you already have a pretty good idea.
For the DISTINCT COUNT of Brand grouped by Region, Country, Manufacturer and Period, you may write:
Select Region
,Country
,Manufacturer
,Brand
,Period
,Spend
,DENSE_RANK() Over (Partition By Region, Country, Manufacturer, Period Order By Brand asc)
+ DENSE_RANK() Over (Partition By Region, Country, Manufacturer, Period Order By Brand desc)
-1 UniqBrandCount
From myTable T1
Order By 1,2,3,4

The double dense_rank idea means that you need two sorts (assuming no index exists that provides the sort order). Assuming no NULL brands (as that idea also does), you can use a single dense_rank and a windowed MAX as below:
WITH T1
AS (SELECT *,
DENSE_RANK() OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period] ORDER BY Brand) AS [dr]
FROM myTable),
T2
AS (SELECT *,
MAX([dr]) OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period]) AS UniqBrandCount
FROM T1)
SELECT [Region],
[Country],
[Manufacturer],
[Brand],
Period,
SUM([Spend]) AS [Spend],
MAX(UniqBrandCount) AS UniqBrandCount
FROM T2
GROUP BY [Region],
[Country],
[Manufacturer],
[Brand],
[Period]
ORDER BY [Region],
[Country],
[Manufacturer],
[Period],
Brand
The above has some inevitable spooling (it isn't possible to do this in a 100% streaming manner) but a single sort.
Strangely the final order by clause is needed to keep the number of sorts down to one (or zero if a suitable index exists).
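To see what the three steps (dense rank, maximum rank per partition, final GROUP BY) produce on the question's sample rows, here is a rough Python sketch. It only imitates the logic and says nothing about how SQL Server executes the plan; the variable names are illustrative.

```python
from collections import defaultdict

# Sample rows from the question: Region, Country, Manufacturer, Brand, Period, Spend
rows = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]

# Steps 1 and 2: dense-rank the brands inside each
# (Region, Country, Manufacturer, Period) partition and keep the maximum rank.
brands = defaultdict(set)
for region, country, manu, brand, period, _ in rows:
    brands[(region, country, manu, period)].add(brand)

uniq_brand_count = {}
for key, brand_set in brands.items():
    rank = {b: i + 1 for i, b in enumerate(sorted(brand_set))}
    uniq_brand_count[key] = max(rank.values())

# Step 3: GROUP BY down to brand level and sum the spend.
spend = defaultdict(int)
for region, country, manu, brand, period, amount in rows:
    spend[(region, country, manu, brand, period)] += amount

result = sorted(
    (r, c, m, b, p, s, uniq_brand_count[(r, c, m, p)])
    for (r, c, m, b, p), s in spend.items()
)
for row in result:
    print(*row)
```

The two raw rows for R1, C1, M1, B1 in 2017 collapse into one row with Spend 30 and UniqBrandCount 2.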

Related

Customers with 75% of orders- SQL

I have a file that contains customers with orders and I need to find the top 75%. It needs to be at least 75% and orders with the same number will be included. Need to figure out the where statement to select the records.
Cust | Orders | Accum Orders | Accum %
c1 10 10 29%
c2 7 17 45%
c3 5 22 63%
c4 4 26 74%
c5 3 29 83%
c6 3 32 89%
c7 2 34 94%
c8 1 35 100%
I would like to only extract c1-c6. C4 is only 74% and it needs to be 75%. c5-c6 are the same number of orders so they both need to be extracted.
Thanks
You can solve this by writing two queries and then make them union:
SELECT * FROM TABLE1
WHERE
TO_NUMBER(REPLACE(ACCUMP,'%','')) < 75
UNION
SELECT * FROM TABLE1
WHERE ORDERS IN
(
SELECT ORDERS FROM TABLE1
GROUP BY ORDERS
HAVING COUNT(*) > 1
);
Use window functions:
select t.*
from (select t.*,
sum(orders) over (order by orders desc range between unbounded preceding and current row) as running_orders,
sum(orders) over (partition by orders) as all_with_this_order,
sum(orders) over () as total_orders
from t
) t
where (running_orders - all_with_this_order) < 0.75 * total_orders;
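The filter is easier to follow with the numbers from the question plugged in. The Python sketch below imitates the three window sums under the assumption that the table holds exactly the eight customers shown; the variable names follow the column aliases in the query.

```python
from collections import Counter

# Cust -> Orders, as in the question
orders = {"c1": 10, "c2": 7, "c3": 5, "c4": 4, "c5": 3, "c6": 3, "c7": 2, "c8": 1}

total_orders = sum(orders.values())  # sum(orders) over ()

# sum(orders) over (partition by orders): total for customers sharing an order count
all_with_this_order = Counter()
for n in orders.values():
    all_with_this_order[n] += n

def running_orders(n):
    # RANGE ... CURRENT ROW with ORDER BY orders DESC includes all peers (ties)
    return sum(v for v in orders.values() if v >= n)

selected = [
    cust
    for cust, n in orders.items()
    if running_orders(n) - all_with_this_order[n] < 0.75 * total_orders
]
print(selected)  # ['c1', 'c2', 'c3', 'c4', 'c5', 'c6']
```

Subtracting all_with_this_order gives the running total before the current group of ties, so a whole tie group is kept as long as the 75% threshold (26.25 of 35 here) had not been reached before it.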
You need a subquery with a group by Orders
select Cust
from tab
where Orders =
(
select Orders
from tab
where replace(accum,'%','') >= 75
group by Orders
having count(AccumOrders) > 1
);
Rextester Demo

Restricting a report to not show clients that closed a product but still have another one opened?

I need to make a report that will return active and inactive clients during a certain period of time.
This is based on some products that have an opening and a closing date. Clients that closed a product, but still had another one open don't need to show up. My problem is that I don't know how to restrict the report in order to make this happen. I have tried to add a not exists clause like:
WITH my_with_as AS
(
SELECT p.product_id,
p.client_id,
p.opening_date,
p.ending_date
FROM products p
WHERE p.opening_date BETWEEN report_start_date AND report_end_date
OR p.ending_date BETWEEN report_start_date AND report_end_date)
SELECT cl.d_start,
CASE
WHEN cl.d_stop BETWEEN report_start_date AND report_end_date THEN my_with_as.ending_date
ELSE NULL
END
FROM (
SELECT
(
SELECT Min(d1.opening_date)
FROM my_with_as d1
WHERE d1.client_id=c.client_id) d_start,
(
SELECT Max(d2.ending_date)
FROM my_with_as d2
WHERE d2.client_id=c.client_id) d_stop,
c.*
FROM clients c) cl,
my_with_as
WHERE cl.client_id=my_with_as.client_id
AND NOT EXISTS
(
SELECT p.product_id
FROM products p
WHERE my_with_as.client_id = p.client_id
AND p.product_id<>my_with_as.product_id
AND Nvl(p.ending_date,report_end_date+1)>report_end_date
AND p.opening_date < my_with_as.ending_date)
where my_with_as is a with as query with all the products that opened or closed during the period of time of the report.
Problem is for a reporting period of 01.05.2014 - 04.04.2015 and a client that has:
-product_1: opened on 04.04.2001, closed on 25.07.2014
-product_2: opened on 04.04.2010, closed on 25.03.2015
-product_3: opened on 01.01.2015, closed on 04.04.2015
my report shows both the 1st product and the 3rd one even though it shouldn't show anything. Is there a way to verify if the intervals of the products overlap? Any hint or help is highly appreciated, as this has been driving me nuts for 3 days now.
---- EDIT (copied table definitions from comments to the question ) -----
The products table: product_id, client_id, opening_date, ending_date.
The clients table has just the client_id; I simplified it for test purposes.
The report will have 2 dates as parameters: start_report, end_report.
I believe this is the query you are looking for, or at least it will be useful to you:
with report as (select date '2014-05-01' r1, date '2015-04-04' r2 from dual),
mwa AS (
SELECT p.product_id, p.client_id, p.opening_date, p.ending_date, r1, r2
FROM products p cross join report r
WHERE p.opening_date BETWEEN r1 AND r2 OR p.ending_date BETWEEN r1 AND r2 )
SELECT cl.d_start, CASE WHEN cl.d_stop BETWEEN r1 AND r2 THEN mwa.ending_date END d_stop
FROM
(SELECT
(SELECT MIN(opening_date) FROM mwa WHERE client_id = c.client_id) d_start,
(SELECT MAX(ending_date) FROM mwa WHERE client_id = c.client_id) d_stop, c.*
FROM clients c) cl join mwa on mwa.client_id = cl.client_id
WHERE NOT EXISTS (
SELECT p.product_id from products p
WHERE mwa.client_id = p.client_id AND p.product_id<>mwa.product_id
and not (p.opening_date BETWEEN r1 AND r2
OR p.ending_date between r1 and r2))
It returns no rows for the given example, because one of the client's products does not fall within the analyzed period, so we don't want the rest of the rows for this client.
To change the period, replace the dates in the first line; use the 'yyyy-mm-dd' notation or to_date() there.
SQLFiddle
In SQLFiddle I added one row for other client to test solution, so we have one row returned.
I suspect that this query may be simplified, but I didn't want to interfere with your code too much.
Edit - according to comments:
with report as (select date '2015-02-01' r1, date '2015-04-03' r2 from dual),
p1 as (select product_id pid, client_id cid, opening_date d1, ending_date d2, r1, r2
from products p cross join report r ),
p2 as (select pid, cid, d1, d2,
case when ( d1 < r2 and (d2 is null or d2 > r1) ) then 1 else 0 end overlap,
case when ( d1 < r2 and d2 is not null and d2 < r2 ) then 1 else 0 end closed
from p1)
select * from p2 order by cid, pid
PID CID D1 D2 OVERLAP CLOSED
---------- ---------- ---------- ---------- ---------- ----------
product_1 1 2014-04-04 2014-07-25 0 1
product_2 1 2015-01-04 2015-03-25 1 1
product_3 1 2015-01-01 2015-04-01 1 1
product_1 2 2015-01-01 2015-04-04 1 0
product_1 3 2015-04-04 2015-07-25 0 0
product_2 3 2015-01-01 2015-04-04 1 0
overlap = 1 means that the product was "active" during the report period; these are probably the
only rows that interest you
closed = 1 means that product has closed status at the end of period
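The two CASE expressions are plain interval tests and can be tried on their own. Here is a small Python sketch of the same conditions, run against the rows printed above; the function names overlap and closed simply mirror the column aliases, and None stands in for a NULL ending date.

```python
from datetime import date

def overlap(d1, d2, r1, r2):
    """Product [d1, d2] intersects report period [r1, r2]; d2 = None means still open."""
    return d1 < r2 and (d2 is None or d2 > r1)

def closed(d1, d2, r2):
    """Product was opened before the period end and closed before the period end."""
    return d1 < r2 and d2 is not None and d2 < r2

r1, r2 = date(2015, 2, 1), date(2015, 4, 3)

# (pid, cid, d1, d2) copied from the result set above
products = [
    ("product_1", 1, date(2014, 4, 4), date(2014, 7, 25)),
    ("product_2", 1, date(2015, 1, 4), date(2015, 3, 25)),
    ("product_3", 1, date(2015, 1, 1), date(2015, 4, 1)),
    ("product_1", 2, date(2015, 1, 1), date(2015, 4, 4)),
    ("product_1", 3, date(2015, 4, 4), date(2015, 7, 25)),
    ("product_2", 3, date(2015, 1, 1), date(2015, 4, 4)),
]

flags = [
    (pid, cid, int(overlap(d1, d2, r1, r2)), int(closed(d1, d2, r2)))
    for pid, cid, d1, d2 in products
]
for row in flags:
    print(*row)
```

The printed OVERLAP and CLOSED values match the two rightmost columns of the table above.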
The query above gives us extended information about each product. Now you can filter only rows
with overlap = 1 and work with them. Example of what you can do next is:
with report as (select date '2015-02-01' r1, date '2015-07-20' r2 from dual),
p1 as (select product_id pid, client_id cid, opening_date d1, ending_date d2, r1, r2
from products p cross join report r ),
p as (select pid, cid, d1, d2, r2,
case when ( d1 < r2 and (d2 is null or d2 > r1) ) then 1 else 0 end overlap,
case when ( d1 < r2 and d2 is not null and d2 < r2 ) then 1 else 0 end closed
from p1)
select distinct c.name, d1, nvl(d2, r2) d2
from p join clients c on c.client_id = p.cid
where overlap = 1 and not exists (
select 1 from p tmp where overlap = 1 and closed = 1
and cid = p.cid and pid <> p.pid )
NAME D1 D2
---------- ---------- ----------
Jones 2015-01-01 2015-04-04
Smith 2015-01-01 2015-04-04

Sql Getting Top 2 results in each classification

Hi, I am new to SQL and stuck on a problem.
Below is a sample of my table. This is not the exact table, but a sample of what I am trying to achieve:
Name Classification Hits
A A1 2
A A2 3
A A3 4
A A4 8
A A5 9
B B1 9
B B2 3
B B3 4
B B4 8
B B5 9
c c1 8
c c2 9
c c3 4
c c4 8
c c5 9
...
And I am looking for the top 2 rows by Hits for each Name. For example:
Name Classification Hits
A A4 8
A A5 9
B B1 9
B B5 9
c c2 9
c c5 9
I have tried this query:
SELECT TOP (2) Name , Classification , Hits
FROM Table4
Group By Name , Classification , Hits
Order By Hits
But I am only getting two rows. What am I doing wrong here? Any suggestions?
You can use a CTE with the Row_Number() function
;WITH CTE AS(
SELECT Name,
Classification,
Hits,
Row_Number() OVER(Partition by name ORDER BY Hits DESC) AS RowNum
FROM Table4
)
SELECT Name,
Classification,
Hits
FROM CTE
WHERE RowNum <= 2
ORDER BY Name, Hits
SQL FIDDLE DEMO
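For readers who want to see the partition-and-number step spelled out, this is a minimal Python sketch of the same idea on the sample table. One caveat: Python's sort is stable, so ties on Hits keep their input order here, whereas ROW_NUMBER() may break ties arbitrarily unless the ORDER BY has a tiebreaker.

```python
from itertools import groupby

# Name, Classification, Hits from the question
rows = [
    ("A", "A1", 2), ("A", "A2", 3), ("A", "A3", 4), ("A", "A4", 8), ("A", "A5", 9),
    ("B", "B1", 9), ("B", "B2", 3), ("B", "B3", 4), ("B", "B4", 8), ("B", "B5", 9),
    ("c", "c1", 8), ("c", "c2", 9), ("c", "c3", 4), ("c", "c4", 8), ("c", "c5", 9),
]

# PARTITION BY Name ORDER BY Hits DESC
ordered = sorted(rows, key=lambda r: (r[0], -r[2]))

top2 = []
for name, group in groupby(ordered, key=lambda r: r[0]):
    # the first two rows of each partition correspond to RowNum <= 2
    top2.extend(list(group)[:2])

for row in top2:
    print(*row)
```

This picks A5 and A4 for A, B1 and B5 for B, and c2 and c5 for c, the same rows as the expected result in the question.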
This will also work. Without using ROW_NUMBER().
Select a.* from MyTable as M1
Cross apply
(
Select top 2 * from Mytable m2
where m1.name = m2.name
order by m2.Hits desc
)as a
where a.Classification = m1.Classification
Fiddle Demo
But I don't know about performance.
I'm working from memory so I'm not sure of the syntax, and there's probably a more efficient way to do this, but you'd want to do something like
;with rawdata as (
select Name, Classification, Hits,
Row_number() over (partition by Name order by Hits desc) as x
from Table4
)
select Name, Classification, Hits from rawdata where x < 3

Generate Report from SQL Query

I have the following table:
Code State Site Date
----------
x1 2 s1 d1
x2 2 s1 d2
x3 2 s1 d3
x1 3 s2 d4
x2 3 s1 d5
x3 3 s1 d6
x4 2 s2 d7
x5 2 s2 d8
x3 2 s1 d9
----------
(here d1 < d2 < ... < d9)
My goal is to make a report (new table) for every Code:
For each Code that has been in state 2 and in state 3 at the same Site, where the Date in state 2 is earlier than the Date in state 3.
For the above table the result would be:
Code Date
----------
x2 d2
x3 d3
x2 d5
x3 d6
----------
What I tried was:
Select Code,Date,Site from Transactions where State='2' and Site in
(Select Site from Transactions where State='3')
But this query isn't enough because for the given table it returns:
Code Date
----------
x2 d2
x3 d3
x2 d5
x3 d6
x3 d9
----------
Which isn't exactly what I want, since the state 2 row at date d9 has no matching state 3 row with a later date.
I hope all this makes sense.
If it does, is there a SQL query that achieves this?
For the set you gave this should work, although since you did not mention what happens in terms of multiple dates for a specific state I did not answer that question.
Long but it works
declare @WhatEverYourTableNameIs Table
(
Code varchar(2),
State int,
Site VarChar(2),
DateGotten Date
)
Insert into @WhatEverYourTableNameIs
Values
('x1',2,'s1','2014-1-1'),
('x2',2,'s1','2014-1-2'),
('x3',2,'s1','2014-1-3'),
('x1',3,'s2','2014-1-4'),
('x2',3,'s1','2014-1-5'),
('x3',3,'s1','2014-1-6'),
('x4',2,'s2','2014-1-7'),
('x5',2,'s2','2014-1-8'),
('x3',2,'s1','2014-1-9')
SELECT * into #MyTemp
FROM
(
SELECT Code, [State], [Site], DateGotten
FROM @WhatEverYourTableNameIs
GROUP BY Code, [State], Site, DateGotten
) a
SELECT *
FROM
(
SELECT DISTINCT a.Code, a.State, a.Site, a.DateGotten
FROM #MyTemp a
JOIN (
SELECT *
FROM #MyTemp
WHERE [State] =3
) b ON a.Code = b.Code and a.Site = b.Site
WHERE a.[State] = 2 and a.DateGotten < b.DateGotten
UNION
SELECT DISTINCT b.Code, b.State, b.Site, b.DateGotten
FROM #MyTemp a
JOIN (
SELECT *
FROM #MyTemp
WHERE [State] =3
) b on a.Code = b.Code and a.Site = b.Site
WHERE b.[State] = 3 and a.DateGotten < b.DateGotten
) a
order by a.DateGotten
drop table #MyTemp
>Code State Site DateGotten
>x2 2 s1 2014-01-02
>x3 2 s1 2014-01-03
>x2 3 s1 2014-01-05
>x3 3 s1 2014-01-06

Split one record into multiple rows

Can somebody help me to split records into multiple rows.
My records look like this
321517 2013 SEPTEMBER 3 30 286787 321517-2013
321517 2013 SEPTEMBER 2 42 286787 321517-2013
I want them to look like this
321517 2013 SEPTEMBER 1 30 286787 321517-2013
321517 2013 SEPTEMBER 1 30 286787 321517-2013
321517 2013 SEPTEMBER 1 30 286787 321517-2013
321517 2013 SEPTEMBER 1 42 286787 321517-2013
321517 2013 SEPTEMBER 1 42 286787 321517-2013
You can get the max possible value and then make a recursive CTE to generate rows.
;WITH MAX_VALUE AS (
SELECT MAX(C4) AS VAL FROM Table1
),
TMP_ROWS AS (
SELECT 1 AS PARENT, 0 AS LVL, 1 AS ID
UNION ALL
SELECT
CHILD.PARENT,
TMP_ROWS.LVL + 1 AS LVL,
TMP_ROWS.ID
FROM (SELECT 1 AS PARENT, 1 AS ID, 0 AS NIVEL) AS CHILD
INNER JOIN TMP_ROWS ON CHILD.PARENT = TMP_ROWS.ID
WHERE TMP_ROWS.LVL < (SELECT VAL FROM MAX_VALUE)
)
select C1, C2, C3, 1 C4, C5, C6, C7
from Table1 join TMP_ROWS on C4 > TMP_ROWS.LVL
order by C1, C2, C3, C5, C6, C7
Demo (based on previous reply data)
*Edit: "ROWS" isn't a good name for a table
You can try something like this. Please note that this query assumes maximum value of your 4th Column is 10. You can add more rows to the CTE using a cross join if you have higher values.
;WITH CTE AS (
select Digit
from ( values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS t(Digit)
)
select C1, C2, C3, 1 C4, C5, C6, C7
from Table1 join CTE on C4 > Digit
order by C1
Fiddle demo on sql server 2008
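Both answers rely on the same trick: join each record to a list of numbers and keep one copy per number below C4. A short Python sketch of the digits-table version follows, assuming, as the answer does, that C4 never exceeds 10; the tuple layout C1..C7 follows the answers' column names.

```python
# C1..C7 for the two sample records from the question
records = [
    (321517, 2013, "SEPTEMBER", 3, 30, 286787, "321517-2013"),
    (321517, 2013, "SEPTEMBER", 2, 42, 286787, "321517-2013"),
]

# Stand-in for the CTE of digits 0..9
digits = range(10)

# join ... on C4 > Digit: a record with C4 = n matches digits 0..n-1, giving n copies,
# and each copy gets the constant 1 in place of C4
split = [
    (c1, c2, c3, 1, c5, c6, c7)
    for (c1, c2, c3, c4, c5, c6, c7) in records
    for d in digits
    if c4 > d
]
for row in split:
    print(*row)
```

This yields three rows with 30 in the fifth column and two rows with 42, matching the desired output.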