SELECT TOP N with distinct/unique field values


Based on the answer to a previous question of mine, I end up with a result set something like this:
PartyName Risk SubTotal Total
A High 100 280
B Med 25 45
A Low 30 280
A Med 70 280
B Low 10 45
C High 110 170
C Med 60 170
D Low 30 30
A Med 80 280
B Low 10 45
What I need is to SELECT the TOP N unique PartyName values with the highest amounts (Total), i.e. if N = 2 the result should be:
PartyName Risk SubTotal Total
A High 100 280
A Low 30 280
A Med 70 280
C High 110 170
C Med 60 170
A Med 80 280
That is, all entries for the parties with the N highest Total values.
I tried this:
SELECT DISTINCT TOP(10) s.PartyName, s.Risk, s.SubTotal, s2.Total
FROM
(SELECT PartyName, Risk, SUM(CAST(Amount AS DECIMAL)) SubTotal
FROM CustomerData
GROUP BY PartyName, Risk) S
LEFT JOIN
(SELECT PartyName, SUM(CAST(Amount AS DECIMAL)) Total
FROM CustomerData
GROUP BY PartyName) S2
ON S.PartyName = S2.Partyname
But it doesn't work.

Off the top of my head, maybe something like this:
if object_id('tempdb.dbo.#test') is not null drop table #test
create table #test
(
partyname varchar(50),
Risk varchar(50),
amount int
)
insert into #test
select 'A','High',50
union all select 'B','Med', 15
union all select 'A','Low', 12
union all select 'A','Med' , 43
union all select 'B','Low' , 65
union all select 'C','High', 12
union all select 'C','Med' , 789
union all select 'D','Low' , 12
union all select 'A' ,'Med', 34
union all select 'B' ,'Low', 43
SELECT
main.PartyName,
main.Risk,
main.SubTotal,
TotalValues.Total
FROM
--get party+risk+subtotal
(
SELECT PartyName, Risk, SUM(CAST(Amount AS DECIMAL)) SubTotal
FROM #test
GROUP BY PartyName, Risk
) main
--get total by partyname with a rownum to get top N, where N=2
INNER JOIN
(SELECT
b.partyName, b.Total, row_number() over (order by Total desc) as rid
FROM
(
SELECT b.PartyName, SUM(CAST(Amount AS DECIMAL)) as Total
FROM #test b
group by b.PartyName
) as b
) as TotalValues
on TotalValues.partyName = main.partyName
and TotalValues.rid <= 2 --n = 2
order by
main.partyname,
TotalValues.Total

First we get a set of data with the totals, next we find the range of totals of interest, and finally we get the results.
Untested:
WITH mAgg AS (SELECT partyName
, Risk
, sum(cast(amount as decimal(10,2))) over (partition by partyName, Risk) as subTotal
, sum(cast(amount as decimal(10,2))) over (partition by partyName) as Total
FROM CustomerData),
mRange as (SELECT distinct top 2 total from mAgg order by total desc)
SELECT * FROM mAgg where Total >= (SELECT min(total)
FROM mRange)
Or maybe we could just use dense_rank() over (order by Total desc) and then keep anything with a rank <= N.
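The three steps described above (totals, cutoff, final rows) can be tried end to end with a minimal sketch in Python's built-in sqlite3 module. It assumes SQLite 3.25 or later for window functions, replaces SQL Server's TOP with LIMIT, reuses the sample rows from the #test table in the previous answer, and computes the subtotal and total in two separate CTEs; it illustrates the idea rather than reproducing the untested T-SQL exactly.

```python
import sqlite3

# Sample rows taken from the #test table in the previous answer.
ROWS = [
    ("A", "High", 50), ("B", "Med", 15), ("A", "Low", 12), ("A", "Med", 43),
    ("B", "Low", 65), ("C", "High", 12), ("C", "Med", 789), ("D", "Low", 12),
    ("A", "Med", 34), ("B", "Low", 43),
]

QUERY = """
WITH sub AS (                       -- subtotal per party and risk
    SELECT PartyName, Risk, SUM(Amount) AS SubTotal
    FROM CustomerData
    GROUP BY PartyName, Risk
),
agg AS (                            -- step 1: attach each party's total
    SELECT PartyName, Risk, SubTotal,
           SUM(SubTotal) OVER (PARTITION BY PartyName) AS Total
    FROM sub
),
top_totals AS (                     -- step 2: the N highest distinct totals
    SELECT DISTINCT Total FROM agg ORDER BY Total DESC LIMIT ?
)
SELECT PartyName, Risk, SubTotal, Total   -- step 3: rows at or above the cutoff
FROM agg
WHERE Total >= (SELECT MIN(Total) FROM top_totals)
ORDER BY Total DESC, PartyName, Risk
"""


def top_n_parties(n):
    """Rows whose party Total is among the n highest distinct totals."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE CustomerData (PartyName TEXT, Risk TEXT, Amount INTEGER)")
        conn.executemany("INSERT INTO CustomerData VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY, (n,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_n_parties(2):
        print(row)
```

Filtering on the minimum of the top-N distinct totals keeps the same rows as a DENSE_RANK() <= N filter; the cutoff form just makes the "range of totals" step explicit.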

I think this version should do what you want:
SELECT TOP (10) WITH TIES PartyName, Risk,
SUM(CAST(Amount AS DECIMAL)) as SubTotal,
MAX(SUM(CAST(Amount AS DECIMAL))) OVER (PARTITION BY PartyName) as Total
FROM CustomerData
GROUP BY PartyName, Risk
ORDER BY Total DESC, PartyName;
EDIT:
The above returns the first 10 rows plus any rows tied with the 10th. If you want all rows for the 10 highest distinct Total values, use DENSE_RANK() instead:
SELECT cd.*
FROM (SELECT cd.*, DENSE_RANK() OVER (ORDER BY Total DESC) as seqnum
FROM (SELECT PartyName, Risk,
SUM(CAST(Amount AS DECIMAL)) as SubTotal,
MAX(SUM(CAST(Amount AS DECIMAL))) OVER (PARTITION BY PartyName) as Total
FROM CustomerData
GROUP BY PartyName, Risk
) cd
) cd
WHERE seqnum <= 10
ORDER BY Total DESC, PartyName;
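To see what DENSE_RANK() adds, here is a minimal sketch using Python's sqlite3 (SQLite 3.25+ assumed). It reuses the #test rows from the first answer and adds a hypothetical party "E" whose total ties with party A, so N = 2 returns every row of C, A and E.

```python
import sqlite3

# Rows from the #test table in the first answer, plus one hypothetical
# party "E" (illustration only) whose total of 139 ties with party A.
ROWS = [
    ("A", "High", 50), ("B", "Med", 15), ("A", "Low", 12), ("A", "Med", 43),
    ("B", "Low", 65), ("C", "High", 12), ("C", "Med", 789), ("D", "Low", 12),
    ("A", "Med", 34), ("B", "Low", 43), ("E", "Low", 139),
]

QUERY = """
WITH sub AS (
    SELECT PartyName, Risk, SUM(Amount) AS SubTotal
    FROM CustomerData
    GROUP BY PartyName, Risk
),
tot AS (
    SELECT PartyName, Risk, SubTotal,
           SUM(SubTotal) OVER (PARTITION BY PartyName) AS Total
    FROM sub
),
ranked AS (
    -- equal totals share a rank and the ranks have no gaps
    SELECT PartyName, Risk, SubTotal, Total,
           DENSE_RANK() OVER (ORDER BY Total DESC) AS seqnum
    FROM tot
)
SELECT PartyName, Risk, SubTotal, Total
FROM ranked
WHERE seqnum <= ?
ORDER BY Total DESC, PartyName, Risk
"""


def top_n_distinct_totals(n):
    """All rows belonging to the n highest distinct Total values."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE CustomerData (PartyName TEXT, Risk TEXT, Amount INTEGER)")
        conn.executemany("INSERT INTO CustomerData VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY, (n,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_n_distinct_totals(2):
        print(row)
```

If ROW_NUMBER() were applied to the per-party totals instead, A and E would get different numbers and one of the tied parties would fall outside N = 2.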

Related

How to do a Min and Max of date but following the changes in price points

I'm not really sure how to word this question better, so I'll provide the data that I have and the result that I'm after.
This is the data that I have:
sku sales qty date
A 100 1 1-Jan-19
A 200 2 2-Jan-19
A 100 1 3-Jan-19
A 240 2 4-Jan-19
A 360 3 5-Jan-19
A 360 4 6-Jan-19
A 200 2 7-Jan-19
A 90 1 8-Jan-19
B 100 1 9-Jan-19
B 200 2 10-Jan-19
And this is the result that I'm after:
sku price sum(qty) sum(sales) min(date) max(date)
A 100 4 400 1-Jan-19 3-Jan-19
A 120 5 600 4-Jan-19 5-Jan-19
A 90 4 360 6-Jan-19 6-Jan-19
A 100 2 200 7-Jan-19 7-Jan-19
A 90 1 90 8-Jan-19 8-Jan-19
B 100 3 300 9-Jan-19 10-Jan-19
As you can see, I'm trying to get the min and max date of each price point, where price = sales/qty. At this point, I can get the min and max date of the same price, but I can't separate the ranges when there's another price in between. I think I have to use some sort of min(date) over (partition by sales/qty order by date), but I can't figure it out yet.
I'm using Redshift SQL.

This is a gaps-and-islands problem. You can solve it by generating a sequence within each sku and price, subtracting it from the date, and then aggregating:
select sku, price, sum(qty), sum(sales),
min(date), max(date)
from (select t.*, sales / qty as price,
row_number() over (partition by sku, sales / qty order by date) as seqnum
from t
) t
group by sku, price, (date - seqnum * interval '1 day')
order by sku, price, min(date);
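The date-minus-sequence trick can be checked with a small sketch in Python's sqlite3 (SQLite 3.25+ assumed for ROW_NUMBER). The dates are rewritten as ISO strings and the column is renamed sale_date (both assumptions), and julianday() stands in for Redshift's interval arithmetic. Like the answer, it treats a run of days as one island only when no days are missing inside it.

```python
import sqlite3

# Rows from the question; ISO dates and the sale_date column name are
# assumptions made so SQLite's julianday() can parse the dates.
ROWS = [
    ("A", 100, 1, "2019-01-01"), ("A", 200, 2, "2019-01-02"),
    ("A", 100, 1, "2019-01-03"), ("A", 240, 2, "2019-01-04"),
    ("A", 360, 3, "2019-01-05"), ("A", 360, 4, "2019-01-06"),
    ("A", 200, 2, "2019-01-07"), ("A", 90, 1, "2019-01-08"),
    ("B", 100, 1, "2019-01-09"), ("B", 200, 2, "2019-01-10"),
]

QUERY = """
SELECT sku, price, SUM(qty), SUM(sales), MIN(sale_date), MAX(sale_date)
FROM (
    SELECT sku, sales, qty, sale_date,
           sales / qty AS price,          -- integer division, exact for this data
           ROW_NUMBER() OVER (PARTITION BY sku, sales / qty
                              ORDER BY sale_date) AS seqnum
    FROM t
) x
-- consecutive days at one price share the same (day number - seqnum)
GROUP BY sku, price, julianday(sale_date) - seqnum
ORDER BY sku, MIN(sale_date)
"""


def price_islands_by_date(rows=ROWS):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (sku TEXT, sales INTEGER, qty INTEGER, sale_date TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in price_islands_by_date():
        print(row)
```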

You can do this with a subquery and LAG:
SELECT SKU, Price, SUM(Qty) SumQty, SUM(Sales) SumSales, MIN(date) MinDate, MAX(date) MaxDate
FROM (
SELECT SKU,Price,SUM(is_change) OVER(order by SKU, date) is_change,Sales, Qty,date
FROM (SELECT SKU, Sales/Qty AS Price, Sales, Qty,date,
CASE WHEN Sales/Qty = lag(Sales/Qty) over (order by SKU, date)
and SKU = lag(SKU) OVER (order by SKU, date) then 0 ELSE 1 END AS is_change
FROM Tbl
)InnerSelect
) X GROUP BY sku, price,is_change
ORDER BY SKU,MIN(date)
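The LAG approach can be sketched the same way in Python's sqlite3 (SQLite 3.25+ assumed; the ISO dates and the sale_date column name are again assumptions). Because it only compares each row with the previous row in sku/date order and turns the change flags into a running sum, it does not need the dates to be consecutive.

```python
import sqlite3

# Rows from the question, with hypothetical ISO dates and column names.
ROWS = [
    ("A", 100, 1, "2019-01-01"), ("A", 200, 2, "2019-01-02"),
    ("A", 100, 1, "2019-01-03"), ("A", 240, 2, "2019-01-04"),
    ("A", 360, 3, "2019-01-05"), ("A", 360, 4, "2019-01-06"),
    ("A", 200, 2, "2019-01-07"), ("A", 90, 1, "2019-01-08"),
    ("B", 100, 1, "2019-01-09"), ("B", 200, 2, "2019-01-10"),
]

QUERY = """
SELECT sku, price, SUM(qty), SUM(sales), MIN(sale_date), MAX(sale_date)
FROM (
    SELECT sku, price, sales, qty, sale_date,
           -- running count of changes = island number
           SUM(is_change) OVER (ORDER BY sku, sale_date) AS grp
    FROM (
        SELECT sku, sales / qty AS price, sales, qty, sale_date,
               CASE WHEN sales / qty = LAG(sales / qty) OVER (ORDER BY sku, sale_date)
                     AND sku = LAG(sku) OVER (ORDER BY sku, sale_date)
                    THEN 0 ELSE 1 END AS is_change
        FROM t
    ) flagged
) islands
GROUP BY sku, price, grp
ORDER BY sku, MIN(sale_date)
"""


def price_islands_by_lag(rows=ROWS):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (sku TEXT, sales INTEGER, qty INTEGER, sale_date TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in price_islands_by_lag():
        print(row)
```

Passing rows with a missing day (for example only 1 Jan and 3 Jan at the same price) still yields a single island here, whereas the date-minus-sequence version would split it.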

calculating percentage of sales profit in SQL

Product Group Product ID Sales Profit
A 6797 1,000 200
A 6745 500 90
B 1278 200 60
B 1245 1,500 350
C 7890 650 80
D 4587 350 50
Q1) Filter out the product IDs that contribute to the top 80% of the total profit of their respective group.

Not sure which RDBMS you are using, but in SQL Server you can get the output this way: compute the profit for each group, then compare each product's profit against it with an aggregate in HAVING to filter the rows.
select 'A' as Product_group, 6797 as ProductID, 1000 as Sales , 200 as Profit into #temp1 union all
select 'A' as Product_group, 6745 as ProductID, 500 as Sales , 90 as Profit union all
select 'B' as Product_group, 1278 as ProductID, 200 as Sales , 60 as Profit union all
select 'B' as Product_group, 1245 as ProductID, 1500 as Sales , 350 as Profit union all
select 'C' as Product_group, 7890 as ProductID, 650 as Sales , 80 as Profit union all
select 'D' as Product_group, 4587 as ProductID, 350 as Sales , 50 as Profit
select t.Product_group, t.ProductID, sum(t.sales) totalsles, sum(t.profit) totalProfit, sum(Profit_grp) Groupprofit from #temp1 t
join (select Product_group, sum(sales) totalsles_group, sum(profit) Profit_grp from #temp1 t1 group by Product_group) t1 on t1.Product_group = t.Product_group
group by t.Product_group, t.ProductID
having sum(t.profit) *1.0/ sum(t1.Profit_grp) *1.0 >= 0.8
Output (I added the group profit just for comparison; you can remove that aggregate and add the column to the GROUP BY if you like):
Product_group ProductID totalsles totalProfit Groupprofit
B 1245 1500 350 410
C 7890 650 80 80
D 4587 350 50 50

I think this may work:
with CTE as (
select [Product Group], sum([Profit]) as TotProfit from [Table]
group by [Product Group]
)
select * from (
select prod.*,
-- running share of the group's profit, largest contributors first
sum(prod.[Profit] * 1.0 / cte.[TotProfit]) over (partition by prod.[Product Group] order by prod.[Profit] desc, prod.[Product ID] rows between unbounded preceding and current row) as contribution
from CTE cte
inner join [Table] prod
on cte.[Product Group] = prod.[Product Group]
) x
-- window functions are not allowed in HAVING, so filter in an outer query
where contribution < 0.8
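One way to read "top 80%" is: rank products within their group by profit and keep each product whose larger contributors before it still account for less than 80% of the group's profit, so the product that crosses the line is included (the same rule the top-50% answer further down uses). A minimal sketch of that interpretation in Python's sqlite3 (SQLite 3.25+ assumed), with hypothetical table and column names:

```python
import sqlite3

# Rows from the question; table and column names are hypothetical.
ROWS = [
    ("A", 6797, 1000, 200), ("A", 6745, 500, 90),
    ("B", 1278, 200, 60), ("B", 1245, 1500, 350),
    ("C", 7890, 650, 80), ("D", 4587, 350, 50),
]

QUERY = """
WITH shares AS (
    SELECT product_group, product_id, profit,
           -- share of group profit held by the larger contributors before this product
           (SUM(profit) OVER (PARTITION BY product_group
                              ORDER BY profit DESC, product_id
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
            - profit) * 1.0
           / SUM(profit) OVER (PARTITION BY product_group) AS share_before
    FROM products
)
SELECT product_group, product_id
FROM shares
WHERE share_before < ?
ORDER BY product_group, profit DESC
"""


def top_contributors(threshold=0.8):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE products (product_group TEXT, product_id INTEGER, "
            "sales INTEGER, profit INTEGER)"
        )
        conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY, (threshold,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(top_contributors())
```

Multiplying by 1.0 avoids integer division, which would otherwise turn every share below 100% into 0.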

T-SQL calculate the percent increase or decrease between the earliest and latest for each project

I have a table like the one below. I am trying to write a T-SQL query that gets the earliest and latest costs for each project_id according to the date column, calculates the percent cost increase or decrease, and returns the data set shown in the second table (I have simplified the table for this question).
project_id date cost
-------------------------------
123 7/1/17 5000
123 8/1/17 6000
123 9/1/17 7000
123 10/1/17 8000
123 11/1/17 9000
456 7/1/17 10000
456 8/1/17 9000
456 9/1/17 8000
876 1/1/17 8000
876 6/1/17 5000
876 8/1/17 10000
876 11/1/17 8000
Result:
(Edit: Fixed the result)
project_id "cost incr/decr pct"
------------------------------------------------
123 80% which is (9000-5000)/5000
456 -20%
876 0%
Whatever query I run, I get duplicates.
This is what I tried:
select distinct
p1.Proj_ID, p1.date, p2.[cost], p3.cost,
(nullif(p2.cost, 0) / nullif(p1.cost, 0)) * 100 as 'OVER UNDER'
from
[PROJECT] p1
inner join
(select
[Proj_ID], [cost], min([date]) min_date
from
[PROJECT]
group by
[Proj_ID], [cost]) p2 on p1.Proj_ID = p2.Proj_ID
inner join
(select
[Proj_ID], [cost], max([date]) max_date
from
[PROJECT]
group by
[Proj_ID], [cost]) p3 on p1.Proj_ID = p3.Proj_ID
where
p1.date in (p2.min_date, p3.max_date)

Unfortunately, SQL Server does not have a first_value() aggregation function. It does have first_value() as an analytic (window) function, though, so you can do:
select distinct project_id,
first_value(cost) over (partition by project_id order by date asc) as first_cost,
first_value(cost) over (partition by project_id order by date desc) as last_cost,
(first_value(cost) over (partition by project_id order by date desc) /
first_value(cost) over (partition by project_id order by date asc)
) - 1 as ratio
from project;
If cost is an integer, convert it to a type with decimal places first (for example, cost * 1.0); otherwise integer division truncates the ratio.
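The first_value() approach can be checked with a small sketch in Python's sqlite3 (SQLite 3.25+ assumed). The dates are rewritten as ISO strings so text ordering is chronological, and the column is renamed cost_date (both assumptions); multiplying by 100.0 sidesteps the integer-division issue just mentioned.

```python
import sqlite3

# Rows from the question, with dates rewritten as ISO strings (assumption).
ROWS = [
    (123, "2017-07-01", 5000), (123, "2017-08-01", 6000), (123, "2017-09-01", 7000),
    (123, "2017-10-01", 8000), (123, "2017-11-01", 9000),
    (456, "2017-07-01", 10000), (456, "2017-08-01", 9000), (456, "2017-09-01", 8000),
    (876, "2017-01-01", 8000), (876, "2017-06-01", 5000), (876, "2017-08-01", 10000),
    (876, "2017-11-01", 8000),
]

QUERY = """
SELECT project_id,
       ROUND(100.0 * (last_cost - first_cost) / first_cost, 1) AS pct_change
FROM (
    -- DISTINCT collapses the per-row window results to one row per project
    SELECT DISTINCT project_id,
           FIRST_VALUE(cost) OVER (PARTITION BY project_id ORDER BY cost_date ASC)  AS first_cost,
           FIRST_VALUE(cost) OVER (PARTITION BY project_id ORDER BY cost_date DESC) AS last_cost
    FROM project
) x
ORDER BY project_id
"""


def percent_changes():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE project (project_id INTEGER, cost_date TEXT, cost INTEGER)")
        conn.executemany("INSERT INTO project VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(percent_changes())
```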

Prior to SQL Server 2012, which introduced first_value(), you can use row_number() and OUTER APPLY with TOP 1:
select
min_.projectid,
(latest_.cost - min_.cost) * 100.0 / min_.cost [Calculation] -- percent change from the first cost
from
(select
row_number() over (partition by projectid order by date) rn
,projectid
,cost
from projectable) min_ -- get the first dates per project
outer apply (
select
top 1
cost
from projectable
where
projectid = min_.projectid -- get the latest cost for each project
order by date desc
) latest_
where min_.rn = 1

This might perform a little better
;with costs as (
select *,
ROW_NUMBER() over (PARTITION BY project_id ORDER BY date) mincost,
ROW_NUMBER() over (PARTITION BY project_id ORDER BY date desc) maxcost
from table1
)
select project_id,
min(case when mincost = 1 then cost end) as cost1,
max(case when maxcost = 1 then cost end) as cost2,
(max(case when maxcost = 1 then cost end) - min(case when mincost = 1 then cost end)) * 100 / min(case when mincost = 1 then cost end) as [OVER UNDER]
from costs a
group by project_id

SQL -- Get items responsible for top 50% of sales

I have a table like this:
ITEM_SALES
ITEM_NAME SALES
Item_name_1 5000
...
Item_name_x 3
What I want is to get the items that represent the top 50% of sales. For example, if total sales were 10,000, item_name_1 alone would represent 50% of sales.
I can obviously get total sales with:
select sum(sales) from ITEM_SALES;
and then divide by 2 to get what 50% of sales is.
However, I don't know how I'd go from there to getting the top items that represent 50% of sales.

You can do this using analytic functions:
select s.*
from (select item_name, sum(sales) as sumsales,
sum(sum(sales)) over (order by sum(sales) desc) as cumesales,
sum(sum(sales)) over () as totsales
from item_sales
group by item_name
) s
where (cumesales - sumsales) < 0.5 * totsales;
The subquery calculates the sales for each item, as well as two other values:
The cumulative sales, from highest to that item.
The total sales.
The where clause then keeps the items up to and including the one that crosses the 50% threshold.
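Here is a minimal runnable version of the same idea in Python's sqlite3 (SQLite 3.25+ assumed), using hypothetical data of 16 items with sales 50, 47, ..., 5 (total 440). The nested sum(sum(sales)) is split into two CTEs so the per-item aggregation happens first.

```python
import sqlite3

# Hypothetical data: item_name_1 .. item_name_16 with sales 50, 47, ..., 5.
ROWS = [(f"item_name_{i}", 50 - 3 * (i - 1)) for i in range(1, 17)]

QUERY = """
WITH per_item AS (
    SELECT item_name, SUM(sales) AS sumsales
    FROM item_sales
    GROUP BY item_name
),
running AS (
    SELECT item_name, sumsales,
           SUM(sumsales) OVER (ORDER BY sumsales DESC) AS cumesales,  -- running total
           SUM(sumsales) OVER () AS totsales                          -- grand total
    FROM per_item
)
SELECT item_name, sumsales, cumesales, totsales
FROM running
-- keep an item if the items before it have not yet reached half of all sales
WHERE cumesales - sumsales < 0.5 * totsales
ORDER BY cumesales
"""


def top_half_items():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE item_sales (item_name TEXT, sales INTEGER)")
        conn.executemany("INSERT INTO item_sales VALUES (?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_half_items():
        print(row)
```

With ORDER BY and no explicit frame, the running sum uses the default RANGE frame, so items with equal sales are accumulated together as one peer group.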

Oracle Setup:
CREATE TABLE ITEM_SALES ( ITEM_NAME, SALES ) AS
SELECT 'item_name_' || LEVEL, 50 - 3 * (LEVEL - 1)
FROM DUAL
CONNECT BY LEVEL <= 16;
Query:
SELECT *
FROM (
SELECT ITEM_NAME,
SALES,
SUM( SALES ) OVER ( ORDER BY SALES DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cumulative_sales,
SUM( SALES ) OVER ( ORDER BY SALES DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS total_sales
FROM ITEM_SALES
)
WHERE cumulative_sales <= total_sales/2;
Results:
ITEM_NAME SALES CUMULATIVE_SALES TOTAL_SALES
------------ ----- ---------------- -----------
item_name_1 50 50 440
item_name_2 47 97 440
item_name_3 44 141 440
item_name_4 41 182 440
item_name_5 38 220 440

How can I sum cost items, grouped by invoice?

I have two SQL Server tables below:
Invoice
InvoiceId Amount [Date]
1 10 2015-05-28 21:47:50.000
2 20 2015-05-28 21:47:50.000
3 25 2015-05-28 23:25:50.000
InvoiceItem
Id InvoiceId Cost
1 1 8
2 1 3
3 1 7
4 2 15
5 2 17
6 3 20
7 3 22
Now I want to JOIN these two tables ON InvoiceId and retrieve the following:
COUNT of DISTINCT InvoiceId from Invoice table AS [Count]
SUM of Amount from Invoice table AS Amount
SUM of Cost from InvoiceItem table AS Cost
HOUR part of [Date]
and GROUP them BY HOUR part of [Date].
The desired output is:
[Count] Amount Cost HourOfDay
2 30 50 22
1 25 42 23
How can I do this?

One approach is to use a derived table:
SELECT CAST(i.[Date] AS DATE) AS [Date],
DATEPART(HOUR,i.[Date]) AS HourOfDay,
COUNT(i.InvoiceId) AS NumberOfInvoices,
SUM(i.Amount) AS Amount,
SUM(it.Cost) AS Cost
FROM invoice i
INNER JOIN
(SELECT InvoiceId, SUM(Cost) AS Cost
FROM invoiceitem
GROUP BY InvoiceId) it ON i.InvoiceId = it.InvoiceId
-- group by calendar day and hour, not by the full timestamp
GROUP BY CAST(i.[Date] AS DATE), DATEPART(HOUR,i.[Date])
Or use a CTE (common table expression):
WITH InvoiceCosts (InvoiceId, Cost)
AS
(
SELECT InvoiceId, SUM(Cost) AS Cost
FROM invoiceitem
GROUP BY InvoiceId
)
SELECT CAST(i.[Date] AS DATE) AS [Date],
DATEPART(HOUR,i.[Date]) AS HourOfDay,
COUNT(i.InvoiceId) AS NumberOfInvoices,
SUM(i.Amount) AS Amount,
SUM(ic.Cost) AS Cost
FROM invoice i
INNER JOIN
InvoiceCosts ic ON i.InvoiceId = ic.InvoiceId
-- group by calendar day and hour, not by the full timestamp
GROUP BY CAST(i.[Date] AS DATE), DATEPART(HOUR,i.[Date])

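SELECT COUNT (DISTINCT inv.InvoiceId) [Count],
SUM (inv.Amount) Amount,
SUM (itm.Cost) Cost,
datepart(HOUR, inv.[Date]) HourOfDay
FROM Invoice inv
-- pre-aggregate the items so each invoice's Amount is counted only once
INNER JOIN (SELECT InvoiceId, SUM(Cost) AS Cost
FROM InvoiceItem
GROUP BY InvoiceId) itm
ON inv.InvoiceId = itm.InvoiceId
GROUP BY datepart(HOUR, inv.[Date]);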
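The reason for aggregating InvoiceItem before the join shows up in a small sketch using Python's sqlite3: joining the raw item rows repeats each invoice's Amount once per item, while summing the costs per invoice first keeps one row per invoice. The [Date] column is renamed InvoiceDate and strftime('%H', ...) stands in for DATEPART (both assumptions for SQLite). With the sample timestamps as given (21:47), invoices 1 and 2 fall in hour 21.

```python
import sqlite3

INVOICES = [
    (1, 10, "2015-05-28 21:47:50.000"),
    (2, 20, "2015-05-28 21:47:50.000"),
    (3, 25, "2015-05-28 23:25:50.000"),
]
ITEMS = [(1, 1, 8), (2, 1, 3), (3, 1, 7), (4, 2, 15), (5, 2, 17), (6, 3, 20), (7, 3, 22)]

# SQLite equivalent of DATEPART(HOUR, ...) on the hypothetical InvoiceDate column.
HOUR = "CAST(strftime('%H', inv.InvoiceDate) AS INTEGER)"

# Joining raw item rows: each invoice's Amount is repeated once per item.
NAIVE = f"""
SELECT COUNT(DISTINCT inv.InvoiceId), SUM(inv.Amount), SUM(itm.Cost), {HOUR} AS HourOfDay
FROM Invoice inv JOIN InvoiceItem itm ON inv.InvoiceId = itm.InvoiceId
GROUP BY {HOUR} ORDER BY HourOfDay
"""

# Summing costs per invoice first keeps the join at one row per invoice.
PRE_AGGREGATED = f"""
SELECT COUNT(DISTINCT inv.InvoiceId), SUM(inv.Amount), SUM(itm.Cost), {HOUR} AS HourOfDay
FROM Invoice inv
JOIN (SELECT InvoiceId, SUM(Cost) AS Cost FROM InvoiceItem GROUP BY InvoiceId) itm
  ON inv.InvoiceId = itm.InvoiceId
GROUP BY {HOUR} ORDER BY HourOfDay
"""


def hourly_totals(query):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Invoice (InvoiceId INTEGER, Amount INTEGER, InvoiceDate TEXT)")
        conn.execute("CREATE TABLE InvoiceItem (Id INTEGER, InvoiceId INTEGER, Cost INTEGER)")
        conn.executemany("INSERT INTO Invoice VALUES (?, ?, ?)", INVOICES)
        conn.executemany("INSERT INTO InvoiceItem VALUES (?, ?, ?)", ITEMS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print("naive:         ", hourly_totals(NAIVE))
    print("pre-aggregated:", hourly_totals(PRE_AGGREGATED))
```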
