"Partitioned" sorting in a SQL query - sql

The following SQL query, which displays products sold sorted by cost and number of orders, has to be sorted in a partitioned manner. Namely, products costing under $100 should go first, and everything else that is > $100 should follow. Adding HAVING TS.TotalSold < 100 to the query would accomplish this for the first partition, but would filter out the other products. The operation should be atomic, so that this query only needs to be executed once.
NOTE: the cost by which the query has to be partitioned is calculated as the max of two cost columns, which makes things a bit more complicated (the proposed CASE WHEN solutions won't work, as HighestCost is not a column)
SELECT PS.ProductName, TS.TotalSold,
((PS.Cost1 + PS.Cost2 + ABS(PS.Cost1-PS.Cost2)) / 2) as HighestCost
FROM Products as PS
CROSS APPLY
(SELECT
(SELECT COUNT(OrderId)
FROM Orders as OS
WHERE OS.ProductId=PS.ProductId)
as TotalSold) TS
ORDER BY HighestCost ASC, TS.TotalSold
EDIT: modified the query to include the calculated cost by which the query has to be partitioned.
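As an aside, the HighestCost expression above relies on the identity max(a, b) = (a + b + |a - b|) / 2. If readability matters more than the arithmetic trick, the same value can be computed with a plain CASE expression; a minimal sketch using the same Cost1/Cost2 columns:
SELECT PS.ProductName,
((PS.Cost1 + PS.Cost2 + ABS(PS.Cost1 - PS.Cost2)) / 2) AS HighestCost_Arithmetic,
-- equivalent for non-NULL costs: pick the larger of the two columns directly
CASE WHEN PS.Cost1 >= PS.Cost2 THEN PS.Cost1 ELSE PS.Cost2 END AS HighestCost_Case
FROM Products AS PS
Both expressions return the same value whenever Cost1 and Cost2 are non-NULL.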

EDITED
SELECT *
FROM
(
SELECT PS.ProductName, TS.TotalSold,
((PS.Cost1 + PS.Cost2 + ABS(PS.Cost1-PS.Cost2)) / 2) as HighestCost
FROM Products as PS
CROSS APPLY
(SELECT COUNT(OrderId) as TotalSold
FROM Orders as OS
WHERE OS.ProductId=PS.ProductId) TS
) SQ
ORDER BY CASE WHEN HighestCost > 100 THEN 1 END ASC, TotalSold
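A note on why this works in SQL Server: the CASE returns NULL for rows with HighestCost <= 100, and NULL sorts before 1 in ascending order, so the cheap partition comes first (writing THEN 1 ELSE 0 END would make that intent explicit). A minimal self-contained demo of the sorting behavior, on made-up values:
select val, case when val > 100 then 1 end as sort_key
from (values (50), (150), (70), (200)) as t(val)
order by sort_key, val
-- 50 and 70 (sort_key NULL) come first, then 150 and 200 (sort_key 1)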
original below
SELECT PS.ProductName, TS.TotalSold
FROM Products as PS
CROSS APPLY
(SELECT COUNT(OrderId) as TotalSold
FROM Orders as OS
WHERE OS.ProductId=PS.ProductId) TS
ORDER BY
CASE WHEN TS.TotalSold > 100 THEN 1 END, PS.Cost ASC, TS.TotalSold
You may notice I removed a subquery level since it was extraneous.

I don't know which dbms you are using but in mine I would use a calculated column to assign a partitionId, and sort by that. Something like this:
SELECT PS.ProductName, TS.TotalSold,
(if cost < 100 then 1 else 2 endif) as partition
FROM Products as PS
CROSS APPLY
(SELECT
(SELECT COUNT(OrderId)
FROM Orders as OS
WHERE OS.ProductId=PS.ProductId)
as TotalSold) TS
ORDER BY partition, PS.Cost ASC, TS.TotalSold
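For SQL Server specifically, the same partition-id idea can be written with a CASE expression and adapted to the edited question, where the cost to partition on is the computed HighestCost rather than a plain Cost column. A hedged sketch, not tested against the asker's schema:
SELECT SQ.ProductName, SQ.TotalSold, SQ.HighestCost,
-- 1 = cheap partition (under $100), 2 = everything else
CASE WHEN SQ.HighestCost < 100 THEN 1 ELSE 2 END AS PartitionId
FROM
(
SELECT PS.ProductName, TS.TotalSold,
((PS.Cost1 + PS.Cost2 + ABS(PS.Cost1 - PS.Cost2)) / 2) AS HighestCost
FROM Products AS PS
CROSS APPLY
(SELECT COUNT(OrderId) AS TotalSold
FROM Orders AS OS
WHERE OS.ProductId = PS.ProductId) TS
) SQ
ORDER BY PartitionId, SQ.TotalSold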

Related

Only get rows until qty total is met

I have an order table (product, qty_required) and a stock/bin location table (product, bin_location, qty_free) which is a one->many (a product may be stored in multiple bins).
Please, please (pretty please!) Does anybody know how to:
When producing a picking report, I only want to return the first x bins for each product ordered THAT SATISFIES the qty_required on the order.
For example
An order requires product 'ABC', QTY 10
Product 'ABC' is in the following locations (this is listed using FIFO rules so oldest first):
LOC1, 3 free
LOC2, 4 free
LOC3, 6 free
LOC4, 18 free
LOC5, 2 free
So, on the report, I'd ONLY want to see the first 3 locations, as the total of those (13) satisfies the order quantity of 10...
Ie:
LOC1, 3
LOC2, 4
LOC3, 6
Use sum(qty_free) over(partition by product order by placement_date, bin_location) to calculate a running sum, then filter the rows by your threshold in an outer query (a select from a select). bin_location is added to the order by as a tiebreaker, so that locations placed on the same day don't all get the same running total.
with s as (
select st.*,
sum(qty_free) over(partition by product order by placement_date asc, bin_location) as rsum
from stock st
)
select
o.product,
s.bin_location,
s.qty_free,
o.qty_requested
from orders o
left join s
on o.product = s.product
and s.rsum - s.qty_free < o.qty_requested -- keep every bin whose running total starts below the requested quantity, including the bin that crosses it
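If the report also needs how much to pick from each bin (not just which bins to visit), the same running sum can drive that calculation. A hedged sketch along the same lines, reusing the column names from the question (needs SQL Server 2012+ or another engine with ordered window sums):
with s as (
select st.*,
sum(qty_free) over(partition by product order by placement_date, bin_location) as rsum
from stock st
)
select
o.product,
s.bin_location,
case when s.rsum <= o.qty_requested
then s.qty_free -- bin is emptied completely
else o.qty_requested - (s.rsum - s.qty_free) -- last bin needed: pick only the remainder
end as qty_to_pick
from orders o
join s
on o.product = s.product
and s.rsum - s.qty_free < o.qty_requested
order by o.product, s.rsum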
UPD: Since it turned out that your SQL Server version is so old that it has no analytic functions, here's another, less performant way to do this (it may need some fixes; I didn't test it on real data).
And a fiddle with some setup.
with ord_key as (
select stock.*,
/*Generate order key for FIFO*/
row_number() over(order by
placement_date asc, /*FIFO: oldest placement first*/
bin_location asc
) as sort_order_key
from stock
)
, rsum as (
/*Calculate sum of all the items before current*/
select
b.product,
b.bin_location,
b.placement_date,
b.qty_free,
coalesce(sum(sub.item_sum), 0) as rsum
from ord_key as b
left join (
/*Group by partition key and orderby key*/
select
product,
sort_order_key,
sum(qty_free) as item_sum
from ord_key
group by
product,
sort_order_key
) as sub
on b.product = sub.product
and b.sort_order_key > sub.sort_order_key
group by
b.product,
b.bin_location,
b.placement_date,
b.qty_free
)
, calc_quantities as (
select
o.product,
s.placement_date,
s.bin_location,
s.qty_free,
s.rsum,
o.qty_requested,
case
when o.qty_requested > s.rsum + s.qty_free
then s.qty_free /*bin is consumed completely*/
else o.qty_requested - s.rsum /*last bin needed: take only the remaining quantity*/
end as qty_to_retrieve
from orders o
left join rsum s
on o.product = s.product
and s.rsum < o.qty_requested
)
select
s.*,
qty_free - qty_to_retrieve as stock_left
from calc_quantities s
order by
product,
placement_date,
bin_location

SELECT TOP 10 rows

I have built an SQL query that returns me the top 10 customers with the highest outstanding. The outstanding is on product level (each product has its own outstanding).
Until now everything works fine; my only problem is that if a certain customer has more than one product, then the second (and any further) product should be categorized under the same customer_id, as in the second picture (because the first product, the one with the highest outstanding, pulls the customer's other products into the top 10 even though they may have a lower outstanding than the other 9 clients of the top 10).
How can I modify my query in order to do that? Is it possible in SQL Server 2012?
My query is:
select top 10 CUSTOMER_ID
,S90T01_GROSS_EXPOSURE_THSD_EUR
,S90T01_COGNOS_PROD_NAME
,S90T01_DPD_C
,PREVIOUS_BUCKET_DPD_REP
,S90T01_BUCKET_DPD_REP
from [dbo].[DM_07MONTHLY_DATA]
where S90T01_CLIENT_SEGMENT = 'PI'
and YYYY_MM = '2017_01'
group by CUSTOMER_ID
,S90T01_GROSS_EXPOSURE_THSD_EUR
,S90T01_COGNOS_PROD_NAME
,S90T01_DPD_C
,PREVIOUS_BUCKET_DPD_REP
,S90T01_BUCKET_DPD_REP
order by S90T01_GROSS_EXPOSURE_THSD_EUR desc;
You need to calculate the top Customers first, then pull out all their products. You can do this with a Common Table Expression.
As you haven't provided any test data this is untested, but I think it will work for you:
with top10 as
(
select top 10 CUSTOMER_ID
,sum(S90T01_GROSS_EXPOSURE_THSD_EUR) as TOTAL_S90T01_GROSS_EXPOSURE_THSD_EUR
from [dbo].[DM_07MONTHLY_DATA]
where S90T01_CLIENT_SEGMENT = 'PI'
and YYYY_MM = '2017_01'
group by CUSTOMER_ID
order by TOTAL_S90T01_GROSS_EXPOSURE_THSD_EUR desc
)
select m.CUSTOMER_ID
,m.S90T01_GROSS_EXPOSURE_THSD_EUR
,m.S90T01_COGNOS_PROD_NAME
,m.S90T01_DPD_C
,m.PREVIOUS_BUCKET_DPD_REP
,m.S90T01_BUCKET_DPD_REP
from [dbo].[DM_07MONTHLY_DATA] m
join top10 t
on m.CUSTOMER_ID = t.CUSTOMER_ID
order by t.TOTAL_S90T01_GROSS_EXPOSURE_THSD_EUR desc
,m.S90T01_GROSS_EXPOSURE_THSD_EUR;
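One hedged caveat: if two customers tie on total exposure at the 10th position, TOP 10 picks one of them arbitrarily, while TOP 10 WITH TIES returns both. In the CTE above that would just mean writing select top 10 with ties CUSTOMER_ID. A minimal, self-contained illustration of the difference on made-up data:
-- TOP 2 returns A plus one of B/C (arbitrarily); TOP 2 WITH TIES returns A, B and C
select top 2 with ties CustomerId, Total
from (values ('A', 300), ('B', 200), ('C', 200), ('D', 100)) as t(CustomerId, Total)
order by Total desc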

SQL Server get customer with 7 consecutive transactions

I am trying to write a query that would get the customers with 7 consecutive transactions given a list of CustomerKeys.
I am currently doing a self join on a Customer fact table that has 700 million records in SQL Server 2008.
This is what I came up with, but it's taking a long time to run. I have a clustered index on (CustomerKey, TranDateKey).
SELECT
ct1.CustomerKey,ct1.TranDateKey
FROM
CustomerTransactionFact ct1
INNER JOIN
#CRTCustomerList dl ON ct1.CustomerKey = dl.CustomerKey --temp table with customer list
INNER JOIN
dbo.CustomerTransactionFact ct2 ON ct1.CustomerKey = ct2.CustomerKey -- Same Customer
AND ct2.TranDateKey >= ct1.TranDateKey
AND ct2.TranDateKey <= CONVERT(VARCHAR(8), dateadd(d, 6, ct1.TranDateTime), 112) -- Consecutive Transactions in the last 7 days
WHERE
ct1.LogID >= 82800000
AND ct2.LogID >= 82800000
AND ct1.TranDateKey between dl.BeginTranDateKey and dl.EndTranDateKey
AND ct2.TranDateKey between dl.BeginTranDateKey and dl.EndTranDateKey
GROUP BY
ct1.CustomerKey,ct1.TranDateKey
HAVING
COUNT(*) = 7
Please help make it more efficient. Is there a better way to write this query in 2008?
You can do this using window functions, which should be much faster. Assuming that TranDateKey is a number and you can subtract a sequential number from it, the difference is constant for consecutive days.
You can put this in a query like this:
SELECT CustomerKey, MIN(TranDateKey), MAX(TranDateKey)
FROM (SELECT ct.CustomerKey, ct.TranDateKey,
(ct.TranDateKey -
DENSE_RANK() OVER (PARTITION BY ct.CustomerKey ORDER BY ct.TranDateKey)
) as grp
FROM CustomerTransactionFact ct INNER JOIN
#CRTCustomerList dl
ON ct.CustomerKey = dl.CustomerKey
) t
GROUP BY CustomerKey, grp
HAVING COUNT(*) = 7;
If your date key is something else, there is probably a way to modify the query to handle that, but you might have to join to the dimension table.
This would be a perfect task for a COUNT(*) OVER (RANGE ...), but SQL Server 2008 supports only a limited syntax for Windowed Aggregate Functions.
SELECT CustomerKey, MIN(TranDateKey), COUNT(*)
FROM
(
SELECT CustomerKey, TranDateKey,
dateadd(d,-ROW_NUMBER()
OVER (PARTITION BY CustomerKey
ORDER BY TranDateKey),TranDateTime) AS dummyDate
FROM CustomerTransactionFact
) AS dt
GROUP BY CustomerKey, dummyDate
HAVING COUNT(*) >= 7
The dateadd calculates the difference between the current TranDateTime and a ROW_NUMBER over all dates per customer. The resulting dummyDate has no actual meaning, but it is the same meaningless date for consecutive dates.
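To illustrate the row-number trick on a toy example (hypothetical dates, not the asker's data): consecutive days collapse to the same dummyDate, and a gap starts a new group.
SELECT CustomerKey, TranDateTime,
dateadd(d, -ROW_NUMBER() OVER (PARTITION BY CustomerKey ORDER BY TranDateTime), TranDateTime) AS dummyDate
FROM (VALUES (1, CAST('20170101' AS datetime)),
(1, '20170102'),
(1, '20170103'),
(1, '20170106')) AS t(CustomerKey, TranDateTime)
-- the first three rows share one dummyDate (2016-12-31); the jump to Jan 06 starts a new group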

Cumulative subtract through multiple rows

I would like to subtract one row from multiple rows. I need to get the remaining Quantity (differentiated by BusTransaction_ID and Artikl, and ordered by X_PDateMonth$DATE), which is the result of this subtraction:
And expected results:
The result can be with or without "zero rows". I don't know how to accomplish this result. Also, would it be better to use a stored procedure or something, since this will be used on a pretty large data set?
Thanks for all replies.
Here is a solution that works by doing the following:
Calculates the cumulative sums of the values in the first table.
Based on the cumulative sum, determines the value to subtract.
The query looks like this:
select t.bustransaction_id, t.artikl, t.xpldate,
(case when cumeq <= subt.quantity then 0
when cumeq - t.quantity <= subt.quantity
then cumeq - subt.quantity
else t.quantity
end) as newquantity
from (select t.*,
sum(quantity) over (partition by bustransaction_id, artikl order by xpldate) as cumeq
from start_table t
) t left join
subtract_table subt
on t.bustransaction_id = subt.bustransaction_id and
t.artikl = subt.artikl
order by t.bustransaction_id, t.artikl, t.xpldate;
Here is the SQL Fiddle (based on Brian's).
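Since the sample data and expected results in the question were posted as images, here is a tiny, self-contained illustration of how the CASE above behaves on made-up numbers (start quantities 5, 5, 5 with 7 to subtract should leave 0, 3, 5). It assumes SQL Server 2012+ or another engine with ordered window sums:
select t.xpldate, t.quantity, sub.quantity as sub_qty,
(case when cumeq <= sub.quantity then 0
when cumeq - t.quantity <= sub.quantity then cumeq - sub.quantity
else t.quantity
end) as newquantity
from (select v.*,
sum(quantity) over (order by xpldate) as cumeq
from (values (1, 5), (2, 5), (3, 5)) as v(xpldate, quantity)) t
cross join (values (7)) as sub(quantity)
order by t.xpldate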
This will give you the result with the 'zero rows' using analytic functions:
select x.*,
case
when subqty >= runner
then 0
when runner > subqty
and lag(runner, 1) over( partition by bustransaction_id, artikl
order by bustransaction_id, artikl, xpldate ) > subqty
then quantity
else runner - subqty
end as chk
from (select s.bustransaction_id,
s.artikl,
s.xpldate,
s.quantity,
sum(s.quantity) over( partition by s.bustransaction_id, s.artikl
order by s.bustransaction_id, s.artikl, s.xpldate ) as runner,
z.quantity as subqty
from start_table s
join subtract_table z
on s.bustransaction_id = z.bustransaction_id
and s.artikl = z.artikl) x
order by bustransaction_id, artikl, xpldate
Fiddle: http://sqlfiddle.com/#!6/20987/1/0
The CASE statement combined with the LAG function is what identifies the first "half-depleted" row, which is the biggest piece of your calculation.
In that fiddle I included my derived columns that were necessary to get what you wanted. If you don't want to show those columns you can just select those you need from the inline view, as shown here: http://sqlfiddle.com/#!6/20987/2/0

SQL to produce Top 10 and Other

Imagine I have a table showing the sales of Acme Widgets, and where they were sold. It's fairly easy to produce a report grouping sales by country. It's fairly easy to find the top 10. But what I'd like is to show the top 10, and then have a final row saying Other. E.g.,
Ctry | Sales
=============
GB | 100
US | 80
ES | 60
...
IT | 10
Other | 50
I've been searching for ages but can't seem to find any help which takes me beyond the standard top 10.
TIA
I tried some of the other solutions here; however, they seemed to be either slightly off, or the ordering wasn't quite right.
My attempt at a Microsoft SQL Server solution appears to work correctly:
SELECT Ctry, Sales FROM
(
SELECT TOP 2
Ctry,
SUM(Sales) AS Sales
FROM
Table1
GROUP BY
Ctry
ORDER BY
Sales DESC
) AS Q1
UNION ALL
SELECT
'Other' AS Ctry,
SUM(Sales) AS Sales
FROM
Table1
WHERE
Ctry NOT IN (SELECT TOP 2
Ctry
FROM
Table1
GROUP BY
Ctry
ORDER BY
SUM(Sales) DESC)
Note that in my example, I'm only using TOP 2 rather than TOP 10. This is simply due to my test data being rather more limited. You can easily substitute the 2 for a 10 in your own data.
Here's the SQL Script to create the table:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Table1](
[Ctry] [varchar](50) NOT NULL,
[Sales] [float] NOT NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
And my data looks like this:
GB 10
GB 21.2
GB 34
GB 16.75
US 10
US 11
US 56.43
FR 18.54
FR 98.58
WE 44.33
WE 11.54
WE 89.21
KR 10
PO 10
DE 10
Note that the query result is correctly ordered by the Sales value aggregate and not the alphabetic country code, and that the "Other" category is always last, even if its Sales value aggregate would ordinarily push it to the top of the list.
I'm not saying this is the best (read: most optimal) solution; however, for the dataset that I provided it seems to work pretty well.
SELECT Ctry, sum(Sales) Sales
FROM (SELECT COALESCE(T2.Ctry, 'OTHER') Ctry, T1.Sales
FROM (SELECT Ctry, sum(Sales) Sales
FROM Table1
GROUP BY Ctry) T1
LEFT JOIN
(SELECT TOP 10 Ctry, sum(sales) Sales
FROM Table1
GROUP BY Ctry
ORDER BY sum(sales) DESC) T2
on T1.Ctry = T2.Ctry
) T
GROUP BY Ctry
The pure SQL solutions to this problem make multiple passes through the individual records. The following solution only queries the data once, and uses the ROW_NUMBER() ranking function to determine whether a result belongs in the "Other" category. ROW_NUMBER() has been available since SQL Server 2005. In my database, this seems to result in a more efficient query. Please note that the "Other" row will appear above some rows if the total of the "Other" sales exceeds some of the top 10; if this is not desired, some adjustments would need to be made to this query (one possible adjustment is sketched after the query below):
SELECT CASE WHEN RowNumber > 10 THEN 'Other' ELSE Ctry END AS Ctry,
SUM(Sales) as Sales FROM
(
SELECT Ctry, SUM(Sales) as Sales,
ROW_NUMBER() OVER(ORDER BY SUM(Sales) DESC) AS RowNumber
FROM Table1 GROUP BY Ctry
) as AggregateQuery
GROUP BY CASE WHEN RowNumber > 10 THEN 'Other' ELSE Ctry END
ORDER BY SUM(Sales) DESC
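One possible adjustment for pinning the "Other" row to the bottom regardless of its total (a hedged sketch using the same table and columns as above): sort each group by its smallest row number instead of by the sales aggregate, so the top 10 keep their sales order and "Other" always comes last.
SELECT CASE WHEN RowNumber > 10 THEN 'Other' ELSE Ctry END AS Ctry,
SUM(Sales) AS Sales
FROM
(
SELECT Ctry, SUM(Sales) AS Sales,
ROW_NUMBER() OVER(ORDER BY SUM(Sales) DESC) AS RowNumber
FROM Table1 GROUP BY Ctry
) AS AggregateQuery
GROUP BY CASE WHEN RowNumber > 10 THEN 'Other' ELSE Ctry END
ORDER BY MIN(RowNumber) -- ranks 1-10 first in sales order, then the 'Other' bucket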
Using a real analytics SQL engine such as Apache Spark, you can use a Common Table Expression (WITH) to do:
with t as (
select rank() over (order by sales desc) as r, sales,city
from DB
order by sales desc
)
select sales, city, r
from t where r <= 10
union
select sum(sales) as sales, "Other" as city, 11 as r
from t where r > 10
In pseudo SQL:
select top 10 order by sales
UNION
select 'Other',SUM(sales) where Ctry not in (select top 10 like above)
Union the top ten with an outer join of the top ten against the table itself to aggregate the rest.
I don't have access to SQL here, but I'll hazard a guess:
select top (10) Ctry, sales from table1
union all
select 'other', sum(table1.sales)
from table1
left outer join (select top (10) Ctry, sales from table1) as table2
on table1.Ctry = table2.Ctry
where table2.Ctry is null
Of course if this is a rapidly changing top(10) then you either lock or maintain a copy of the top(10) for the duration of the query.
Keep in mind that, depending on your use case (and database volume / restrictions), you can achieve the same results using application code (Python, Node, C#, Java, etc.). Sure, it will depend on your use case, but hey, it's possible.
For instance, I ended up doing this in C#:
// Mock-up class that has a CATEGORY and its VOLUME
class YourModel { public string category; public double volume; }

// Keep the first 5 entries, then collapse the rest into a single "Others" entry
List<YourModel> groupedList = wholeList.Take(5).ToList();
groupedList.Add(new YourModel()
{
category = "Others",
volume = wholeList.Skip(5).Select(t => t.volume).Sum()
});
Disclaimer
I understand that this is a "SQL only" tagged question, but there might be other people like me out there who can make use of the application layer instead of relying only on SQL to make it happen. I am just trying to show people other ways of doing the same thing that might be helpful. Even if this gets downvoted to oblivion, I know that someone will be happy to read this, because they were taught to use each tool to its best and to think "outside the box".