How to do a complex calculation as this sample - sql

In a stored procedure (I'm using SQL Server 2008), I have business logic like this sample:
ID City Price Sold
1 A 10 3
1 B 10 5
1 A 10 1
1 B 10 3
1 C 10 5
1 C 10 2
2 A 10 1
2 B 10 6
2 A 10 3
2 B 10 4
2 C 10 3
2 C 10 4
What I want to do is:
For each ID, sort by City first.
After sorting, recalculate Sold for each row of that ID from top to bottom, under this condition: the total of Sold for each ID must not exceed Price.
The result should look like this:
ID City Price Sold_Calculated
1 A 10 3
1 A 10 1
1 B 10 5
1 B 10 1 (the last one equal '1': Total of Sold = Price)
1 C 10 0 (begin from this row, Sold = 0)
1 C 10 0
2 A 10 1
2 A 10 3
2 B 10 6
2 B 10 0 (begin from this row, Sold = 0)
2 C 10 0
2 C 10 0
Right now I'm using a cursor for this task: get each ID, sort by City, calculate Sold, and save the result to a temp table. After the calculation finishes, I union all the temp tables. But it takes a long time.
What I've heard people advise is: do NOT use a cursor.
So, for this task, can you give me an example (using SELECT ... FROM ... WHERE ... GROUP BY) that does this? Or are there other ways to solve it quickly?
I understand this task is not easy, but I'm posting it here in the hope that someone can help me through it.
I really appreciate your help.
Thanks.

To accomplish this you'll need to calculate a running sum and use a CASE statement.
Previously I used a JOIN for the running sum and LAG with the CASE statement.
However, using a recursive CTE to calculate the running total, as described here by Aaron Bertrand, and the CASE statement by Andriy M, we can construct the following, which should offer the best performance and doesn't need to "peek at the previous row":
WITH cte
     AS (SELECT Row_number()
                  OVER (PARTITION BY id ORDER BY id, city, sold DESC) RN,
                id,
                city,
                price,
                sold
         FROM   table1),
     rcte
     AS (
        --Anchor
        SELECT rn,
               id,
               city,
               price,
               sold,
               runningTotal = sold
        FROM   cte
        WHERE  rn = 1
        --Recursion
        UNION ALL
        SELECT cte.rn,
               cte.id,
               cte.city,
               cte.price,
               cte.sold,
               rcte.runningtotal + cte.sold
        FROM   cte
               INNER JOIN rcte
                       ON cte.id = rcte.id
                          AND cte.rn = rcte.rn + 1)
SELECT id,
       city,
       price,
       sold,
       runningtotal,
       rn,
       CASE
         WHEN runningtotal <= price THEN sold
         WHEN runningtotal > price
              AND runningtotal < price + sold THEN price + sold - runningtotal
         ELSE 0
       END Sold_Calculated
FROM   rcte
ORDER  BY id,
          rn;

As @Gordon Linoff commented, the sort order is not clear from the question. For the purpose of this answer, I have assumed the sort order to be city, sold.
select id, city, price, sold, running_sum,
       lag_running_sum,
       case when running_sum <= price then Sold
            when running_sum > price and price > coalesce(lag_running_sum,0) then price - coalesce(lag_running_sum,0)
            else 0
       end calculated_sold
from
(
    select id, city, price, sold,
           sum(sold) over (partition by id order by city, sold
                           rows between unbounded preceding and current row) running_sum,
           sum(sold) over (partition by id order by city, sold
                           rows between unbounded preceding and 1 preceding) lag_running_sum
    from n_test
) n_test_running
order by id, city, sold;
Here is the demo for Oracle.
Let me break down the query.
I have used SUM as an analytic function to calculate the running sum.
The first SUM partitions the rows by id and, within each partition, orders them by city and sold.
The ROWS BETWEEN clause tells which rows are added up. Here I have specified the current row and all rows above it, which gives the running sum.
The second SUM does the same thing, except that the current row is excluded. This essentially creates a running sum that lags one row behind the first.
Using this result as an inline view, the outer SELECT uses a CASE expression to determine the value of the new column.
As long as the running sum is less than or equal to price, it returns sold.
On the row where the running sum crosses price, the value is adjusted so that the sum becomes equal to price.
For the rest of the rows below it, the value is set to 0.
I hope the explanation is clear.
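If you want to try the running-sum idea without an Oracle instance, here is a minimal sketch that runs the same kind of query through Python's standard-library sqlite3 module. It assumes SQLite 3.25 or newer (needed for window functions) and loads only the ID 1 rows from the question; the database and driver are my choice for illustration, not what the question uses.

```python
import sqlite3

# In-memory SQLite database; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE n_test (id INT, city TEXT, price INT, sold INT)")
conn.executemany(
    "INSERT INTO n_test VALUES (?, ?, ?, ?)",
    [(1, "A", 10, 3), (1, "B", 10, 5), (1, "A", 10, 1),
     (1, "B", 10, 3), (1, "C", 10, 5), (1, "C", 10, 2)],  # ID 1 rows from the question
)

query = """
SELECT id, city, sold,
       CASE
         WHEN running_sum <= price THEN sold              -- still under the cap
         WHEN price > COALESCE(lag_running_sum, 0)        -- this row crosses the cap
           THEN price - COALESCE(lag_running_sum, 0)
         ELSE 0                                           -- cap already reached
       END AS calculated_sold
FROM (
  SELECT id, city, price, sold,
         SUM(sold) OVER (PARTITION BY id ORDER BY city, sold
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum,
         SUM(sold) OVER (PARTITION BY id ORDER BY city, sold
                         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS lag_running_sum
  FROM n_test
) AS n_test_running
ORDER BY id, city, sold
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # (id, city, sold, calculated_sold)
```

With the assumed sort order (city, sold ascending), the calculated values for ID 1 add up to exactly the price of 10, and every row after the cap is reached gets 0.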

To me, it sounds like you could use window functions in a case like this. Is that applicable?
In that case, though, your end result might look like this:
ID City Price Sold_Calculated
2 A 10 4
2 B 10 6
2 C 10 0
which could use an aggregation like
SUM(Sold_Calculated) OVER (PARTITION BY ID, City, Price, Sold_Calculated)
depending on how far down you want to go. You could even use a CASE statement if need be.

Are you looking to do this entirely in SQL? A simple approach would be this:
SELECT C.ID,
       C.City,
       C.Price,
       calculate_Sold_Function(C.ID, C.Price) AS Sold_Calculated
FROM CITY_TABLE C
ORDER BY C.ID, C.City
Where calculate_Sold_Function is a T-SQL/MySQL/etc. function taking the ID and Price as parameters. No idea how you plan on calculating price.

Related

SQL Distinct / GroupBy

Ok, I’m stuck on an SQL query and tried long enough that it’s time to ask for help :) I'm using Objection.js – but that's not super relevant as I really just can't figure out how to structure the SQL.
I have the following example data set:
Items
id  name
1   Test 1
2   Test 2
3   Test 3
Listings
id  item_id  price  created_at
1   1        100    1654640000
2   1        60     1654640001
3   1        80     1654640002
4   2        90     1654640003
5   2        90     1654640004
6   3        50     1654640005
What I’m trying to do:
Return the lowest priced listing for each item
If several listings for an item share the lowest price, I want to return the newest of them
Overall, I want to return the resulting items ordered by price
I’m trying to write a query that returns the data:
id  item_id  name    price  created_at
6   3        Test 3  50     1654640005
2   1        Test 1  60     1654640001
5   2        Test 2  90     1654640004
Any help would be greatly appreciated! I'm also starting fresh, so I can add new columns to the data if that would help at all :)
An example of where my query is right now:
select *
from "listings"
inner join (
    select "item_id", MIN(price) as "min_price"
    from "listings"
    group by "item_id"
) as "grouped_listings"
    on "listings"."item_id" = "grouped_listings"."item_id"
    and "listings"."price" = "grouped_listings"."min_price"
where "listings"."sold_at" is null
    and "listings"."expires_at" > ?
order by CAST(price AS DECIMAL) ASC
limit ?;
This gets me listings – but if two listings have the same price, it returns multiple listings with the same item_id – not ideal.
Given the postgresql tag, this should work:
with listings_numbered as (
select *, row_number() over (
partition by item_id
order by price asc, created_at desc
) as rownum
from listings
)
select l.id, l.item_id, i.name, l.price, l.created_at
from listings_numbered l
join items i on l.item_id=i.id
where l.rownum=1
order by price asc;
This is a bit of an advanced query, using window functions and a common table expression, but we can break it down.
with listings_numbered as (...) select simply means to run the query inside of the ..., and then we can refer to the results of that query as listings_numbered inside of the select, as though it was a table.
We're selecting all of the columns in listings, plus one more:
row_number() over (partition by item_id order by price asc, created_at desc). partition by item_id means that we would like the row number to reset for each new item_id, and the order by specifies the ordering that the rows should get within each partition before we number them: first increasing by price, then decreasing by creation time to break ties.
The result of the CTE listings_numbered looks like:
id  item_id  price  created_at  rownum
2   1        60     1654640001  1
3   1        80     1654640002  2
1   1        100    1654640000  3
5   2        90     1654640004  1
4   2        90     1654640003  2
6   3        50     1654640005  1
If you look at only the rows where rownum (the last column) is 1, then you can see that it's exactly the set of listings that you're interested in.
The outer query then selects from this dataset, joins on items to get the name, filters to only the listings where rownum is 1, and sorts by price, to get the final result:
id  item_id  name    price  created_at
6   3        Test 3  50     1654640005
2   1        Test 1  60     1654640001
5   2        Test 2  90     1654640004
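To check the ROW_NUMBER approach end to end, here is a small sketch that loads the question's sample data into an in-memory SQLite database via Python's sqlite3 module and runs the same query. SQLite (3.25+ for window functions) stands in for PostgreSQL here purely as an assumption for convenience; the column types are guesses based on the sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE listings (id INTEGER PRIMARY KEY, item_id INT, price INT, created_at INT);
INSERT INTO items VALUES (1, 'Test 1'), (2, 'Test 2'), (3, 'Test 3');
INSERT INTO listings VALUES
  (1, 1, 100, 1654640000), (2, 1, 60, 1654640001), (3, 1, 80, 1654640002),
  (4, 2, 90, 1654640003), (5, 2, 90, 1654640004), (6, 3, 50, 1654640005);
""")

query = """
WITH listings_numbered AS (
  -- Number listings per item: cheapest first, newest first among equal prices.
  SELECT *, ROW_NUMBER() OVER (
    PARTITION BY item_id ORDER BY price ASC, created_at DESC
  ) AS rownum
  FROM listings
)
SELECT l.id, l.item_id, i.name, l.price, l.created_at
FROM listings_numbered l
JOIN items i ON l.item_id = i.id
WHERE l.rownum = 1          -- keep only the top-ranked listing per item
ORDER BY l.price ASC
"""
cheapest = conn.execute(query).fetchall()
for row in cheapest:
    print(row)
```

Item 2 has two listings at price 90, and the created_at DESC tie-breaker is what makes listing 5 win over listing 4.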
Aggregate functions, such as the MIN you used in your query, are a viable option, but if you want an efficient query for this problem, window functions can be your best friends. This class of functions lets you compute values over "windows" (partitions) of your table, defined by the columns you specify.
For the solution to this problem I'm going to compute two values using window functions:
the minimum value of "listings.price", partitioning on "listings.item_id",
the maximum value of "listings.created_at", partitioning on "listings.item_id" and "listings.price"
SELECT *,
MIN(price) OVER(PARTITION BY item_id) AS min_price,
MAX(created_at) OVER(PARTITION BY item_id, price) AS max_created_at
FROM listings
Once you have all records of listings associated to the corresponding minimum price and latest date, it's necessary for you to select the records whose
price equals the minimum price
created_at equals the most recent created_at
WITH cte AS (
SELECT *,
MIN(price) OVER(PARTITION BY item_id) AS min_price,
MAX(created_at) OVER(PARTITION BY item_id, price) AS max_created_at
FROM listings
)
SELECT id,
item_id,
price,
created_at
FROM cte
WHERE price = min_price
AND created_at = max_created_at
If you need to order by price, it's sufficient to add an ORDER BY price clause.

SQL how to find the product group with highest proportion of expensive items

I have a table with products, product_group and price level. How can I find the product group with the highest proportion of expensive items?
product | product_group | price_level
1 a expensive
2 a low
3 b low
4 b expensive
5 b expensive
6 c expensive
I have tried this query, but it keeps all price_levels, not just the expensive ones.
select product, product_group, price_level,
count(price_level) over (partition by product_group, price_level) as pl,
count(product) over (partition by product_group) as p
from tbl
Essentially, I want to divide the number of expensive items in one product group by the total number of items in the same product group.
Desired output:
Product group | Percentage
c 1
You can use conditional aggregation:
select product_group,
avg( (price_level = 'expensive')::int ) as expensive_ratio
from tbl
group by product_group
order by expensive_ratio desc
limit 1;
The use of avg() is a convenient way to get the ratio you want. A more verbose method would be:
count(*) filter (where price_level = 'expensive') * 1.0 / count(*)
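For anyone who wants to see the ratio for every group, here is a sketch using Python's sqlite3 module with the question's sample rows. Note the swap: instead of the PostgreSQL-specific `::int` cast or the FILTER clause, it uses a plain CASE expression inside AVG, which is the same conditional-aggregation idea in a more portable form. SQLite as the engine is an assumption for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (product INT, product_group TEXT, price_level TEXT);
INSERT INTO tbl VALUES
  (1, 'a', 'expensive'), (2, 'a', 'low'), (3, 'b', 'low'),
  (4, 'b', 'expensive'), (5, 'b', 'expensive'), (6, 'c', 'expensive');
""")

query = """
SELECT product_group,
       -- Each row contributes 1.0 if expensive, else 0.0; the average is the ratio.
       AVG(CASE WHEN price_level = 'expensive' THEN 1.0 ELSE 0.0 END) AS expensive_ratio
FROM tbl
GROUP BY product_group
ORDER BY expensive_ratio DESC
"""
ratios = conn.execute(query).fetchall()
for group, ratio in ratios:
    print(group, ratio)
```

Adding LIMIT 1 to the query keeps only the top group, which is group c for this data.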

SQL Group based on individual row chaining to another individual row

I don't quite know how to ask this question better than this. Effectively I have a transaction table. For each customer, this table has one to many rows of transactions. Each row records the ID of the customer that was handled just before it. For example:
Cust_ID Tran_Type Prev_ID
10 A 9
10 B 9
9 T 7
9 A 7
8 B ~
8 A ~
7 T ~
In this example, cust 7 is the starting customer for the day for an individual using this program. They then started working on customer 9 and then finally customer 10. In addition, for another individual they started with customer 8 and didn't do another transaction the entire day. The two groups I'd expect is group A which is comprised of customer 7, 9, 10 and group B comprised of customer 8 only.
I'm honestly stumped on this one. Does anyone have any advice? I'm fairly certain I want to start by grouping on the unique customer ID's and previous ID's which will give me:
Cust_ID Prev_ID
10 9
9 7
8 ~
7 ~
At this point though I'm not sure how else to do it using vanilla sql. Thanks.
It should just be a GROUP BY:
select cust_id, prev_id
from transactiontable
group by cust_id, prev_id
You are right, you'd start with the distinct rows. Then recursively go up from the records without previous transactions.
with pairs as
(
select distinct cust_id, prev_id from transactions
)
, groups (cust_id, prev_id, grp, pos) as
(
select cust_id, prev_id, row_number() over (order by cust_id), 1
from pairs
where prev_id is null
union all
select p.cust_id, p.prev_id, g.grp, g.pos + 1
from pairs p
join groups g on g.cust_id = p.prev_id
)
select cust_id, prev_id, grp
from groups
order by grp, pos;
REXTESTER demo: http://rextester.com/NZGLU84962
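Here is a rough sketch of the recursive walk that can be run locally with Python's sqlite3 module. Two things are assumptions on my side: the `~` in the sample is treated as NULL, and each chain is labelled by its starting customer (root_id) rather than by a ROW_NUMBER-based group number, which keeps the anchor query simple. SQLite needs the RECURSIVE keyword; the names chains, root_id and pos are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (cust_id INT, tran_type TEXT, prev_id INT);
INSERT INTO transactions VALUES
  (10, 'A', 9), (10, 'B', 9), (9, 'T', 7), (9, 'A', 7),
  (8, 'B', NULL), (8, 'A', NULL), (7, 'T', NULL);  -- '~' assumed to mean NULL
""")

query = """
WITH RECURSIVE pairs AS (
  SELECT DISTINCT cust_id, prev_id FROM transactions
),
chains (cust_id, prev_id, root_id, pos) AS (
  -- Anchor: customers with no predecessor start a chain.
  SELECT cust_id, prev_id, cust_id, 1
  FROM pairs
  WHERE prev_id IS NULL
  UNION ALL
  -- Recursion: attach each customer to the chain of its predecessor.
  SELECT p.cust_id, p.prev_id, c.root_id, c.pos + 1
  FROM pairs p
  JOIN chains c ON c.cust_id = p.prev_id
)
SELECT cust_id, root_id, pos FROM chains ORDER BY root_id, pos
"""
chain_rows = conn.execute(query).fetchall()
for row in chain_rows:
    print(row)  # (cust_id, root_id, pos)
```

Customers 7, 9 and 10 end up under root 7, and customer 8 forms a chain of its own, which matches the two groups described in the question.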

SQL Server 2008 - ROWNUMBER OVER - filtering the result

I have the following SQL which works and returns products with duplicate names and the rownum column is a count of how many times that name appears.
Adding where rownum > 1 at the end gives me the duplicates only.
SELECT *
FROM
(SELECT
id, productname,
ROW_NUMBER() OVER (PARTITION BY productname
ORDER BY productname) Rownum
FROM products
GROUP BY id, productname) result
REQUIREMENT
I need to produce a list of products where if the rownum column has a value greater than one, I want to see all the rows pertaining to that product grouped by the name column.
If the rownum value for a product is 1 only, and no value greater than one (so no duplicate) I don't want to see that row.
So for example if "Blue umbrella" appears three times, I want to see the result for this product as:
ID Name Rownum
35 Blue umbrella 1
41 Blue umbrella 2
90 Blue umbrella 3
How would I go about achieving this please?
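Change ROW_NUMBER() OVER to COUNT(1) OVER, filter on a count greater than 1, and remove the GROUP BY. The ORDER BY inside OVER has to go as well, because SQL Server 2008 accepts only PARTITION BY for aggregate window functions. The Rownum column then holds the number of rows sharing the name rather than a sequence number.
SELECT *
FROM (SELECT id, productname,
             COUNT(1) OVER (PARTITION BY productname) Rownum
      FROM products
     ) result
WHERE Rownum > 1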
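If you also want the 1, 2, 3 numbering from the question, one option is to compute both ROW_NUMBER and the per-name count in the same subquery and filter on the count. The sketch below tries this in an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25+ assumed); the 'Red hat' row and the name_count alias are invented for the example, and ordering by id inside ROW_NUMBER is my assumption about the wanted sequence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, productname TEXT);
INSERT INTO products VALUES
  (35, 'Blue umbrella'), (41, 'Blue umbrella'), (90, 'Blue umbrella'),
  (50, 'Red hat');  -- hypothetical non-duplicate row
""")

query = """
SELECT id, productname, rownum
FROM (
  SELECT id, productname,
         -- Sequence number within each name.
         ROW_NUMBER() OVER (PARTITION BY productname ORDER BY id) AS rownum,
         -- How many rows share this name (no ORDER BY needed).
         COUNT(*) OVER (PARTITION BY productname) AS name_count
  FROM products
) AS result
WHERE name_count > 1        -- keep every row of a duplicated name
ORDER BY productname, rownum
"""
duplicates = conn.execute(query).fetchall()
for row in duplicates:
    print(row)
```

Filtering on the count keeps all rows of a duplicated name, including the one numbered 1, which a plain rownum > 1 filter would drop.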

Get every second row as a result table in t-sql

I'm looking for a t-sql script that returns a list, that shows every second value from a grouping from Table1.
For example I have the following data (Table1) and want the desired result-list:
Table1:
Customer Quantity
A 5
A 8 (*)
B 3
B 5 (*)
B 11
C 7
D 4
D 23 (*)
Desired result-list:
Customer Quantity
A 8
B 5
D 23
I was thinking about doing something with 'select distinct and left outer join', but I can't get it to work. Possibly I need row numbering, but I can't figure out how to do it. Can anyone help me?
Below is the script I used to create and fill Table1:
CREATE TABLE Table1
(Customer nvarchar(1) NULL,
Quantity int NOT NULL);
INSERT INTO Table1(Customer,Quantity)
VALUES
('A',5),
('A',8),
('B',3),
('B',5),
('B',11),
('C',7),
('D',4),
('D',23);
This can be done quite easily using the row_number window function:
SELECT customer, quantity
FROM (SELECT customer, quantity,
ROW_NUMBER() OVER (PARTITION BY customer
ORDER BY quantity ASC) AS rn
FROM table1) t
WHERE rn = 2
You can use ROW_NUMBER and a CTE:
WITH data AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY Quantity) rn
    FROM Table1
)
SELECT Customer, Quantity
FROM data
WHERE rn = 2
How it works:
Using ROW_NUMBER() assigns a sequential number to each row based on what's specified in OVER (). In OVER, I PARTITION the rows by customer, which means each customer's group of rows is numbered separately. ORDER BY Quantity then orders the rows by quantity within each customer, so I can pick the 2nd row for each customer ordered by quantity.