SQL split volume into buckets with no repetition of bucket ID - sql

Im solving maybe simple but for me impossible task in MS SQL. I have 2 tables- 1 with company name and total volume of used bottles. In second table I have a list of buckets with unique ID and their bottle capacity. My task is to assign to each company correct number of buckets (to cover all volume of bottles) while not using the same buckets twice (not repeat buckets with the same ID for 2 or moře companies).
Is anyone able to help me with that?
Thank you!

Given the additional simplifications that every bottle is the same and every bucket has the same capacity (per your comment), this will do the trick:
-- demo schema
create table companies (cname char, bottlesUsed int);
create table buckets (id int, capacity int);
-- demo data
insert companies values ('a', 41), ('b', 2), ('c', 5), ('d', 50);
insert buckets select top 20 row_number() over (order by object_id), 20 from sys.objects;
with
bucketnums as
(
select i = row_number() over (order by id),
id
from buckets
),
bucketRanges as
(
select cname,
firstBucketNum = 1 + lag(lastBucketNum, 1, 0) over (order by cname),
lastBucketNum
from ( -- running total of bucket count required by each customer
select cname,
lastBucketNum = sum(ceiling(bottlesUsed * 1.0 / 20))
over (order by cname rows unbounded preceding)
from companies
) t
)
select conmpanyName = br.cname,
allocatedBucketId = bn.id
from bucketRanges br
join bucketnums bn on bn.i between firstBucketNum and lastBucketNum;
If the bottle size or bucket capacity is variable, this problem becomes much more... "interesting" :)

Related

How to efficiently compute in step 1 differences between columns and in step 2 aggregate those differences?

This is a follow-up question to this Is there a way to use functions that can take multiple columns as input (e.g. `GREATEST`) without typing out every column name?, where I asked only about the second part of my problem. The feedback was that the data model is most likely not appropriate.
I was thinking again about the data model but still, have trouble figuring out a good way to do what I want.
The complete problem is as follows:
I got time series data for multiple technical devices with columns like energy_consumption and voltage.
Furthermore, I got columns with sensitivities towards multiple external factors for each device which I just added as additional columns (denoted with the cc_ in the example).
There are queries where I want to operate on the raw sensitivities. However, there are also queries for which I need to take first some differences such as cc_a - cc_b and cc_b -cc_c and then compute the max of those differences. The combinations for which the differences are to be computed are a predefined subset (around 30) of all possible combinations. The set of combinations that is of interest might change in the future so that for different time intervals different combinations have to be applied (e.g. from 2022-01-01 to 2024-12-31 take combination set A and from 2025-01-01 to ... take combination set B). However, it is very unlikely that the combination change very often.
Here is an example of how I am doing it at the moment
CREATE TEMP TABLE foo (device_id int, voltage int, energy_consumption int, cc_a int, cc_b int, cc_c int);
INSERT INTO foo VALUES (3, 12, 5, '1', '2', '3'), (4, 6, 3, '15', '4', '100');
WITH diff_table AS (
SELECT
id,
(cc_a - cc_b) as diff_ab,
(cc_a - cc_c) as diff_ac,
(cc_b - cc_c) as diff_bc
FROM foo
)
SELECT
id,
GREATEST(diff_ab, diff_ab, diff_bc) as max_cc
FROM diff_table
Since I got more than 100 sensitivities and also differences I am looking for a way how to do this efficiently, both computationally and in terms of typical query length.
What would be a good data model to perform such operations?
The solution I type below assumes all pairings are considered, and the you don't want the points where these are reached.
CREATE TABLE sources (
source_id int
,source_name varchar(10)
,PRIMARY KEY(source_id))
CREATE TABLE foo_values(
device_id int not null --device_id for "foo"
,source_id int -- you may change that with a foreign key
,value int
,CONSTRAINT fk_source_id
FOREIGN KEY(source_id )
REFERENCES sources(source_id ) )
With the exemple set you gave
INSERT INTO sources ( source_id, source_name ) VALUES
(1,'cc_a')
,(2,'cc_b')
,(3,'cc_c')
-- and so on
INSERT INTO foo_values (device_id,source_id, value ) VALUES
(3,1,1),(3,2,2),(3,3,3)
,(4,1,15),(4,2,4),(4,2,100)
doing this way, the query will be
SELECT device_id
, MAX(value)-MIN(value) as greatest_diff
FROM foo_values
group by device_id
Bonus : with such a schema, you can even tell where the maximum and minimum are reached
WITH ranked as (
SELECT
f.device_id
,f.value
,f.source_id
,RANK() OVER (PARTITION BY f.device_id ORDER BY f.value ) as low_first
,RANK() OVER (PARTITION BY f.device_id ORDER BY f.value DESC) as high_first
FROM foo_values as f)
SELECT h.device_id
, hs.source_name as source_high
, ls.source_name as source_low
, h.value as value_high
, l.value as value_low
, h.value - l.value as greatest_diff
FROM ranked l
INNER JOIN ranked h
on l.device_id = h.device_id
INNER JOIN sources ls
on ls.source_id = l.source_id
INNER JOIN sources hs
on hs.source_id = h.source_id
WHERE l.low_first =1 AND h.high_first = 1
Here is a fiddle for this solution.
EDIT : since you need to control the pairings, you must add a table that list them:
CREATE TABLE high_low_source
(high_source_id int
,low_source_id int
, constraint fk_low
FOREIGN KEY(low_source_id )
REFERENCES sources(source_id )
,constraint fk_high
FOREIGN KEY(high_source_id )
REFERENCES sources(source_id )
);
INSERT INTO high_low_source VALUES
(1,2),(1,3),(2,3)
The query looking for the greatest difference becomes :
SELECT h.device_id
, hs.source_name as source_high
, ls.source_name as source_low
, h.value as value_high
, l.value as value_low
, h.value - l.value as my_diff
, RANK() OVER (PARTITION BY h.device_id ORDER BY (h.value - l.value) DESC) as greatest_first
FROM foo_values l
INNER JOIN foo_values h
on l.device_id = h.device_id
INNER JOIN high_low_source hl
on hl.low_source_id = l.source_id
AND hl.high_source_id = h.source_id
INNER JOIN sources ls
on ls.source_id = l.source_id
INNER JOIN sources hs
on hs.source_id = h.source_id
ORDER BY device_id, greatest_first
With the values you have listed, there is a tie for device 3.
Extended fiddle

SQL aliasing with FROM AS

SELECT A.barName AS BarName1, B.barName AS BarName2
FROM (
SELECT Sells.barName, COUNT(barName) AS count
FROM Sells
GROUP BY barName
) AS A, B
WHERE A.count = B.count
I'm trying to do a self join on this table that I created, but I'm not sure how to alias the table twice in this format (i.e. FROM AS). Unfortunately, this is a school assignment where I can't create any new tables. Anyone have experience with this syntax?
edit: For clarification I'm using PostgreSQL 8.4. The schema for the tables I'm dealing with are as follows:
Drinkers(name, addr, hobby, frequent)
Bars(name, addr, owner)
Beers(name, brewer, alcohol)
Drinks(drinkerName, drinkerAddr, beerName, rating)
Sells(barName, beerName, price, discount)
Favorites(drinkerName, drinkerAddr, barName, beerName, season)
Again, this is for a school assignment, so I'm given read-only access to the above tables.
What I'm trying to find is pairs of bars (Name1, Name2) that sell the same set of drinks. My thinking in doing the above was to try and find pairs of bars that sell the same number of drinks, then list the names and drinks side by side (BarName1, Drink1, BarName2, Drink2) to try and compare if they are indeed the same set.
You have not mentioned what RDBMS you use.
If Oracle or MS SQL, you can do something like this (I use my sample data table, but you can try it with your tables):
create table some_data (
parent_id int,
id int,
name varchar(10)
);
insert into some_data values(1, 2, 'val1');
insert into some_data values(2, 3, 'val2');
insert into some_data values(3, 4, 'val3');
with data as (
select * from some_data
)
select *
from data d1
left join data d2 on d1.parent_id = d2.id
In your case this query
SELECT Sells.barName, COUNT(barName) AS count
FROM Sells
GROUP BY barName
should be placed in WITH section and referenced from main query 2 times as A and B.
It is slightly unclear what you are trying to achive. Are you looking for a list bar names, with how many times they appear in the table? If so, there are a couple ways you could do this. Firstly:
SELECT SellsA.barName AS BarName1, SellsB.count AS Count
FROM
(
SELECT DISTINCT barName
FROM Sells
) SellsA
LEFT JOIN
(
SELECT Sells.barName, COUNT(barName) AS count
FROM Sells
GROUP BY barName
) AS SellsB
ON SellsA.barName = SellsB.barName
Secondly, if you are using MSSQL:
SELECT barNamr, MAX(rn) AS Count
FROM
(
SELECT barName,
ROW_NUMBER() OVER(ORDR BY barName PARTITION BY barName) as rn
FROM Sells
) CountSells
GROUP BY barName
Thirdly, you could avoid a self-join in MSSQL, by using OVER():
SELECT
barName
COUNT(*) OVER(ORDER BY barName PARTITION BY barName) AS Count
FROM Sells

SQL select join the row with max (arithmatic(value1, value2))

I am trying to make a Trade system where people can make offer on the items they want. There are two currencies in the system, gold and silver. 100 silver = 1 gold. Note that people can make offers the same price as others, so there could be duplicate highest offer price.
Table structure looks roughly like this
Trade table
ID
TradeOffer table
ID
UserID
TradeID references Trade(ID)
GoldOffer
SilverOffer
I want to display to the user a list of trades sorted by the highest offer price whenever they do a search with constraint.
The Ideal output would be similar to this
Trade.ID TradeOffer.ID HighestGoldOffer HighestSilverOffer UserID
where HighestGoldOffer and HighestSilverOffer are the value of GoldOffer and SilverOffer column of the Offer with highest (GoldOffer * 100 + SilverOffer) and UserID is the user who made the offer
I know I can run 2 separate queries, one to retrieve all the Trades that satisfies all the constraint and extract all the ID to run another query to get the highest offer, but I am a perfectionist so I would prefer to do it with one sql instead of two.
I could just select all offers that are (GoldOffer * 100 + SilverOffer) = MAX (GoldOffer * 100 + SilverOffer) but this would possibly return duplicated Trade if there are multiple people offered the same price. Also there could be nobody offered on the Trade yet so GoldOffer and SilverOffer will be empty, I would still like to show the Trade as no offer when this happened.
Hope I made myself clear and thanks for any help
Model and test data
CREATE TABLE Trade (ID INT)
CREATE TABLE TradeOffer
(
ID INT,
UserID INT,
TradeID INT,
GoldOffer INT,
SilverOffer INT
)
INSERT Trade VALUES (1), (2), (3)
INSERT TradeOffer VALUES
(1, 1, 1, 10, 15),
(2, 2, 1, 11, 15),
(3, 1, 2, 10, 16),
(4, 2, 2, 10, 16)
Query
SELECT
[TradeID],
[TradeOfferID],
[HighestGoldOffer],
[HighestSilverOffer],
[UserID]
FROM (
SELECT
t.ID AS [TradeID],
tOffer.ID AS [TradeOfferID],
tOffer.GoldOffer AS [HighestGoldOffer],
tOffer.SilverOffer AS [HighestSilverOffer],
tOffer.[UserID],
RANK() OVER (
PARTITION BY t.ID
ORDER BY (([GoldOffer] * 100) + [SilverOffer]) DESC
) AS [Rank]
FROM Trade t
LEFT JOIN TradeOffer tOffer
ON tOffer.TradeID = t.ID
) x
WHERE [Rank] = 1
Result

sql select a field into 2 columns

I am trying to run below 2 queries on the same table and hoping to get results in 2 different columns.
Query 1: select ID as M from table where field = 1
returns:
1
2
3
Query 2: select ID as N from table where field = 2
returns:
4
5
6
My goal is to get
Column1 - Column2
-----------------
1 4
2 5
3 6
Any suggestions? I am using SQL Server 2008 R2
Thanks
There has to be a primary key to foreign key relationship to JOIN data between two tables.
That is the idea about relational algebra and normalization. Otherwise, the correlation of the data is meaningless.
http://en.wikipedia.org/wiki/Database_normalization
The CROSS JOIN will give you all possibilities. (1,4), (1,5), (1, 6) ... (3,6). I do not think that is what you want.
You can always use a ROW_NUMBER() OVER () function to generate a surrogate key in both tables. Order the data the way you want inside the OVER () clause. However, this is still not in any Normal form.
In short. Why do this?
Quick test database. Stores products from sporting goods and home goods using non-normal form.
The results of the SELECT do not mean anything.
-- Just play
use tempdb;
go
-- Drop table
if object_id('abnormal_form') > 0
drop table abnormal_form
go
-- Create table
create table abnormal_form
(
Id int,
Category int,
Name varchar(50)
);
-- Load store products
insert into abnormal_form values
(1, 1, 'Bike'),
(2, 1, 'Bat'),
(3, 1, 'Ball'),
(4, 2, 'Pot'),
(5, 2, 'Pan'),
(6, 2, 'Spoon');
-- Sporting Goods
select * from abnormal_form where Category = 1
-- Home Goods
select * from abnormal_form where Category = 2
-- Does not mean anything to me
select Id1, Id2 from
(select ROW_NUMBER () OVER (ORDER BY ID) AS Rid1, Id as Id1
from abnormal_form where Category = 1) as s
join
(select ROW_NUMBER () OVER (ORDER BY ID) AS Rid2, Id as Id2
from abnormal_form where Category = 2) as h
on s.Rid1 = h.Rid2
We definitely need more information from the user.

How to get the complete row from a maximum calculation?

I do struggle with a GROUP BY -- again. The basics I can handle, but there it is: How do I get to different columns I named in the group by, without destroying my grouping? Note that group by is only my own idea, there may be others that work better. It must work in Oracle, though.
Here is my example:
create table xxgroups (
groupid int not null primary key,
groupname varchar2(10)
);
insert into xxgroups values(100, 'Group 100');
insert into xxgroups values(200, 'Group 200');
drop table xxdata;
create table xxdata (
num1 int,
num2 int,
state_a int,
state_b int,
groupid int,
foreign key (groupid) references xxgroups(groupid)
);
-- "ranks" are 90, 40, null, 70:
insert into xxdata values(10, 10, 1, 4, 100);
insert into xxdata values(10, 10, 0, 4, 200);
insert into xxdata values(11, 11, 0, 3, 100);
insert into xxdata values(20, 22, 5, 7, 200);
The task is to create a result-row for each distinct (num1, num2) and print that groupname with the highest calculated "rank" from state_a and state_b.
Note that the first two rows have the same nums and thus only the higher ranking should be selected -- with the groupname being "Group 200".
I got quite far with the basic group by, I think.
SELECT xd.num1||xd.num2 nummer, max(ranking.goodness)
FROM xxdata xd
, xxgroups xg
,( select state_a, state_b, r as goodness
from dual
model return updated rows
dimension by (0 state_a, 0 state_b) measures (0 r)
rules (r[1,4]=90, r[3,7]=80,r[5,7]=70, r[4,7]=60, r[0,7]=50, r[0,4]=40)
order by goodness desc
) ranking
WHERE xd.groupid=xg.groupid
and ranking.state_a (+) = xd.state_a
and ranking.state_b (+) = xd.state_b
GROUP BY xd.num1||xd.num2
ORDER BY nummer
;
The result is 90% of what I need:
nummer ranking
----------------
1010 90
1111
2022 70
100% perfect would be
nummer groupname
-------------------
1010 Group 100
1111 Group 100
2022 Group 200
The tricky part is, that I want the groupname in the result. And I can not include it in the select, because then I would have to put it into the group by as well -- which I do not want (then I would not select the best ranking entry from over all groups)
In my solution a use a model table to calculate the "rank". There are other solution I am sure. The point is, that it is a non-trivial calculation that I do not want to do twice.
I know from other examples that one could use a second query to get back to the original row to get to the groupname, but I can not see how I could to this here,
without duplicating my ranking calculation.
A nice suggestion was to replace the group by with a LIMIT 1/ORDER BY goodness and use this calculating select as a filtering subselect. But a) there is no LIMIT in Oracle, and I doubt a rownum<=1 would do in a subselect and b) I can not wrap my brain around it anyway. Maybe there is a way?
You can use the FIRST aggregation modifier to selectively apply your function over a subset of rows of a group -- here a single row (SQLFiddle demo):
SELECT xd.num1||xd.num2 nummer,
MAX(xg.groupname) KEEP (DENSE_RANK FIRST
ORDER BY ranking.goodness DESC) grp,
max(ranking.goodness)
FROM xxdata xd
, xxgroups xg
,( select state_a, state_b, r as goodness
from dual
model return updated rows
dimension by (0 state_a, 0 state_b) measures (0 r)
rules (r[1,4]=90, r[3,7]=80,r[5,7]=70, r[4,7]=60, r[0,7]=50, r[0,4]=40)
order by goodness desc
) ranking
WHERE xd.groupid=xg.groupid
and ranking.state_a (+) = xd.state_a
and ranking.state_b (+) = xd.state_b
GROUP BY xd.num1||xd.num2
ORDER BY nummer;
Your method with analytics works as well but since we already use aggregations here, we may as well use the FIRST modifier to get all columns in one go.
Whow, I did search before, but now I found this answer, which I could adopt to my question. The Oracle-solution here is over, partition by with order by and row_number():
select *
from ( select data.*, row_number()
over (partition by nummer order by goodness desc) as seqnum
from (
SELECT xd.num1, xd.num2 nummer, xg.groupname, ranking.goodness
FROM xxdata xd
, xxgroups xg
,( select state_a, state_b, r as goodness
from dual
model return updated rows
dimension by (0 state_a, 0 state_b) measures (0 r)
rules (r[1,4]=90, r[3,7]=80,r[5,7]=70, r[4,7]=60, r[0,7]=50, r[0,4]=40)
) ranking
WHERE xd.groupid=xg.groupid
and ranking.state_a (+) = xd.state_a
and ranking.state_b (+) = xd.state_b
ORDER BY nummer
) data )
where seqnum = 1
;
The result is
10 10 Group 100 90 1
11 11 Group 100 1
20 22 Group 200 70 1
which is beautiful.
Now I have to try to understand what over in the select excactly does....