Group by range of values in bigquery - google-bigquery

Is there any way in BigQuery to group by a range of values rather than by the exact value?
I have a query that looks in a product table and groups by four different numeric columns.
What I am looking for is an efficient way to group along the lines of:
group by "A±1000" etc. or "A±10%ofA".
Thanks in advance.

You can generate a column as a "named range" and then group by that column. As an example for your A±1000 case:
with data as (
select 100 as v union all
select 200 union all
select 2000 union all
select 2100 union all
select 2200 union all
select 4100 union all
select 8000 union all
select 8000
)
select count(v), ARRAY_AGG(v), ranges
FROM data, unnest([0, 2000, 4000, 6000, 8000]) ranges
WHERE data.v >= ranges - 1000 AND data.v < ranges + 1000
GROUP BY ranges
Output:
+-----+------------------------+--------+
| f0_ | f1_ | ranges |
+-----+------------------------+--------+
| 2 | ["100","200"] | 0 |
| 3 | ["2000","2100","2200"] | 2000 |
| 1 | ["4100"] | 4000 |
| 2 | ["8000","8000"] | 8000 |
+-----+------------------------+--------+
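The same "named range" idea can be sketched outside SQL. The Python below is only an illustration under stated assumptions: `group_by_centers` is a hypothetical helper, not part of any library, and it mirrors the join condition `v >= center - 1000 AND v < center + 1000` from the query above.

```python
from collections import defaultdict

def group_by_centers(values, centers, half_width):
    """Put each value under every center c with c - half_width <= v < c + half_width."""
    groups = defaultdict(list)
    for v in values:
        for c in centers:
            if c - half_width <= v < c + half_width:
                groups[c].append(v)
    return dict(groups)

values = [100, 200, 2000, 2100, 2200, 4100, 8000, 8000]
groups = group_by_centers(values, [0, 2000, 4000, 6000, 8000], 1000)
for center in sorted(groups):
    # center, count and members, like the three columns of the SQL output
    print(center, len(groups[center]), groups[center])
```

As in the SQL version, a center with no matching values (6000 here) simply does not appear in the result.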

The example below is for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.example` AS (
SELECT * FROM
UNNEST([STRUCT<id INT64, price FLOAT64>
(1, 15), (2, 50), (3, 125), (4, 150), (5, 175), (6, 250)
])
)
SELECT
CASE
WHEN price > 0 AND price <= 100 THEN ' 0 - 100'
WHEN price > 100 AND price <= 200 THEN '100 - 200'
ELSE '200+'
END AS range_group,
COUNT(1) AS cnt
FROM `project.dataset.example`
GROUP BY range_group
-- ORDER BY range_group
with result
Row | range_group | cnt
1 | 0 - 100 | 2
2 | 100 - 200 | 3
3 | 200+ | 1
As you can see, the solution above requires you to construct a CASE statement that reflects your ranges. If you have many ranges this gets tedious, so below is a more generic (but more verbose) solution that uses the recently introduced RANGE_BUCKET function:
#standardSQL
WITH `project.dataset.example` AS (
SELECT * FROM
UNNEST([STRUCT<id INT64, price FLOAT64>
(1, 15), (2, 50), (3, 125), (4, 150), (5, 175), (6, 250)
])
), ranges AS (
SELECT [100.0, 200.0] ranges_array
), temp AS (
SELECT OFFSET, IF(prev_val = val, CONCAT(prev_val, ' - '), CONCAT(prev_val, ' - ', val)) rng FROM (
SELECT OFFSET, IFNULL(CAST(LAG(val) OVER(ORDER BY OFFSET) AS STRING), '') prev_val, CAST(val AS STRING) AS val
FROM ranges, UNNEST(ARRAY_CONCAT(ranges_array, [ARRAY_REVERSE(ranges_array)[OFFSET(0)]])) val WITH OFFSET
)
)
SELECT
RANGE_BUCKET(price, ranges_array) range_group,
rng,
COUNT(1) AS cnt
FROM `project.dataset.example`, ranges
JOIN temp ON RANGE_BUCKET(price, ranges_array) = OFFSET
GROUP BY range_group, rng
-- ORDER BY range_group
with result
Row | range_group | rng | cnt
1 | 0 | - 100 | 2
2 | 1 | 100 - 200 | 3
3 | 2 | 200 - | 1
As you can see, in the second solution you only need to define your ranges in the ranges CTE, as a simple array listing your boundaries: SELECT [100.0, 200.0] ranges_array
The temp CTE then does all the remaining work of building a label for each bucket.
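To make the bucket numbering concrete, here is a small Python sketch. It assumes that RANGE_BUCKET returns the number of boundaries that are less than or equal to the point, which is what `bisect_right` computes for a sorted list; `range_bucket` and `bucket_label` are hypothetical helper names, and NULL/NaN handling is left out.

```python
from bisect import bisect_right
from collections import Counter

def range_bucket(point, boundaries):
    """Number of boundaries <= point (boundaries must be sorted ascending)."""
    return bisect_right(boundaries, point)

def bucket_label(index, boundaries):
    """Human-readable label for a bucket, open-ended at both extremes."""
    lower = "" if index == 0 else str(boundaries[index - 1])
    upper = "" if index == len(boundaries) else str(boundaries[index])
    return f"{lower} - {upper}"

prices = [15, 50, 125, 150, 175, 250]
boundaries = [100, 200]
counts = Counter(range_bucket(p, boundaries) for p in prices)
for index in sorted(counts):
    print(index, bucket_label(index, boundaries), counts[index])
```

Note that under this assumption a value sitting exactly on a boundary (for example 100) lands in the upper bucket, whereas the CASE version above puts it in the lower one.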

You can do math operations on the GROUP BY, creating groups by any arbitrary criteria.
For example:
WITH data AS (
SELECT repo.name, COUNT(*) price
FROM `githubarchive.month.201909`
GROUP BY 1
HAVING price>100
)
SELECT FORMAT('range %i-%i', MIN(price), MAX(price)) price_range, COUNT(*) c
FROM data
GROUP BY CAST(LOG(price) AS INT64)
ORDER BY MIN(price)
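Grouping by the logarithm gives buckets whose width grows with the value, which is close in spirit to the "A±10%ofA" request. The Python sketch below is an illustration only: it assumes that casting a float to INT64 rounds to the nearest integer, and imitates that with `floor(x + 0.5)` for positive logs. `log_bucket` and `group_by_log` are hypothetical names.

```python
import math
from collections import defaultdict

def log_bucket(value):
    """Natural log rounded to the nearest integer (assumed CAST behaviour)."""
    return math.floor(math.log(value) + 0.5)

def group_by_log(values):
    """Map each log bucket to (min, max, count) of the values it contains."""
    groups = defaultdict(list)
    for v in values:
        groups[log_bucket(v)].append(v)
    return {k: (min(vs), max(vs), len(vs)) for k, vs in sorted(groups.items())}

summary = group_by_log([101, 150, 300, 1000, 20000])
for bucket, (lo, hi, count) in summary.items():
    print(f"range {lo}-{hi}", count)
```

For narrower relative buckets one could divide the log by a step such as `math.log(1.1)` before rounding; that variation is not part of the original answer.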

Related

How to select unoccupied ranges between two numbers

Consider the table below:
Id | Title | Start | End
-----+--------------+---------+-----
1 | Group A | 100 | 200
-----+--------------+---------+-----
2 | Group B | 350 | 500
-----+--------------+---------+-----
3 | Group C | 600 | 800
I want to get the unoccupied ranges between 100 and 999.
My required final result would be:
Id | Start | End
-----+----------+-----
1 | 201 | 349
-----+----------+-----
2 | 501 | 599
-----+----------+-----
3 | 801 | 999
You can use the lead() window function to do so.
Select Id, [End]+1 as Start, coalesce((lead(start)over(order by id) -1),999) [End]
from mytable
Since lead() returns null on the last row, I have used coalesce() to turn that into 999.
Schema:
create table mytable( Id int, Title varchar(50),[Start] int , [End] int);
insert into mytable values(1, 'Group A', 100, 200);
insert into mytable values(2, 'Group B', 350, 500);
insert into mytable values(3, 'Group C', 600, 800);
Query:
Select Id, [End]+1 as [Start], coalesce((lead([start])over(order by id) -1),999) [End]
from mytable
Output:
Id | Start | End
-----+----------+-----
1 | 201 | 349
-----+----------+-----
2 | 501 | 599
-----+----------+-----
3 | 801 | 999
db<>fiddle here
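The lead() logic can be traced step by step with a short Python sketch. It assumes, like the query, that the occupied ranges are sorted and do not overlap; `free_ranges` is a hypothetical helper name.

```python
def free_ranges(occupied, upper=999):
    """occupied: list of (start, end), sorted by start and non-overlapping.

    For each range, the gap runs from end + 1 to the next start - 1;
    the last range has no successor, so its gap ends at `upper`.
    """
    gaps = []
    for i, (_start, end) in enumerate(occupied):
        has_next = i + 1 < len(occupied)
        next_start = occupied[i + 1][0] if has_next else upper + 1
        gaps.append((end + 1, next_start - 1))
    return gaps

print(free_ranges([(100, 200), (350, 500), (600, 800)]))
```

Like the query, this sketch does not report a gap before the first range, and it would emit an empty gap (start greater than end) if two ranges were directly adjacent.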
This is a tricky problem. If I make the following assumptions:
All the values are between 100 and 999.
The values have no overlaps.
Then you can handle this with lead() and union all:
select null, 100, min(starti) - 1
from t
having min(starti) > 100
union all
select title, endi + 1, next_starti - 1
from (select lead(starti, 1, 1000) over (order by starti) as next_starti, t.*
from t
) t
where next_starti >= endi + 1;
Note that the first subquery is for a condition not in your sample data, but where the first value starts after 100.
For the more general solution, where you could have overlaps, the simplest method might be to generate all possible values, remove the ones that exist, and then combine the adjacent values:
with n as (
select 100 as n
union all
select n + 1
from n
where n < 999
)
select min(n), max(n)
from (select n.*, row_number() over (order by n) as seqnum
from n
where not exists (select 1 from t where n.n between t.starti and t.endi)
) tn
group by (n - seqnum)
order by min(n)
option (maxrecursion 0);
Here is a db<>fiddle.
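The "generate, remove, combine" steps can be sketched in Python as follows. This is an illustration, not the author's code: `free_ranges_general` is a hypothetical name, and the grouping key `n - seqnum` plays the same role as in the GROUP BY above, since it stays constant within a run of consecutive values.

```python
def free_ranges_general(occupied, low=100, high=999):
    """Return the maximal runs of values in [low, high] not covered by any range."""
    # 1. generate all values and drop the covered ones
    free = [n for n in range(low, high + 1)
            if not any(start <= n <= end for start, end in occupied)]
    # 2. combine adjacent values: n - seqnum is constant within a run
    islands = {}
    for seqnum, n in enumerate(free, start=1):
        key = n - seqnum
        lo, hi = islands.get(key, (n, n))
        islands[key] = (min(lo, n), max(hi, n))
    return sorted(islands.values())

# overlapping input ranges are fine here
print(free_ranges_general([(100, 200), (150, 250), (350, 500), (600, 800)]))
```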

flatten list of ranges to single result range set

I am trying to "flatten" a list of ranges in a defined order (alphabetically by name in the examples provided) to a single merged result. Newer Ranges overwrite values of older ranges. Conceptually it looks like this, with "e" being the newest range:
0   1   2   3   4   5   6   7
|-------------a-------------|
        |---b---|
    |---c---|
                |---d---|
            |---e---|
|-a-|---c---|---e---|-d-|-a-|  <-- expected result
To prevent further confusion: The expected result here is indeed correct. The values 0 - 7 are just the ranges' values, not a progression in time. I use integers for simplicity here, but the values might not be discrete but continuous.
Note that b is completely overshadowed and not relevant anymore.
The data may be modeled like this in SQL:
create table ranges (
name varchar(1),
range_start integer,
range_end integer
);
insert into ranges (name, range_start, range_end) values ('a', 0, 7);
insert into ranges (name, range_start, range_end) values ('b', 2, 4);
insert into ranges (name, range_start, range_end) values ('c', 1, 3);
insert into ranges (name, range_start, range_end) values ('d', 4, 6);
insert into ranges (name, range_start, range_end) values ('e', 3, 5);
-- assume alphabetical order by name
It would be perfect if there was a way to directly query the result in SQL, e.g. like this:
select *magic* from ranges;
-- result:
+------+-------------+-----------+
| a | 0 | 1 |
| c | 1 | 3 |
| e | 3 | 5 |
| d | 5 | 6 |
| a | 6 | 7 |
+------+-------------+-----------+
But I suspect that is not realistically feasible, therefore I need to at least filter out all ranges that are overshadowed by newer ones, as is the case for b in the example above. Otherwise the query would need to transfer more and more irrelevant data as the database grows and new ranges overshadow older ones. For the example above, such a query could return all entries except for b, e.g.:
select *magic* from ranges;
-- result:
+------+-------------+-----------+
| a | 0 | 7 |
| c | 1 | 3 |
| d | 4 | 6 |
| e | 3 | 5 |
+------+-------------+-----------+
I was unable to construct such a filter in SQL. The only thing I managed to do is query all data and then calculate the result in code, for example in Java using the Google Guava library:
final RangeMap<Integer, String> rangeMap = TreeRangeMap.create();
rangeMap.put(Range.closedOpen(0, 7), "a");
rangeMap.put(Range.closedOpen(2, 4), "b");
rangeMap.put(Range.closedOpen(1, 3), "c");
rangeMap.put(Range.closedOpen(4, 6), "d");
rangeMap.put(Range.closedOpen(3, 5), "e");
System.out.println(rangeMap);
// result: [[0..1)=a, [1..3)=c, [3..5)=e, [5..6)=d, [6..7)=a]
Or by hand in Python:
import re
from collections import namedtuple
from typing import Optional, List

Range = namedtuple("Range", ["name", "start", "end"])

def overlap(lhs: Range, rhs: Range) -> Optional[Range]:
    # Two half-open ranges overlap unless one ends before the other starts.
    if lhs.end <= rhs.start or rhs.end <= lhs.start:
        return None
    return Range(None, min(lhs.start, rhs.start), max(lhs.end, rhs.end))

def range_from_str(str_repr: str) -> Range:
    # Each unit on the axis is 4 characters wide in the ASCII drawing.
    name = re.search(r"[a-z]+", str_repr).group(0)
    start = str_repr.index("|") // 4
    end = str_repr.rindex("|") // 4
    return Range(name, start, end)

if __name__ == '__main__':
    ranges: List[Range] = [
        #               0   1   2   3   4   5   6   7
        range_from_str("|-------------a-------------|"),
        range_from_str("        |---b---|            "),
        range_from_str("    |---c---|                "),
        range_from_str("                |---d---|    "),
        range_from_str("            |---e---|        "),
        # result:      |-a-|---c---|---e---|-d-|-a-|
    ]
    result: List[Range] = []
    for range in ranges:
        for i, res in enumerate(result[:]):
            o = overlap(range, res)
            if o:
                # Split the older range around the newer one; empty pieces are filtered out later.
                result.append(Range(res.name, o.start, range.start))
                result.append(Range(res.name, range.end, o.end))
                result[i] = Range(res.name, 0, 0)
        result.append(range)
    result = sorted(filter(lambda r: r.start < r.end, result), key=lambda r: r.start)
    print(result)
    # result: [Range(name='a', start=0, end=1), Range(name='c', start=1, end=3), Range(name='e', start=3, end=5), Range(name='d', start=5, end=6), Range(name='a', start=6, end=7)]
The following simple query returns all smallest intervals with top name:
with
all_points(x) as (
select range_start from ranges
union
select range_end from ranges
)
,all_ranges(range_start, range_end) as (
select *
from (select
x as range_start,
lead(x) over(order by x) as range_end
from all_points)
where range_end is not null
)
select *
from all_ranges ar
cross apply (
select max(name) as range_name
from ranges r
where r.range_end >= ar.range_end
and r.range_start <= ar.range_start
)
order by 1,2;
Results:
RANGE_START RANGE_END RANGE_NAME
----------- ---------- ----------
0 1 a
1 2 c
2 3 c
3 4 e
4 5 e
5 6 d
6 7 a
So we need to merge connected intervals with the same names:
Final query without new oracle-specific features
with
all_points(x) as (
select range_start from ranges
union
select range_end from ranges
)
,all_ranges(range_start, range_end) as (
select *
from (select
x as range_start,
lead(x) over(order by x) as range_end
from all_points)
where range_end is not null
)
select
grp,range_name,min(range_start) as range_start,max(range_end) as range_end
from (
select
sum(start_grp_flag) over(order by range_start) grp
,range_start,range_end,range_name
from (
select
range_start,range_end,range_name,
case when range_name = lag(range_name)over(order by range_start) then 0 else 1 end start_grp_flag
from all_ranges ar
cross apply (
select max(name) as range_name
from ranges r
where r.range_end >= ar.range_end
and r.range_start <= ar.range_start
)
)
)
group by grp,range_name
order by 1;
Results:
GRP RANGE_NAME RANGE_START RANGE_END
---------- ---------- ----------- ----------
1 a 0 1
2 c 1 3
3 e 3 5
4 d 5 6
5 a 6 7
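The two steps of this answer (split into elementary intervals and pick the top name, then merge neighbours with the same name) can be followed in a compact Python sketch. It is an illustration under the question's assumption that the alphabetically greatest name is the newest; `flatten` is a hypothetical helper name.

```python
def flatten(ranges):
    """ranges: list of (name, start, end); the greatest name wins on overlap."""
    # all distinct boundary points, like the all_points CTE
    points = sorted({p for _name, start, end in ranges for p in (start, end)})
    merged = []
    # consecutive points form the elementary intervals, like lead(x)
    for lo, hi in zip(points, points[1:]):
        covering = [name for name, start, end in ranges if start <= lo and end >= hi]
        if not covering:
            continue  # nothing covers this elementary interval
        top = max(covering)
        if merged and merged[-1][0] == top and merged[-1][2] == lo:
            # same name and directly adjacent: extend the previous interval
            merged[-1] = (top, merged[-1][1], hi)
        else:
            merged.append((top, lo, hi))
    return merged

sample = [("a", 0, 7), ("b", 2, 4), ("c", 1, 3), ("d", 4, 6), ("e", 3, 5)]
print(flatten(sample))
```

One difference from the SQL: intervals that no range covers are skipped here, whereas the cross apply with max(name) would return them with a NULL name.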
Or using actual oracle specific features:
with
all_ranges(range_start, range_end) as (
select * from (
select
x as range_start,
lead(x) over(order by x) as range_end
from (
select distinct x
from ranges
unpivot (x for r in (range_start,range_end))
))
where range_end is not null
)
select *
from all_ranges ar
cross apply (
select max(name) as range_name
from ranges r
where r.range_end >= ar.range_end
and r.range_start <= ar.range_start
)
match_recognize(
order by range_start
measures
first(range_start) as r_start,
last(range_end) as r_end,
last(range_name) as r_name
pattern(STRT A*)
define
A as prev(range_name)=range_name and prev(range_end) = range_start
);
Here is a hierarchical query that would give you the desired output:
WITH ranges(NAME, range_start, range_end) AS
(SELECT 'a', 0, 7 FROM dual UNION ALL
SELECT 'b', 2, 4 FROM dual UNION ALL
SELECT 'c', 1, 3 FROM dual UNION ALL
SELECT 'd', 4, 6 FROM dual UNION ALL
SELECT 'e', 3, 5 FROM dual UNION ALL
SELECT 'f', -3, -2 FROM dual UNION ALL
SELECT 'g', 8, 20 FROM dual UNION ALL
SELECT 'h', 12, 14 FROM dual)
, rm (NAME, range_start, range_end) AS
(SELECT r.*
FROM (SELECT r.NAME
, r.range_start
, NVL(r2.range_start, r.range_end) range_end
FROM ranges r
OUTER apply (SELECT *
FROM ranges
WHERE range_start BETWEEN r.range_start AND r.range_end
AND NAME > r.NAME
ORDER BY range_start, NAME DESC
FETCH FIRST 1 ROWS ONLY) r2
ORDER BY r.range_start, r.NAME desc
FETCH FIRST 1 ROWS ONLY) r
UNION ALL
SELECT r2.NAME
, r2.range_start
, r2.range_end
FROM rm
CROSS apply (SELECT r.NAME
, GREATEST(rm.range_end, r.range_start) range_start
, NVL(r2.range_start, r.range_end) range_end
FROM ranges r
OUTER apply (SELECT *
FROM ranges
WHERE range_start BETWEEN GREATEST(rm.range_end, r.range_start) AND r.range_end
AND NAME > r.NAME
ORDER BY range_start, NAME DESC
FETCH FIRST 1 ROWS ONLY) r2
WHERE r.range_end > rm.range_end
AND NOT EXISTS (SELECT 1 FROM ranges r3
WHERE r3.range_end > rm.range_end
AND (GREATEST(rm.range_end, r3.range_start) < GREATEST(rm.range_end, r.range_start)
OR (GREATEST(rm.range_end, r3.range_start) = GREATEST(rm.range_end, r.range_start)
AND r3.NAME > r.NAME)))
FETCH FIRST 1 ROWS ONLY) r2)
CYCLE NAME, range_start, range_end SET cycle TO 1 DEFAULT 0
SELECT * FROM rm
First you take the first entry ordered by range_start, name desc, which gives you the entry with the lowest range_start and, among ties, the highest name.
Then you search for a range with a higher name that intersects this range. If there is one, the range_start of that interval becomes the range_end of your final interval.
From this starting point you search for the next entry using more or less the same condition.
There is also another less efficient but easier and shorter approach: generate all points and just aggregate them.
For example this simple query will generate all intermediate points:
select x,max(name)
from ranges,
xmltable('xs:integer($A) to xs:integer($B)'
passing range_start as a
,range_end as b
columns x int path '.'
)
group by x
Results:
X M
---------- -
0 a
1 c
2 c
3 e
4 e
5 e
6 d
7 a
Then we can merge them:
select *
from (
select x,max(name) name
from ranges,
xmltable('xs:integer($A) to xs:integer($B)-1'
passing range_start as a
,range_end as b
columns x int path '.'
)
group by x
order by 1
)
match_recognize(
order by x
measures
first(x) as r_start,
last(x)+1 as r_end,
last(name) as r_name
pattern(STRT A*)
define
A as prev(name)=name and prev(x)+1 = x
);
Results:
R_START R_END R
---------- ---------- -
0 1 a
1 3 c
3 5 e
5 6 d
6 7 a
I don't understand your results -- as I've explained in the comments. The "b" should be present, because it is most recent at time 2.
That said, the idea is to unpivot the times and figure out the most recent name at each time -- both beginnings and ends. Then, combine these using gaps-and-islands ideas. This is what the query looks like:
with r as (
select name, range_start as t
from ranges
union all
select null, range_end as t
from ranges
),
r2 as (
select r.*,
(select r2.name
from ranges r2
where r2.range_start <= r.t and
r2.range_end >= r.t
order by r2.range_start desc
fetch first 1 row only
) as imputed_name
from (select distinct t
from r
) r
)
select imputed_name, t,
lead(t) over (order by t)
from (select r2.*,
lag(imputed_name) over ( order by t) as prev_imputed_name
from r2
) r2
where prev_imputed_name is null or prev_imputed_name <> imputed_name;
Here is a db<>fiddle.
Basically the same code should run in Postgres as well.

SQL query to search first rows until sum = value and skip big value that can exceed the value

I have a table
id | amount
---+--------
1 | 500
2 | 300
3 | 750
4 | 200
5 | 500
I want to select rows ascending until the sum is 1000 or until all rows are searched (and skip a big value (750) that can exceed 1000).
How can I write a query that returns rows like the ones below?
Thanks for help
id | amount
---+--------
1 | 500
2 | 300
4 | 200
I think that you need a common table expression for this.
The idea is to do a cumulative sum that skips the rows that would cause the sum to go above 1000 (aliased sm in the CTE), and to flag the records to skip (aliased keep in the CTE). Then the outer query just filters on the flag.
with recursive cte as (
select
id,
amount,
case when amount > 1000 then 0 else amount end sm,
case when amount > 1000 then 0 else 1 end keep
from mytable
where id = 1
union all
select
t.id,
t.amount,
case when c.sm + t.amount > 1000 then c.sm else c.sm + t.amount end,
case when c.sm + t.amount > 1000 then 0 else 1 end
from cte c
inner join mytable t on t.id = c.id + 1
)
select id, amount from cte where keep = 1 order by id
Demo on DB Fiddle:
id | amount
-: | -----:
1 | 500
2 | 300
4 | 200
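The rule the recursive CTE applies row by row can be written as a plain loop. This Python sketch is only meant to make the rule explicit: `take_until` is a hypothetical name, and the rows are assumed to arrive already ordered by id.

```python
def take_until(rows, limit=1000):
    """Keep a row only if adding its amount does not push the running sum above limit."""
    total = 0
    kept = []
    for row_id, amount in rows:
        if total + amount <= limit:
            total += amount               # row is kept, the sum advances
            kept.append((row_id, amount))
        # otherwise the row is skipped and the sum stays unchanged
    return kept

rows = [(1, 500), (2, 300), (3, 750), (4, 200), (5, 500)]
print(take_until(rows))
```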
You should get the expected result using a recursive common table expression, with something like this:
with RECURSIVE yourtableOrdered as (select row_number() over (order by id) row_num, id, val from (values (1, 500), (2, 300), (3, 750), (4, 200), (5, 500)) V (id, val)),
lineSum as (
select row_num, id, val,
case when val <= 1000 then val else 0 end totalSum,
case when val <= 1000 then true else false end InResult
from yourtableOrdered
where row_num = 1
union all
select y.row_num, y.id, y.val,
case when previousLine.totalSum + y.val <= 1000 then previousLine.totalSum + y.val else previousLine.totalSum end totalSum,
case when previousLine.totalSum + y.val <= 1000 then true else false end InResult
from yourtableOrdered y
inner join lineSum previousLine
on y.row_num = previousLine.row_num + 1
),
yourExpectedResult as (
select * from lineSum where InResult = true
)
select * from yourExpectedResult
See a working sample at
http://sqlfiddle.com/#!17/2cbcf/1/0
Use a cumulative sum:
select t.*
from (select t.*,
sum(amount) over (order by id) as running_amount
from t
) t
where running_amount - amount < 1000;
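For comparison, here is what the plain cumulative-sum condition selects on the sample data. The Python below is a sketch that mirrors `running_amount - amount < 1000`; `cumulative_filter` is a hypothetical name. Because the running sum includes every row, it keeps rows until the total before a row reaches the limit and cannot skip an oversized row in the middle.

```python
from itertools import accumulate

def cumulative_filter(rows, limit=1000):
    """Keep rows whose running total *before* the row is still below limit."""
    running = accumulate(amount for _row_id, amount in rows)
    return [row for row, total in zip(rows, running) if total - row[1] < limit]

rows = [(1, 500), (2, 300), (3, 750), (4, 200), (5, 500)]
print(cumulative_filter(rows))
```

On this input the sketch keeps ids 1, 2 and 3, so it differs from the asker's expected output, which skips id 3.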

How do I join 2 tables to allocate items?

I've created 2 tables that have inventory information (item, location, qty). One of them NeedInv has item/location(s) that need X number of items. The other HaveInv has item/locations(s) with excess X number of items.
I'm trying to join or combine the 2 tables to output which items should be transferred between which locations. I have code that does this for a single distribution location & I've attempted to modify it and add logic to have it work with multiple distribution locations, but it still fails in certain situations.
I've created a sqlfiddle, but the sample data is like so:
CREATE TABLE NeedInv
(item int, location varchar(1), need int)
INSERT INTO NeedInv
(item, location, need)
VALUES
(100, 'A', 4), (100, 'B', 0), (100, 'C', 2), (200, 'A', 0), (200, 'B', 1), (200, 'C', 1), (300, 'A', 3), (300, 'B', 5), (300, 'C', 0)
CREATE TABLE HaveInv
(item int, location varchar(1), have int)
INSERT INTO HaveInv
(item, location, have)
VALUES
(100, 'A', 0), (100, 'B', 3), (100, 'C', 0), (100, 'D', 3), (200, 'A', 1), (200, 'B', 0), (200, 'C', 0), (200, 'D', 1), (300, 'A', 0), (300, 'B', 0), (300, 'C', 20), (300, 'D', 5)
CREATE TABLE DesiredOutput
(item int, SourceLocation varchar(1), TargetLocation varchar(1), Qty int)
INSERT INTO DesiredOutput
(item, SourceLocation, TargetLocation, Qty)
VALUES
(100, 'B', 'A', 3), (100, 'D', 'A', 1), (100, 'D', 'C', 2), (200, 'A', 'B', 2), (200, 'A', 'C', 3), (200, 'D', 'C', 1), (300, 'C', 'A', 3), (300, 'C', 'B', 3)
I was trying to output something like this as a result of joining the tables:
+------+----------------+----------------+-----+
| item | SourceLocation | TargetLocation | Qty |
+------+----------------+----------------+-----+
| 100 | B | A | 3 |
| 100 | D | A | 1 |
| 100 | D | C | 2 |
| 200 | A | B | 2 |
| 200 | A | C | 3 |
| 200 | D | C | 1 |
| 300 | C | A | 3 |
| 300 | C | B | 3 |
+------+----------------+----------------+-----+
My current query to join the 2 tables looks like so:
select
n.*,
(case when Ord <= Remainder and (RemaingNeed > 0 and RemaingNeed < RemainingInv) then Allocated + RemaingNeed else case when RemaingNeed < 0 then 0 else Allocated end end) as NeedToFill
from (
select
n.*,
row_number() over(partition by item order by RN, (case when need > Allocated then 0 else 1 end)) as Ord,
n.TotalAvail - sum(n.Allocated) over (partition by item) as Remainder
from (
select
n.*,
n.TotalAvail - sum(n.Allocated) over (partition by item order by RN) as RemainingInv,
n.need - sum(n.Allocated) over (partition by item, location order by RN) as RemaingNeed
from (
select
n.*,
case when Proportional > need then need else Proportional end as Allocated
from (
select
row_number() over(order by need desc) as RN,
n.*,
h.location as Source,
h.have,
h.TotalAvail,
convert(int, floor(h.have * n.need * 1.0 / n.TotalNeed), 0) as Proportional
from (
select n.*, sum(need) over (partition by item) as TotalNeed
from NeedInv n) n
join (select h.*, sum(have) over (partition by item) as TotalAvail from HaveInv h) h
on n.item = h.item
and h.have > 0
) n
) n
) n
) n
where n.need > 0
It seems to work for most cases except when Allocated is set to zero but there are still items that could be transferred. This can be seen for item 200, where location B only needs 1 but is going to receive 2 items, while location C, which also needs 1 item, will receive 0.
Any help/guidance would be appreciated!
Your query looks a little complicated for what it needs to do, IMO.
As far as I can tell, this is just a simple matter of building the logic into a query using running totals of inventory. Essentially, it's just a matter of building in rules such that if what you need can be taken from a source location, you take it, otherwise you take as much as possible.
For example, I believe the following query contains the logic required:
SELECT N.Item,
SourceLocation = H.Location,
TargetLocation = N.Location,
Qty =
CASE
WHEN N.TotalRunningRequirement <= H.TotalRunningInventory -- If the current source location has enough stock to fill the request.
THEN
CASE
WHEN N.TotalRunningRequirement - N.Need < H.TotalRunningInventory - H.Have -- If stock required has already been allocated from elsewhere.
THEN N.TotalRunningRequirement - (H.TotalRunningInventory - H.Have) -- Get the total running requirement minus stock allocated from elsewhere.
ELSE N.Need -- Otherwise just take how much is needed.
END
ELSE N.Need - (N.TotalRunningRequirement - H.TotalRunningInventory) -- Current source doesn't have enough stock to fulfil need, so take as much as possible.
END
FROM
(
SELECT *, TotalRunningRequirement = SUM(need) OVER (PARTITION BY item ORDER BY location)
FROM NeedInv
WHERE need > 0
) AS N
JOIN
(
SELECT *, TotalRunningInventory = SUM(have) OVER (PARTITION BY item ORDER BY location)
FROM HaveInv
WHERE have > 0
) AS H
ON H.Item = N.Item
AND H.TotalRunningInventory - (N.TotalRunningRequirement - N.need) > 0 -- Join if stock in source location can be taken
AND H.TotalRunningInventory - H.Have - (N.TotalRunningRequirement - N.need) < N.TotalRunningRequirement
;
Note: Your desired output doesn't seem to match your sample data for Item 200 as far as I can tell.
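One way to picture the running-total idea: for a single item, lay the stock of each source location end to end on a number line, do the same with the needs, and transfer the overlap between each source segment and each target segment. The Python sketch below follows that picture for one item. It is a simplified illustration rather than a line-by-line port of the query, and `running_intervals` and `allocate` are hypothetical names.

```python
from itertools import accumulate

def running_intervals(rows):
    """rows: list of (location, qty) -> list of (location, start, end) running-total segments."""
    ends = list(accumulate(qty for _loc, qty in rows))
    return [(loc, end - qty, end) for (loc, qty), end in zip(rows, ends)]

def allocate(have, need):
    """Return (source, target, qty) transfers for one item, ordered by location."""
    transfers = []
    need_segments = running_intervals(sorted(need))
    for src, h_lo, h_hi in running_intervals(sorted(have)):
        for dst, n_lo, n_hi in need_segments:
            qty = min(h_hi, n_hi) - max(h_lo, n_lo)  # length of the overlap
            if qty > 0:
                transfers.append((src, dst, qty))
    return transfers

# item 100 from the sample data
print(allocate([("A", 0), ("B", 3), ("C", 0), ("D", 3)],
               [("A", 4), ("B", 0), ("C", 2)]))
```

Locations with a quantity of zero produce empty segments and therefore never appear in the output, which has the same effect as the `have > 0` and `need > 0` filters.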
I was wondering if a Recursive CTE could be used for allocations.
But it turned out a bit more complicated.
The result doesn't completely align with the expected result in the question.
But since the other answer returns the same results, I guess that's fine.
So see it as just an extra method.
Test on db<>fiddle here
It basically loops through the haves and needs in the order of the calculated row_numbers.
And assigns what's still available for what's still needed.
declare @HaveNeedInv table (
item int,
rn int,
loc varchar(1),
have int,
need int,
primary key (item, rn, loc, have, need)
);
insert into @HaveNeedInv (item, loc, have, need, rn)
select item, location, sum(have), 0 as need,
row_number() over (partition by item order by sum(have) desc)
from HaveInv
where have > 0
group by item, location;
insert into @HaveNeedInv (item, loc, have, need, rn)
select item, location, 0 as have, sum(need),
row_number() over (partition by item order by sum(need) desc)
from NeedInv
where need > 0
group by item, location;
;with ASSIGN as
(
select h.item, 0 as lvl,
h.rn as hrn, n.rn as nrn,
h.loc as hloc, n.loc as nloc,
h.have, n.need,
iif(h.have<=n.need,h.have,n.need) as assign
from @HaveNeedInv h
join @HaveNeedInv n on (n.item = h.item and n.need > 0 and n.rn = 1)
where h.have > 0 and h.rn = 1
union all
select t.item, a.lvl + 1,
iif(t.have>0,t.rn,a.hrn),
iif(t.need>0,t.rn,a.nrn),
iif(t.have>0,t.loc,a.hloc),
iif(t.need>0,t.loc,a.nloc),
iif(a.have>a.assign,a.have-a.assign,t.have),
iif(a.need>a.assign,a.need-a.assign,t.need),
case
when t.have > 0
then case
when t.have > (a.need - a.assign) then a.need - a.assign
else t.have
end
else case
when t.need > (a.have - a.assign) then a.have - a.assign
else t.need
end
end
from ASSIGN a
join @HaveNeedInv t
on t.item = a.item
and iif(a.have>a.assign,t.need,t.have) > 0
and t.rn = iif(a.have>a.assign,a.nrn,a.hrn) + 1
)
select
item,
hloc as SourceLocation,
nloc as TargetLocation,
assign as Qty
from ASSIGN
where assign > 0
order by item, hloc, nloc
option (maxrecursion 1000);
Result:
100 B A 3
100 D A 1
100 D C 2
200 A B 1
200 D C 1
300 C A 3
300 C B 5
Changing the order in the row_numbers (used to fill @HaveNeedInv) will change the priority, and could return a different result.

Generating order statistics grouped by order total

Hopefully I can explain this correctly. I have a table of line orders (each line order consists of quantity of item and the price, there are other fields but I left those out.)
table 'orderitems':
orderid | quantity | price
1 | 1 | 1.5000
1 | 2 | 3.22
2 | 1 | 9.99
3 | 4 | 0.44
3 | 2 | 15.99
So to get order total I would run
SELECT SUM(Quantity * price) AS total
FROM OrderItems
GROUP BY OrderID
However, I would like to get a count of all total orders under $1 (just provide a count).
My end result I would like would be able to define ranges:
under $1, $1 - $3, 3-5, 5-10, 10-15, 15.. etc;
and my data to look like so (hopefully):
tunder1 | t1to3 | t3to5 | t5to10 | etc
10 | 500 | 123 | 5633 |
So that I can present a piechart breakdown of customer orders on our eCommerce site.
Now I can run individual SQL queries to get this, but I would like to know what the most efficient 'single sql query' would be. I am using MS SQL Server.
Currently I can run a single query like so to get under $1 total:
SELECT COUNT(total) AS tunder1
FROM (SELECT SUM(Quantity * price) AS total
FROM OrderItems
GROUP BY OrderID) AS a
WHERE (total < 1)
How can I optimize this? Thanks in advance!
select
count(case when total < 1 then 1 end) tunder1,
count(case when total >= 1 and total < 3 then 1 end) t1to3,
count(case when total >= 3 and total < 5 then 1 end) t3to5,
...
from
(
select sum(quantity * price) as total
from orderitems group by orderid
) as totals; -- SQL Server requires an alias on a derived table
you need to use HAVING for filtering grouped values.
try this:
-- decimal without precision/scale defaults to scale 0 and would round the prices
DECLARE @YourTable table (OrderID int, Quantity int, Price decimal(10,4))
INSERT INTO @YourTable VALUES (1,1,1.5000)
INSERT INTO @YourTable VALUES (1,2,3.22)
INSERT INTO @YourTable VALUES (2,1,9.99)
INSERT INTO @YourTable VALUES (3,4,0.44)
INSERT INTO @YourTable VALUES (3,2,15.99)
SELECT
SUM(CASE WHEN TotalCost<1 THEN 1 ELSE 0 END) AS tunder1
,SUM(CASE WHEN TotalCost>=1 AND TotalCost<3 THEN 1 ELSE 0 END) AS t1to3
,SUM(CASE WHEN TotalCost>=3 AND TotalCost<5 THEN 1 ELSE 0 END) AS t3to5
,SUM(CASE WHEN TotalCost>=5 THEN 1 ELSE 0 END) AS t5andup
FROM (SELECT
SUM(quantity * price) AS TotalCost
FROM @YourTable
GROUP BY OrderID
) dt
OUTPUT:
tunder1 t1to3 t3to5 t5andup
----------- ----------- ----------- -----------
0 0 0 3
(1 row(s) affected)
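The conditional-aggregation pattern (one pass over the order totals, one counter per range) looks like this as a Python sketch. `order_totals` and `count_ranges` are hypothetical helper names, the data is the sample from the question, and the range limits are the ones used in the query above.

```python
from collections import defaultdict

def order_totals(items):
    """items: iterable of (order_id, quantity, price) -> {order_id: total}."""
    totals = defaultdict(float)
    for order_id, quantity, price in items:
        totals[order_id] += quantity * price
    return dict(totals)

def count_ranges(totals):
    """Count order totals per range, one named counter per range."""
    values = list(totals.values())
    return {
        "tunder1": sum(1 for t in values if t < 1),
        "t1to3": sum(1 for t in values if 1 <= t < 3),
        "t3to5": sum(1 for t in values if 3 <= t < 5),
        "t5andup": sum(1 for t in values if t >= 5),
    }

items = [(1, 1, 1.5), (1, 2, 3.22), (2, 1, 9.99), (3, 4, 0.44), (3, 2, 15.99)]
print(count_ranges(order_totals(items)))
```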
WITH orders (orderid, quantity, price) AS
(
SELECT 1, 1, 1.5
UNION ALL
SELECT 1, 2, 3.22
UNION ALL
SELECT 2, 1, 9.99
UNION ALL
SELECT 3, 4, 0.44
UNION ALL
SELECT 4, 2, 15.99
),
ranges (bound) AS
(
SELECT 1
UNION ALL
SELECT 3
UNION ALL
SELECT 5
UNION ALL
SELECT 10
UNION ALL
SELECT 15
),
rr AS
(
SELECT bound, ROW_NUMBER() OVER (ORDER BY bound) AS rn
FROM ranges
),
r AS
(
SELECT COALESCE(rf.rn, 0) AS rn, COALESCE(rf.bound, 0) AS f,
rt.bound AS t
FROM rr rf
FULL JOIN
rr rt
ON rt.rn = rf.rn + 1
)
SELECT rn, f, t, COUNT(*) AS cnt
FROM r
JOIN (
SELECT SUM(quantity * price) AS total
FROM orders
GROUP BY
orderid
) o
ON total >= f
AND total < COALESCE(t, 10000000)
GROUP BY
rn, t, f
Output:
rn f t cnt
1 1 3 1
3 5 10 2
5 15 NULL 1
That is, 1 order from $1 to $3, 2 orders from $5 to $10, and 1 order above $15.
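Keeping the boundaries in a list instead of hard-coding them can be sketched in Python as below. This is an illustration of the same idea, not a port of the query: `count_by_bounds` is a hypothetical name, a lower edge of 0 is assumed for the first range, and the last range is open-ended (upper bound `None`).

```python
from bisect import bisect_right
from collections import Counter

def count_by_bounds(totals, bounds):
    """Return {(lower, upper): count}; upper is None for the open-ended last range."""
    edges = [0] + sorted(bounds)
    counts = Counter()
    for total in totals:
        i = bisect_right(edges, total) - 1   # index of the last edge <= total
        if i < 0:
            continue                         # below the first edge, not counted
        upper = edges[i + 1] if i + 1 < len(edges) else None
        counts[(edges[i], upper)] += 1
    return dict(counts)

# order totals for the four orders defined in the query above
totals = [7.94, 9.99, 1.76, 31.98]
print(count_by_bounds(totals, [1, 3, 5, 10, 15]))
```

As with the join in the query, ranges that contain no orders do not show up in the result.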