Any way to make CASE statements on rankings smarter? - sql

I want to group my ranks into chunks of data. I thought about using CASE statements, but that not only looks silly, it's also slow.
Any tips on how this can be improved?
Please note the chunks vary in size (first the top 100 listed individually, then chunks of 100, then chunks of 500, then one chunk of 5000, and three other chunks of 15K)
select
transaction_code
,row_number() over (order by SALES_AMOUNT desc) as rank
,SALES_AMOUNT
,CASE
WHEN rank <=100 THEN to_varchar(rank)
WHEN rank <=200 then '101-200'
WHEN rank <=300 then '201-300'
WHEN rank <=400 then '301-400'
WHEN rank <=500 then '401-500'
WHEN rank <=1000 then '501-1000'
WHEN rank <=1500 then '1001-1500'
WHEN rank <=2000 then '1501-2000'
WHEN rank <=2500 then '2001-2500'
WHEN rank <=3000 then '2501-3000'
WHEN rank <=3500 then '3001-3500'
WHEN rank <=4000 then '3501-4000'
WHEN rank <=4500 then '4001-4500'
WHEN rank <=5000 then '4501-5000'
WHEN rank <=5500 then '5001-5500'
WHEN rank <=6000 then '5501-6000'
WHEN rank <=6500 then '6001-6500'
WHEN rank <=7000 then '6501-7000'
WHEN rank <=7500 then '7001-7500'
WHEN rank <=8000 then '7501-8000'
WHEN rank <=8500 then '8001-8500'
WHEN rank <=9000 then '8501-9000'
WHEN rank <=9500 then '9001-9500'
WHEN rank <=10000 then '9501-10000'
WHEN rank <=15000 then '10001-15000'
WHEN rank <=30000 then '15001-30000'
WHEN rank <=45000 then '30001-45000'
WHEN rank <=60000 then '45001-60000'
ELSE 'Bottom'
END AS "TRANSACTION GROUPS"

The fastest way is to create a lookup table that maps each rank to a group name. You could do it using a stateful JavaScript UDF (initializing the map just once), but you can also just do it in SQL.
Table definition
Simple mapping from a number to a string
create or replace table rank2group(rank integer, grp string);
UDF to generate group name
Your code is indeed very long.
Instead, we can create a function that, for a given rank, group_size, and group_base (the rank at which groups of group_size start forming), generates the group name.
Note, this function will be slower than your CASE, as it builds a string from its inputs, but we'll only use it to fill the lookup table, so it doesn't matter.
create or replace function group_name(rank integer, group_base integer, group_size integer)
returns varchar
as $$
(group_base + 1 + group_size * floor((rank - 1 - group_base) / group_size))
|| '-' ||
(group_base + group_size + group_size * floor((rank - 1 - group_base) / group_size))
$$;
Example outputs:
select group_name(101, 100, 100), group_name(1678, 500, 500), group_name(15000, 10000, 5000);
---------------------------+----------------------------+--------------------------------+
GROUP_NAME(101, 100, 100) | GROUP_NAME(1678, 500, 500) | GROUP_NAME(15000, 10000, 5000) |
---------------------------+----------------------------+--------------------------------+
101-200 | 1501-2000 | 10001-15000 |
---------------------------+----------------------------+--------------------------------+
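The bucket arithmetic is easy to sanity-check outside Snowflake. Here is a minimal Python sketch of the same formula (names mirror the UDF's arguments):

```python
def group_name(rank, group_base, group_size):
    # Same arithmetic as the SQL UDF: fixed-size buckets that start
    # right after group_base; integer division replaces floor().
    offset = group_size * ((rank - 1 - group_base) // group_size)
    return f"{group_base + 1 + offset}-{group_base + group_size + offset}"

print(group_name(101, 100, 100))       # 101-200
print(group_name(1678, 500, 500))      # 1501-2000
print(group_name(15000, 10000, 5000))  # 10001-15000
```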
Table data generation
We'll generate values that map range 1 .. 60000 only, using Snowflake generators, group_name, and your simplified CASE statement:
create or replace table rank2group(rank integer, grp string);
insert into rank2group
select rank,
CASE
WHEN rank <=100 THEN to_varchar(rank)
-- groups of size 100, starting at 100
WHEN rank <=500 then group_name(rank, 100, 100)
WHEN rank <=10000 then group_name(rank, 500, 500)
-- groups of size 5000, starting at 10000
WHEN rank <=15000 then group_name(rank, 10000, 5000)
WHEN rank <=60000 then group_name(rank, 15000, 15000)
ELSE 'Bottom'
END AS "TRANSACTION GROUPS"
from (
select row_number() over (order by 1) as rank
from table(generator(rowCount=>60000))
);
Usage
To use it, we simply join on rank.
Note, you need an outer join followed by ifnull for the Bottom values.
Example, using a generated input that creates exponentially increasing numbers:
with input as (
select 1 + (seq8() * seq8() * seq8()) AS rank
from table(generator(rowCount=>50))
)
select input.rank, ifnull(grp, 'Bottom') grp
from input left outer join rank2group on input.rank = rank2group.rank
order by input.rank;
--------+-------------+
RANK | GRP |
--------+-------------+
1 | 1 |
2 | 2 |
9 | 9 |
28 | 28 |
65 | 65 |
126 | 101-200 |
217 | 201-300 |
344 | 301-400 |
513 | 501-1000 |
730 | 501-1000 |
1001 | 1001-1500 |
1332 | 1001-1500 |
1729 | 1501-2000 |
2198 | 2001-2500 |
2745 | 2501-3000 |
3376 | 3001-3500 |
4097 | 4001-4500 |
4914 | 4501-5000 |
5833 | 5501-6000 |
6860 | 6501-7000 |
8001 | 8001-8500 |
9262 | 9001-9500 |
10649 | 10001-15000 |
12168 | 10001-15000 |
13825 | 10001-15000 |
15626 | 15001-30000 |
17577 | 15001-30000 |
19684 | 15001-30000 |
21953 | 15001-30000 |
24390 | 15001-30000 |
27001 | 15001-30000 |
29792 | 15001-30000 |
32769 | 30001-45000 |
35938 | 30001-45000 |
39305 | 30001-45000 |
42876 | 30001-45000 |
46657 | 45001-60000 |
50654 | 45001-60000 |
54873 | 45001-60000 |
59320 | 45001-60000 |
64001 | Bottom |
68922 | Bottom |
74089 | Bottom |
79508 | Bottom |
85185 | Bottom |
91126 | Bottom |
97337 | Bottom |
103824 | Bottom |
110593 | Bottom |
117650 | Bottom |
--------+-------------+
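The same build-then-join flow can be reproduced on any engine; here is a self-contained SQLite sketch of it, where a Python function stands in for the Snowflake UDF, a Python range for the generator, and COALESCE for IFNULL:

```python
import sqlite3

def group_name(rank, base, size):
    # Same arithmetic as the Snowflake UDF.
    off = size * ((rank - 1 - base) // size)
    return f"{base + 1 + off}-{base + size + off}"

def grp_for(rank):
    # The simplified CASE from the insert statement above.
    if rank <= 100:    return str(rank)
    if rank <= 500:    return group_name(rank, 100, 100)
    if rank <= 10000:  return group_name(rank, 500, 500)
    if rank <= 15000:  return group_name(rank, 10000, 5000)
    if rank <= 60000:  return group_name(rank, 15000, 15000)
    return 'Bottom'

con = sqlite3.connect(':memory:')
con.execute("create table rank2group(rank integer primary key, grp text)")
con.executemany("insert into rank2group values (?, ?)",
                ((r, grp_for(r)) for r in range(1, 60001)))

# Outer join + COALESCE so ranks beyond the table fall into 'Bottom'.
rows = con.execute("""
    select i.rank, coalesce(g.grp, 'Bottom') as grp
    from (select 126 as rank union all
          select 1729        union all
          select 64001) i
    left join rank2group g on g.rank = i.rank
    order by i.rank
""").fetchall()
print(rows)  # [(126, '101-200'), (1729, '1501-2000'), (64001, 'Bottom')]
```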
Possible optimization
If your ranges are always multiples of 100, you can make the table 100 times smaller, storing only the values ending with 00, and then joining on e.g. 100 * CEIL(rank / 100).
But then you also need to handle values 1..100 after the join, e.g. IFNULL(grp, IFF(rank <= 100, rank::varchar, 'Bottom'))
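A sketch of that optimization in SQLite (integer arithmetic emulates the CEIL; the bucket keying is my assumption about the intended layout, one row per 100-rank bucket keyed by its upper bound):

```python
import sqlite3

def group_name(rank, base, size):
    off = size * ((rank - 1 - base) // size)
    return f"{base + 1 + off}-{base + size + off}"

def grp_for(rank):
    # Only called for multiples of 100 between 200 and 60000.
    if rank <= 500:    return group_name(rank, 100, 100)
    if rank <= 10000:  return group_name(rank, 500, 500)
    if rank <= 15000:  return group_name(rank, 10000, 5000)
    return group_name(rank, 15000, 15000)

con = sqlite3.connect(':memory:')
# 600 rows instead of 60000; ranks 1..100 are handled after the join.
con.execute("create table rank2group(bucket integer primary key, grp text)")
con.executemany("insert into rank2group values (?, ?)",
                ((k, grp_for(k)) for k in range(200, 60001, 100)))

q = """
    select r.rank,
           coalesce(g.grp,
                    case when r.rank <= 100 then cast(r.rank as text)
                         else 'Bottom' end) as grp
    from (select ? as rank) r
    left join rank2group g
      on g.bucket = ((r.rank + 99) / 100) * 100  -- integer CEIL to the bucket
"""
for rank in (55, 126, 1678, 70000):
    print(con.execute(q, (rank,)).fetchone())
```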


Add a subtotal column for aggregated columns

Here's my dataset of trades, traders and counterparties:
TRADER_ID | TRADER_NAME | EXEC_BROKER | TRADE_AMOUNT | TRADE_ID
ABC123 | Jules Winnfield | GOLD | 10000 | ASDADAD
XDA241 | Jimmie Dimmick | GOLD | 12000 | ASSVASD
ADC123 | Vincent Vega | BARC | 10000 | ZXCZCX
ABC123 | Jules Winnfield | BARC | 15000 | ASSXCQA
ADC123 | Vincent Vega | CRED | 250000 | RFAQQA
ABC123 | Jules Winnfield | CRED | 5000 | ASDQ23A
ABC123 | Jules Winnfield | GOLD | 5000 | AVBDQ3A
I'm looking to produce a repeatable monthly report that gives me a view of trading activity aggregated at the counterparty (the EXEC_BROKER field) level, with subtotals - as shown below:
TRADER_ID | TRADER_NAME | NO._OF_CCP_USED | CCP | TRADED_AMT_WITH_CCP | VALUE_OF_TOTAL_TRADES | TRADES_WITH_CCP | TOTAL_TRADES
ABC123 | Jules Winnfield | 3 | GOLD | 15000 | 35000 | 2 | 4
ABC123 | Jules Winnfield | 3 | BARC | 15000 | 35000 | 1 | 4
ABC123 | Jules Winnfield | 3 | CRED | 5000 | 35000 | 1 | 4
...and so on the rest.
The idea is to aggregate the number of trades per counterparty (which I have done using a count function), and the sum of traded amounts with the ccp, but I'm struggling to get the 'subtotal' field next to each trader as shown in my desired output above - so you can see here that Jules has dealt with 3 counterparties in total, with 4 trades between them, and a collective amount of 35000.
I have tried using a combination of aggregate and OVER (window) functions, but to no avail.
SELECT
OT.TRADER_ID,
OT.TRADER_NAME,
OT.EXEC_BROKER,
SUM(OT.TRADE_AMOUNT) AS VALUE_OF_TOTAL_TRADES,
COUNT(OT.TRADE_ID) AS TOTAL_TRADES,
COUNT(OT.EXEC_BROKER) OVER PARTITION BY (OT.TRADER_ID) AS NO._OF_CCP_USED,
SUM(OT.TRADE_AMOUNT) OVER PARTITION BY (OT.EXEC_BROKER) AS TRADED_AMT_WITH_CCP,
COUNT(OT.TRADE_ID) OVER PARTITION BY (OT.EXEC_BROKER) AS TRADES_WITH_CCP
FROM dbo.ORDERS_TRADES OT
GROUP BY OT.TRADER_ID, OT.TRADER_NAME, OT.EXEC_BROKER, OT.TRADE_AMOUNT, OT.TRADE_ID
The code above runs but returns millions of rows. When I remove the partition by lines, I get the desired result minus the subtotal columns I'm looking for.
Any suggestions please? Thanks very much!
EDIT:
Final code which gave me the desired output: updating my question to provide this response (thanks to Gordon Linoff) so that others can benefit:
SELECT
OT.TRADER_ID,
OT.TRADER_NAME,
OT.EXEC_BROKER,
RANK() OVER (PARTITION BY OT.TRADER_ID ORDER BY
SUM(OT.TRADE_AMOUNT) DESC) AS CCP_RANK,
SUM(OT.TRADE_AMOUNT) AS TRADED_AMT_WITH_CCP,
SUM(SUM(OT.TRADE_AMOUNT)) OVER (PARTITION BY OT.TRADER_ID) AS
VALUE_OF_TOTAL_TRADES,
COUNT(*) OVER (PARTITION BY OT.TRADER_ID) AS NUM_OF_CCP_USED,
SUM(COUNT(OT.TRADE_ID)) OVER (PARTITION BY OT.TRADER_ID) AS
TOTAL_TRADES
FROM dbo.ORDERS_TRADES OT
GROUP BY OT.TRADER_ID, OT.TRADER_NAME, OT.EXEC_BROKER
You seem to want:
SELECT OT.TRADER_ID, OT.TRADER_NAME, OT.CCP,
COUNT(*) OVER (PARTITION BY OT.TRADER_ID) as NUM_CCP,
SUM(OT.TRADED_AMT) AS TRADED_AMT_WITH_CCP,
SUM(SUM(OT.TRADED_AMT)) OVER (PARTITION BY OT.TRADER_ID) AS VALUE_OF_TOTAL_TRADES,
COUNT(OT.TRADE_ID) AS CCP_TRADES,
SUM(COUNT(OT.TRADE_ID)) OVER (PARTITION BY OT.TRADER_ID) AS TOTAL_TRADES
FROM ORDERS_TRADES OT
GROUP BY OT.TRADER_ID, OT.TRADER_NAME, OT.CCP;
I'm not sure what your query has to do with the results you want. The columns have little to do with what you are asking.
Here is a db<>fiddle.
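The window-over-aggregate pattern is easy to verify on the sample rows. Here's a self-contained SQLite sketch; the inner subquery spells out the GROUP BY step, which is equivalent to the SUM(SUM(...)) OVER form above (column names abbreviated):

```python
import sqlite3

rows = [
    ('ABC123', 'Jules Winnfield', 'GOLD', 10000,  'ASDADAD'),
    ('XDA241', 'Jimmie Dimmick',  'GOLD', 12000,  'ASSVASD'),
    ('ADC123', 'Vincent Vega',    'BARC', 10000,  'ZXCZCX'),
    ('ABC123', 'Jules Winnfield', 'BARC', 15000,  'ASSXCQA'),
    ('ADC123', 'Vincent Vega',    'CRED', 250000, 'RFAQQA'),
    ('ABC123', 'Jules Winnfield', 'CRED', 5000,   'ASDQ23A'),
    ('ABC123', 'Jules Winnfield', 'GOLD', 5000,   'AVBDQ3A'),
]
con = sqlite3.connect(':memory:')
con.execute("""create table orders_trades
               (trader_id text, trader_name text, ccp text,
                trade_amount integer, trade_id text)""")
con.executemany("insert into orders_trades values (?,?,?,?,?)", rows)

# Aggregate per (trader, ccp) first, then window over the grouped rows
# for the per-trader subtotals.
result = con.execute("""
    select trader_id, trader_name, ccp,
           count(*) over (partition by trader_id)              as num_ccp_used,
           amt_with_ccp,
           sum(amt_with_ccp) over (partition by trader_id)     as value_of_total_trades,
           trades_with_ccp,
           sum(trades_with_ccp) over (partition by trader_id)  as total_trades
    from (select trader_id, trader_name, ccp,
                 sum(trade_amount) as amt_with_ccp,
                 count(trade_id)   as trades_with_ccp
          from orders_trades
          group by trader_id, trader_name, ccp)
    order by trader_id, ccp
""").fetchall()
for r in result:
    print(r)
```

Jules's rows come out with 3 counterparties, 35000 total, and 4 trades, matching the desired output.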
Making some assumptions about the nomenclature, here is a solution that doesn't use anything too fancy so it's easy to maintain, though it's not the most efficient:
create table trades
(
TRADER_ID varchar(10),
TRADER_NAME varchar(20),
CCP char(4),
TRADED_AMT decimal(10,2),
TRADE_ID varchar(10) primary key
);
insert trades
values
('ABC123', 'Jules Winnfield', 'GOLD', 10000 , 'ASDADAD'),
('XDA241', 'Jimmie Dimmick ', 'GOLD', 12000 , 'ASSVASD'),
('ADC123', 'Vincent Vega ', 'BARC', 10000 , 'ZXCZCX'),
('ABC123', 'Jules Winnfield', 'BARC', 15000 , 'ASSXCQA'),
('ADC123', 'Vincent Vega ', 'CRED', 250000, 'RFAQQA'),
('ABC123', 'Jules Winnfield', 'CRED', 5000 , 'ASDQ23A'),
('ABC123', 'Jules Winnfield', 'GOLD', 5000 , 'AVBDQ3A');
with trader_totals as
(
select trader_id,
distinct_ccps = count(distinct CCP),
total_amt = sum(traded_amt),
total_count = count(*)
from trades
group by trader_id
)
select trader_id = tr.trader_id,
trader_name = trader_name,
distinct_CCP_count = tt.distinct_ccps,
CCP = tr.CCP,
this_CCP_traded_amt = sum(traded_amt),
total_traded_amt = tt.total_amt,
this_CCP_traded_count = count(*),
total_traded_count = tt.total_count
from trades tr
join trader_totals tt on tt.trader_id = tr.trader_id
group by tr.trader_id,
tr.trader_name,
tr.CCP,
tt.distinct_ccps,
tt.total_amt,
tt.total_count

Find the aggregate values of each column and transform that to a row

create table test(
oldnum varchar(32),
newnum varchar(32),
id varchar(32),
transactioncnt int);
Insert into test(oldnum,newnum,id,transactioncnt) values
(220,220,839,22),
(220,220,4,12),
(221,221,1234,10),
(221,222,475,10),
(221,222,687,15),
(225,221,837,60);
output columns:
oldnum,oldnumTotalTransactionCnt,newnum,newtransactionCntcontributedfromOldnum,newnumTotalTransactionCnt
expected output
220,34,220,34,34
221,35,221,10,70
221,35,222,25,25
225,60,221,60,70
If oldnum = newnum for all the records (rows), then just get sum(transactioncnt) grouped by oldnum; the same values can be filled into the different columns.
If oldnum <> newnum, sum(transactioncnt) for oldnum is oldnumTotalTransactionCnt; then see if the newnum exists somewhere else in the newnum column and take that sum for newnumTotalTransactionCnt. newtransactionCntcontributedfromOldnum will be whatever transactioncnt comes from the oldnum.
Window functions come in handy for this, since they give you the ability to compute sums over different partitions at once:
select distinct
oldnum,
sum(transactioncnt) over(partition by oldnum) oldnumTotalTransactionCnt,
newnum,
sum(transactioncnt) over(partition by oldnum, newnum) newtransactionCntcontributedfromOldnum,
sum(transactioncnt) over(partition by newnum) newnumTotalTransactionCnt
from test
Demo on DB Fiddle:
oldnum | oldnumTotalTransactionCnt | newnum | newtransactionCntcontributedfromOldnum | newnumTotalTransactionCnt
-----: | ------------------------: | -----: | -------------------------------------: | ------------------------:
220 | 34 | 220 | 34 | 34
221 | 35 | 221 | 10 | 70
225 | 60 | 221 | 60 | 70
221 | 35 | 222 | 25 | 25
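The DISTINCT-plus-windows query runs almost verbatim in SQLite; here's a self-contained sketch against the sample data:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("""create table test(oldnum text, newnum text,
                                 id text, transactioncnt int)""")
con.executemany("insert into test values (?,?,?,?)",
                [('220', '220', '839', 22), ('220', '220', '4', 12),
                 ('221', '221', '1234', 10), ('221', '222', '475', 10),
                 ('221', '222', '687', 15), ('225', '221', '837', 60)])

# Three window sums over three different partitions, deduplicated.
result = con.execute("""
    select distinct
        oldnum,
        sum(transactioncnt) over (partition by oldnum)         as old_total,
        newnum,
        sum(transactioncnt) over (partition by oldnum, newnum) as contributed,
        sum(transactioncnt) over (partition by newnum)         as new_total
    from test
    order by oldnum, newnum
""").fetchall()
for r in result:
    print(r)
```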
I prefer to think of this as an aggregation query, with additional columns added using window functions.
The basic query is:
select oldnum,
newnum,
sum(transactioncnt) as newtransactionCntcontributedfromOldnum
from test
group by oldnum, newnum
order by 1, 3;
Then you want to sum up the sum(transactioncnt) for both oldnum and newnum. That is where the window functions come in:
select oldnum,
sum(sum(transactioncnt)) over (partition by oldnum) as oldnumTotalTransactionCnt,
newnum,
sum(transactioncnt) as newtransactionCntcontributedfromOldnum,
sum(sum(transactioncnt)) over (partition by newnum) as newnumTotalTransactionCnt
from test
group by oldnum, newnum
order by 1, 3;
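The two-level aggregation is easy to check; here's a SQLite sketch that writes the inner GROUP BY as a subquery, which makes the evaluation order of SUM(SUM(...)) OVER explicit:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("""create table test(oldnum text, newnum text,
                                 id text, transactioncnt int)""")
con.executemany("insert into test values (?,?,?,?)",
                [('220', '220', '839', 22), ('220', '220', '4', 12),
                 ('221', '221', '1234', 10), ('221', '222', '475', 10),
                 ('221', '222', '687', 15), ('225', '221', '837', 60)])

# Step 1 (inner query): the basic GROUP BY.
# Step 2 (outer query): window sums over oldnum and over newnum.
result = con.execute("""
    select oldnum,
           sum(contributed) over (partition by oldnum) as old_total,
           newnum,
           contributed,
           sum(contributed) over (partition by newnum) as new_total
    from (select oldnum, newnum,
                 sum(transactioncnt) as contributed
          from test
          group by oldnum, newnum)
    order by oldnum, newnum
""").fetchall()
for r in result:
    print(r)
```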

How to include current row in PARTITION BY of Postgresql's window function

I'm trying to do the following; let's say I want to partition a table in two partition given a set condition:
SELECT
userid,
ARRAY_AGG(userid) OVER (
PARTITION BY userid > 100
) arr,
AVG(userid) OVER (
PARTITION BY userid > 100
) avg
FROM users;
I'll get this:
userid | arr | avg
--------+-----------------------------------------------------------+----------------------
46 | {46,23,69,92} | 57.5000000000000000
23 | {46,23,69,92} | 57.5000000000000000
69 | {46,23,69,92} | 57.5000000000000000
92 | {46,23,69,92} | 57.5000000000000000
552 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
... | ... | ...
529 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
All good, but what if instead, for the userids < 100, I wanted to include each userid with the ones > 100:
SELECT
userid,
CASE WHEN userid > 100
THEN ARRAY_AGG(userid) OVER (
PARTITION BY userid > 100
)
ELSE ARRAY_AGG(userid) OVER (
PARTITION BY userid -- OR userid > 100
-- PARTITION BY userid > 100 OR CURRENT_ROW
-- PARTITION BY userid > 100 OR userid = LAG(userid, 0) OVER ()
)
END arr
CASE WHEN userid > 100
THEN AVG(userid) OVER (
PARTITION BY userid > 100
)
ELSE AVG(userid) OVER (
PARTITION BY userid -- OR userid > 100
-- PARTITION BY userid > 100 OR CURRENT_ROW
-- PARTITION BY userid > 100 OR userid = LAG(userid, 0) OVER ()
)
END avg
FROM users;
All the commented code above is the various tries I've been doing.
The best I've got is either just the userid without the ones > 100 or all userids:
userid | arr | avg
--------+-----------------------------------------------------------+----------------------
23 | {23} | 23.0000000000000000
46 | {46} | 46.0000000000000000
69 | {69} | 69.0000000000000000
92 | {92} | 92.0000000000000000
552 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
... | ... | ...
529 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
Is there any way to do what I'm looking for? I'm also trying not to use CTEs as much as possible, because the actual code has so much technical debt that it would take a pretty lengthy time to adapt it with a WITH.
To be clear, here is the expected result:
userid | arr | avg
--------+--------------------------------------------------------------|----------------------
23 | {23,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 588.6000000000000000
46 | {46,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 590.1333333333333334
69 | {69,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 591.6666666666666667
92 | {92,552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 593.2000000000000000
552 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
... | ... | ...
529 | {552,506,575,621,644,667,690,759,713,782,828,460,483,529} | 629.2142857142857143
Here's a reference for potential future stuff that I've been looking at: nested window functions (not implemented yet, as of PostgreSQL 11)
EDIT: Last but not least, the condition is a placeholder! It may or may not be tied to userids; it is just used here for the sake of the example. It could have been
CUME_DIST() OVER (
PARTITION BY x -- OR CURRENT_USERID
)
This answers the original version of the question.
You seem to want:
select (case when userid < 100
             then array_cat(array[userid],
                            array_agg(userid) filter (where userid > 100) over ())
             else array_agg(userid) filter (where userid > 100) over ()
        end) as arr
from users;
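The intended grouping rule is simple to state in plain code. Here's a toy Python sketch with hypothetical ids (not the question's data) showing what each row's partition should contain:

```python
users = [23, 46, 500, 700]           # hypothetical ids; 100 is the placeholder condition
big = [u for u in users if u > 100]  # the shared "over the threshold" partition

# Small ids get their own value prepended to the big partition;
# big ids just share the big partition among themselves.
partitions = {u: ([u] + big if u <= 100 else big) for u in users}
for u, arr in partitions.items():
    print(u, arr, sum(arr) / len(arr))
```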

SQL GROUP BY and differences on same field (for MS Access)

Hi I have the following style of table under MS Access: (I didn't make the table and cant change it)
Date_r | Id_Person |Points |Position
25/05/2015 | 120 | 2000 | 1
25/05/2015 | 230 | 1500 | 2
25/05/2015 | 100 | 500 | 3
21/12/2015 | 120 | 2200 | 1
21/12/2015 | 230 | 2000 | 4
21/12/2015 | 100 | 200 | 20
what I am trying to do is to get a list of players (identified by Id_Person) ordered by the points difference between 2 dates.
So for example if I pick date1=25/05/2015 and date2=21/12/2015 I would get:
Id_Person |Points_Diff
230 | 500
120 | 200
100 |-300
I think I need to make something like
SELECT Id_Person , MAX(Points)-MIN(Points)
FROM Table
WHERE date_r = #25/05/2015# or date_r = #21/12/2015#
GROUP BY Id_Person
ORDER BY MAX(Points)-MIN(Points) DESC
But my problem is that i don't really want to order by (MAX(Points)-MIN(Points)) but rather by (points at date2 - points at date1) which can be different because points can decrease with the time.
One method is to use FIRST and LAST. However, these can sometimes produce strange results, so I think that conditional aggregation is best:
SELECT Id_Person,
(MAX(IIF(date_r = #21/12/2015#, Points, 0)) -
MAX(IIF(date_r = #25/05/2015#, Points, 0))
) as PointsDiff
FROM Table
WHERE date_r IN (#25/05/2015#, #21/12/2015#)
GROUP BY Id_Person
ORDER BY (MAX(IIF(date_r = #21/12/2015#, Points, 0)) -
MAX(IIF(date_r = #25/05/2015#, Points, 0))
) DESC;
Because you have exactly two dates, this is more easily written as:
SELECT Id_Person,
SUM(IIF(date_r = #21/12/2015#, Points, -Points)) as PointsDiff
FROM Table
WHERE date_r IN (#25/05/2015#, #21/12/2015#)
GROUP BY Id_Person
ORDER BY SUM(IIF(date_r = #21/12/2015#, Points, -Points)) DESC;
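The conditional-aggregation trick ports to any engine; here's a self-contained SQLite sketch (a CASE expression stands in for Access's IIF, and ISO date strings for the #...# literals):

```python
import sqlite3

rows = [('2015-05-25', 120, 2000, 1), ('2015-05-25', 230, 1500, 2),
        ('2015-05-25', 100, 500, 3),  ('2015-12-21', 120, 2200, 1),
        ('2015-12-21', 230, 2000, 4), ('2015-12-21', 100, 200, 20)]
con = sqlite3.connect(':memory:')
con.execute("""create table scores(date_r text, id_person int,
                                   points int, position int)""")
con.executemany("insert into scores values (?,?,?,?)", rows)

# Points at date2 count positively, points at date1 negatively,
# so the sum per person is the difference between the two dates.
res = con.execute("""
    select id_person,
           sum(case when date_r = '2015-12-21' then points
                    else -points end) as points_diff
    from scores
    where date_r in ('2015-05-25', '2015-12-21')
    group by id_person
    order by points_diff desc
""").fetchall()
print(res)  # [(230, 500), (120, 200), (100, -300)]
```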

Need to automate sorting while accounting for overlapping values in SQL

I'm facing a challenging request that's had me beating my head against the keyboard. I need to implement a script which will sort and summarize a dataset while accounting for overlapping values which are associated with different identifiers. The table from which I am selecting contains the following columns:
BoxNumber (Need to group by this value, which serves as the identifier)
ProdBeg (Contains the first 'page number' for the document/record)
ProdEnd (Contains the last 'page number' for the document/record)
DateProduced (Date the document was produced)
ArtifactID (Unique identifier for each document)
NumPages (Contains the number of pages associated with each document)
Selecting a sample of the data with no conditions resembles the following (sorry for lousy formatting):
BoxNumber | ProdBeg | ProdEnd | DateProduced | ArtifactID | NumPages
1200 | ABC01 | ABC10 | 12/4/2013 | 1564589 | 10
1201 | ABC11 | ABC20 | 12/4/2013 | 1498658 | 10
1200 | ABC21 | ABC30 | 12/4/2013 | 1648596 | 10
1200 | ABC31 | ABC40 | 12/4/2013 | 1489535 | 10
Using something like the following effectively groups and sorts the data by box number while accounting for different DateProduced dates, but does not account for overlapping ProdBeg/ProdEnd values between different BoxNumbers:
SELECT BoxNumber, MIN(ProdBeg) AS 'ProdBeg', MAX(ProdEnd) AS 'ProdEnd', DateProduced, COUNT(ArtifactID) AS 'Documents', SUM(NumPages) AS 'Pages'
FROM MyTable
GROUP BY BoxNumber, DateProduced
ORDER BY ProdBeg, ProdEnd
This yields:
BoxNumber | ProdBeg | ProdEnd| DateProduced | Documents| Pages
1200 | ABC01 | ABC40 | 12/4/2013 | 3 | 30
1201 | ABC11 | ABC20 | 12/4/2013 | 1 | 10
Here, it becomes apparent that the ProdBeg/ProdEnd values for box 1201 overlap those for box 1200. No variation on the script above will work, as it will inherently ignore any overlaps and only select the min/max. We need something which will produce the following result:
BoxNumber | ProdBeg | ProdEnd | DateProduced | Documents| Pages
1200 | ABC01 | ABC10 | 12/4/2013 | 1 | 10
1201 | ABC11 | ABC20 | 12/4/2013 | 1 | 10
1200 | ABC21 | ABC40 | 12/4/2013 | 2 | 20
I'm just not sure how we can group by box number without showing only distinct values (which can result in overlaps for ProdBeg/ProdEnd). Any suggestions would be greatly appreciated! The environment version is SQL 2008 R2 (SP1).
Yuck. This would be easier if you had lead()/lag() as in SQL Server 2012. But it is doable.
The idea is the following:
Add a variable that is the number part of the code (the last two digits).
Calculate the next number in the sequence.
Calculate a flag if there is a gap to the next number. This is the start of a "group".
Calculate the cumulative sum of the "start of a group" flag. This is a group id.
Do the aggregation.
The following query follows this logic. I didn't include the date produced. This seems redundant with the number, unless a box can appear on multiple days. (Adding the date produced is just a matter of adding the condition to the where clauses.) The resulting query is:
with bp as (
select t.*,
cast(right(prodbeg, 2) as int) as pbeg,
cast(right(prodend, 2) as int) as pend
from mytable t
),
bp1 as (
select bp.*,
(select top 1 pend
from bp bp2
where bp2.pbeg < bp.pbeg and bp2.BoxNumber = bp.BoxNumber
order by bp2.pbeg desc
) as prevpend
from bp
),
bp2 as (
select bp1.*,
(select sum(case when prevpend = pbeg - 1 then 0 else 1 end)
from bp1 bp1a
where bp1a.pbeg <= bp1.pbeg and bp1a.BoxNumber = bp1.BoxNumber
) as groupid
from bp1
)
select BoxNumber, MIN(ProdBeg) AS ProdBeg, MAX(ProdEnd) AS ProdEnd,
COUNT(ArtifactID) AS 'Documents', SUM(NumPages) AS 'Pages'
FROM bp2
GROUP BY BoxNumber, groupid
ORDER BY ProdBeg, ProdEnd;
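On an engine that does have lead()/lag(), the same five steps collapse nicely. Here's a self-contained SQLite sketch of that version against the sample rows (it assumes the page codes have a fixed 3-character prefix):

```python
import sqlite3

rows = [(1200, 'ABC01', 'ABC10', '12/4/2013', 1564589, 10),
        (1201, 'ABC11', 'ABC20', '12/4/2013', 1498658, 10),
        (1200, 'ABC21', 'ABC30', '12/4/2013', 1648596, 10),
        (1200, 'ABC31', 'ABC40', '12/4/2013', 1489535, 10)]
con = sqlite3.connect(':memory:')
con.execute("""create table mytable(boxnumber int, prodbeg text, prodend text,
                                    dateproduced text, artifactid int,
                                    numpages int)""")
con.executemany("insert into mytable values (?,?,?,?,?,?)", rows)

res = con.execute("""
with nums as (          -- step 1: numeric part of the code
  select t.*,
         cast(substr(prodbeg, 4) as int) as pbeg,
         cast(substr(prodend, 4) as int) as pend
  from mytable t
),
flagged as (            -- steps 2-3: flag a gap to the previous row
  select nums.*,
         case when pbeg - 1 = lag(pend) over (partition by boxnumber
                                              order by pbeg)
              then 0 else 1 end as grp_start
  from nums
),
grouped as (            -- step 4: running sum of flags = group id
  select flagged.*,
         sum(grp_start) over (partition by boxnumber order by pbeg) as groupid
  from flagged
)
select boxnumber, min(prodbeg), max(prodend),   -- step 5: aggregate
       count(artifactid), sum(numpages)
from grouped
group by boxnumber, groupid
order by min(prodbeg)
""").fetchall()
for r in res:
    print(r)
```

The contiguous ABC21-ABC30 and ABC31-ABC40 documents for box 1200 merge into one row, while the overlapped ABC01-ABC10 range stays separate, matching the desired output.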