Add a subtotal column for aggregated columns - sql

Here's my dataset of trades, traders and counterparties:
TRADER_ID | TRADER_NAME | EXEC_BROKER | TRADE_AMOUNT | TRADE_ID
ABC123 | Jules Winnfield | GOLD | 10000 | ASDADAD
XDA241 | Jimmie Dimmick | GOLD | 12000 | ASSVASD
ADC123 | Vincent Vega | BARC | 10000 | ZXCZCX
ABC123 | Jules Winnfield | BARC | 15000 | ASSXCQA
ADC123 | Vincent Vega | CRED | 250000 | RFAQQA
ABC123 | Jules Winnfield | CRED | 5000 | ASDQ23A
ABC123 | Jules Winnfield | GOLD | 5000 | AVBDQ3A
I'm looking to produce a repeatable monthly report that gives me a view of trading activity aggregated at the counterparty (the EXEC_BROKER field) level, with subtotals - as shown below:
TRADER_ID | TRADER_NAME | NO._OF_CCP_USED | CCP | TRADED_AMT_WITH_CCP | VALUE_OF_TOTAL_TRADES | TRADES_WITH_CCP | TOTAL_TRADES
ABC123 | Jules Winnfield | 3 | GOLD | 15000 | 35000 | 2 | 4
ABC123 | Jules Winnfield | 3 | BARC | 15000 | 35000 | 1 | 4
ABC123 | Jules Winnfield | 3 | CRED | 5000 | 35000 | 1 | 4
...and so on the rest.
The idea is to aggregate the number of trades per counterparty (which I have done using a count function), and the sum of traded amounts with the ccp, but I'm struggling to get the 'subtotal' field next to each trader as shown in my desired output above - so you can see here that Jules has dealt with 3 counterparties in total, with 4 trades between them, and a collective amount of 35000.
I have tried using a combination of aggregate and over by functions, but to no avail.
SELECT
OT.TRADER_ID,
OT.TRADER_NAME,
OT.EXEC_BROKER,
SUM(OT.TRADE_AMOUNT) AS VALUE_OF_TOTAL_TRADES,
COUNT(OT.TRADE_ID) AS TOTAL_TRADES,
COUNT(OT.EXEC_BROKER) OVER PARTITION BY (OT.TRADER_ID) AS NO._OF_CCP_USED,
SUM(OT.TRADE_AMOUNT) OVER PARTITION BY (OT.EXEC_BROKER) AS TRADED_AMT_WITH_CCP,
COUNT(OT.TRADE_ID) OVER PARTITION BY (OT.EXEC_BROKER) AS TRADES_WITH_CCP
FROM dbo.ORDERS_TRADES OT
GROUP BY OT.TRADER_ID, OT.TRADER_NAME, OT.EXEC_BROKER, OT.TRADE_AMOUNT, OT.TRADE_ID
The code above runs but returns millions of rows. When I remove the partition by lines, I get the desired result minus the subtotal columns I'm looking for.
Any suggestions please? Thanks very much!
EDIT:
Final code which gave me the desired output: updating my question to provide this response (thanks to Gordon Linoff) so that others can benefit:
SELECT
OT.TRADER_ID,
OT.TRADER_NAME,
OT.EXEC_BROKER,
RANK() OVER (PARTITION BY OT.TRADER_ID ORDER BY
SUM(OT.TRADE_AMOUNT) DESC) AS CCP_RANK,
SUM(OT.TRADE_AMOUNT) AS TRADED_AMT_WITH_CCP,
SUM(SUM(OT.TRADE_AMOUNT)) OVER (PARTITION BY OT.TRADER_ID) AS
VALUE_OF_TOTAL_TRADES,
COUNT(*) OVER (PARTITION BY OT.TRADER_ID) AS NUM_OF_CCP_USED,
SUM(COUNT(OT.TRADE_ID)) OVER (PARTITION BY OT.TRADER_ID) AS
TOTAL_TRADES
FROM dbo.ORDERS_TRADES OT
GROUP BY OT.TRADER_ID, OT.TRADER_NAME, OT.EXEC_BROKER

You seem to want:
SELECT OT.TRADER_ID, OT.TRADER_NAME, OT.CCP,
COUNT(*) OVER (PARTITION BY OT.TRADER_ID) as NUM_CCP,
SUM(OT.TRADED_AMT) AS TRADED_AMT_WITH_CCP,
SUM(SUM(OT.TRADED_AMT)) OVER (PARTITION BY OT.TRADER_ID) AS VALUE_OF_TOTAL_TRADES,
COUNT(OT.TRADE_ID) AS CCP_TRADES,
SUM(COUNT(OT.TRADE_ID)) OVER (PARTITION BY OT.TRADER_ID) AS TOTAL_TRADES
FROM ORDERS_TRADES OT
GROUP BY OT.TRADER_ID, OT.TRADER_NAME, OT.CCP;
I'm not sure what your query has to do with the results you want. The columns have little to do with what you are asking.
Here is a db<>fiddle.

Making some assumptions about the nomenclature, here is a solution that doesn't use anything too fancy so it's easy to maintain, though it's not the most efficient:
create table trades
(
TRADER_ID varchar(10),
TRADER_NAME varchar(20),
CCP char(4),
TRADED_AMT decimal(10,2),
TRADE_ID varchar(10) primary key
);
insert trades
values
('ABC123', 'Jules Winnfield', 'GOLD', 10000 , 'ASDADAD'),
('XDA241', 'Jimmie Dimmick ', 'GOLD', 12000 , 'ASSVASD'),
('ADC123', 'Vincent Vega ', 'BARC', 10000 , 'ZXCZCX'),
('ABC123', 'Jules Winnfield', 'BARC', 15000 , 'ASSXCQA'),
('ADC123', 'Vincent Vega ', 'CRED', 250000, 'RFAQQA'),
('ABC123', 'Jules Winnfield', 'CRED', 5000 , 'ASDQ23A'),
('ABC123', 'Jules Winnfield', 'GOLD', 5000 , 'AVBDQ3A');
with trader_totals as
(
select trader_id,
distinct_ccps = count(distinct CCP),
total_amt = sum(traded_amt),
total_count = count(*)
from trades
group by trader_id
)
select trader_id = tr.trader_id,
trader_name = trader_name,
distinct_CCP_count = tt.distinct_ccps,
CCP = tr.CCP,
this_CCP_traded_amt = sum(traded_amt),
total_traded_amt = tt.total_amt,
this_CCP_traded_count = count(*),
total_traded_count = tt.total_count
from trades tr
join trader_totals tt on tt.trader_id = tr.trader_id
group by tr.trader_id,
tr.trader_name,
tr.CCP,
tt.distinct_ccps,
tt.total_amt,
tt.total_count

Related

Running total of values from a table until it matches value from another table

I have 2 tables.
Table 1 is a temp variable table:
declare #Temp as table ( proj_num varchar(10), sum_dom decimal(23,8))
My temp table is populated with a list of project numbers, and a month end accounting dollar amount.
For example:
proj_num | sum_dom
11522 | 2477.15
11524 | 26474.20
41865 | 9012.10
Table 2 is a Project Transactions table.
We're concerned with just the following columns:
proj_num
amount
cost_code
tran_date
Individual values will somemething like this:
proj_num | cost_code | amount | tran_date
11522 | LBR | 112.10 | 10/1/2018
11522 | LBR | 1765.90 | 10/2/2018
11522 | MAT | 599.15 | 10/3/2018
11522 | FRT | 57.50 | 10/4/2018
So for this project, since the grand total of $2477.15 is met on 10/3, example output would be:
proj_num | cost_code | amount
11522 | LBR | 1878.00
11522 | MAT | 599.15
I want to sum the amounts (grouped by cost_code, and ordered by tran_date) under the project transaction table until the total sum of values for that project value matches the value in the sum_dom column of the temp table, at which point I will output that data.
Can you help me figure out how to write the query to do that?
I know I should avoid cursors, but I havent had much luck with my attempts so far. I cant seem to get it to keep a running total.
Running sum is done using SUM(...) OVER (ORDER BY ...). You just need to tell where to stop:
SELECT sq.*
FROM projects
INNER JOIN (
SELECT
proj_num,
cost_code,
amount,
SUM(amount) OVER (PARTITION BY proj_num ORDER BY tran_date) AS running_sum
FROM project_transactions
) AS sq ON projects.proj_num = sq.proj_num
WHERE running_sum <= projects.sum_dom
DB Fiddle

Select Top User over a list of Pages

I have a table containing records of Users' internet history. The table's structure contains the User_ID, the Page Accessed, and the Date Accessed of the page. For Example:
+==========================================+
|User_ID | Page_Accessed | Date_Accessed |
+==========================================+
|Johh.Doe | Google | 1/1/2015 |
|Johh.Doe | Google | 1/1/2015 |
|Suzy.Lue | Google | 7/11/2015 |
|Suzy.Lue | Wikipedia | 4/23/2015 |
|Babe Ruth| StackOverflow | 9/1/2015 |
+==========================================+
I am currently trying to use a SQL query that uses:
RANK() OVER (PARTITION BY [Page Accessed] ORDER BY Count(DateAcc))
Then I use a PIVOT() by the Various Sites. However after selecting the records WHERE (Num = 1) from the PIVOT() and a GROUP BY [Rank], I'm ending up with resulting query similar to:
+=================================================+
|Rank | Google | Wikipedia | StackOverflow |
+=================================================+
| 1 | John Doe| NULL | NULL |
| 1 | NULL | Suzy Lue | NULL |
| 1 | NULL | NULL | Babe Ruth |
+=================================================+
Instead I need to reformat my output as:
+=================================================+
|Rank | Google | Wikipedia | StackOverflow |
+=================================================+
| 1 | John Doe| Suzy Lue | Babe Ruth |
+=================================================+
My Current Query:
SELECT Rank, Google, Wikipedia, StackOverflow
FROM(
SELECT TOP (100) PERCENT User_ID, Page_Accessed, COUNT(Date_Accessed) AS Views,
RANK() OVER (PARTITION BY Page_Accessed ORDER BY Count(Date_Accessed) DESC) AS Rank
FROM Record_Table
GROUP BY dbo.location_key.subSite, dbo.user_info_list_parse.Name
ORDER BY Views DESC) AS tb
PIVOT (
max(tb.User_ID) FOR
Page_Accessed IN ( Google, Wikipedia, StackOverflow)
) pvt
WHERE (Num = 1)
Are there any creative solutions to obtain this result?
I think you've already found solution but for your information and for others reading this - let me erase noise in this query. There is no need to ORDER BY, no need to apply TOP (100) PERCENT, Views column is redundant. I would simplify this query as follows:
CREATE TABLE InternetHistory
(
[User_ID] varchar(20),
[Page_Accessed] varchar(20),
[Date_Accessed] datetime
)
INSERT InternetHistory VALUES
('Johh.Doe', 'Google', '2015-01-01'),
('Johh.Doe', 'Google', '2015-01-01'),
('Suzy.Lue', 'Google', '2015-07-11'),
('Suzy.Lue', 'Wikipedia', '2015-04-23'),
('Babe Ruth', 'StackOverflow', '2015-01-09')
SELECT * FROM
(
SELECT [User_ID], [Page_Accessed], RANK() OVER (PARTITION BY [Page_Accessed] ORDER BY COUNT(*) DESC) Ranking
FROM InternetHistory
GROUP BY [User_ID], [Page_Accessed]
) AS Src
PIVOT
(
MAX([User_Id]) FOR [Page_Accessed] IN ([Google], [Wikipedia], [StackOverflow])
) AS Pvt
WHERE Ranking = 1

how to get daily profit from sql table

I'm stucking for a solution at the problem of finding daily profits from db (ms access) table. The difference wrt other tips I found online is that I don't have in the table a field "Price" and one "Cost", but a field "Type" which distinguish if it is a revenue "S" or a cost "C"
this is the table "Record"
| Date | Price | Quantity | Type |
-----------------------------------
|01/02 | 20 | 2 | C |
|01/02 | 10 | 1 | S |
|01/02 | 3 | 10 | S |
|01/02 | 5 | 2 | C |
|03/04 | 12 | 3 | C |
|03/03 | 200 | 1 | S |
|03/03 | 120 | 2 | C |
So far I tried different solutions like:
SELECT
(SELECT SUM (RS.Price* RS.Quantity)
FROM Record RS WHERE RS.Type='S' GROUP BY RS.Data
) as totalSales,
(SELECT SUM (RC.Price*RC.Quantity)
FROM Record RC WHERE RC.Type='C' GROUP BY RC.Date
) as totalLosses,
ROUND(totalSales-totaleLosses,2) as NetTotal,
R.Date
FROM RECORD R";
in my mind it could work but obviously it doesn't
and
SELECT RC.Data, ROUND(SUM (RC.Price*RC.QuantitY),2) as DailyLoss
INTO #DailyLosses
FROM Record RC
WHERE RC.Type='C' GROUP BY RC.Date
SELECT RS.Date, ROUND(SUM (RS.Price*RS.Quantity),2) as DailyRevenue
INTO #DailyRevenues
FROM Record RS
WHERE RS.Type='S'GROUP BY RS.Date
SELECT Date, DailyRevenue - DailyLoss as DailyProfit
FROM #DailyLosses dlos, #DailyRevenues drev
WHERE dlos.Date = drev.Date";
My problem beyond the correct syntax is the approach to this kind of problem
You can use grouping and conditional summing. Try this:
SELECT data.Date, data.Income - data.Cost as Profit
FROM (
SELECT Record.Date as Date,
SUM(IIF(Record.Type = 'S', Record.Price * Record.Quantity, 0)) as Income,
SUM(IIF(Record.Type = 'C', Record.Price * Record.Quantity, 0)) as Cost,
FROM Record
GROUP BY Record.Date
) data
In this case you first create a sub-query to get separate fields for Income and Cost, and then your outer query uses subtraction to get actual profit.

Need to automate sorting while accounting for overlapping values in SQL

I'm facing a challenging request that's had me beating my head against the keyboard. I need to implement a script which will sort and summarize a dataset while accounting for overlapping values which are associated with different identifiers. The table from which I am selecting contains the following columns:
BoxNumber (Need to group by this value, which serves as the identifier)
ProdBeg (Contains the first 'page number' for the document/record)
ProdEnd (Contains the last 'page number' for the document/record)
DateProduced (Date the document was produced)
ArtifactID (Unique identifier for each document)
NumPages (Contains the number of pages associated with each document)
Selecting a sample of the data with no conditions resembles the following (sorry for lousy formatting):
BoxNumber | ProdBeg | ProdEnd | DateProduced | ArtifactID | NumPages
1200 | ABC01 | ABC10 | 12/4/2013 | 1564589 | 10
1201 | ABC11 | ABC20 | 12/4/2013 | 1498658 | 10
1200 | ABC21 | ABC30 | 12/4/2013 | 1648596 | 10
1200 | ABC31 | ABC40 | 12/4/2013 | 1489535 | 10
Using something like the following effectively groups and sorts the data by box number while accounting for different DateProduced dates, but does not account for overlapping ProdBeg/ProdEnd values between different BoxNumbers:
SELECT BoxNumber, MIN(ProdBeg) AS 'ProdBeg', MAX(ProdEnd) AS 'ProdEnd', DateProduced, COUNT(ArtifactID) AS 'Documents', SUM(NumPages) AS 'Pages'
FROM MyTable
GROUP BY BoxNumber, DateProduced
ORDER BY ProdBeg, ProdEnd
This yields:
BoxNumber | ProdBeg | ProdEnd| DateProduced | Documents| Pages
1200 | ABC01 | ABC40 | 12/4/2013 | 3 | 30
1201 | ABC11 | ABC20 | 12/4/2013 | 1 | 10
Here, it becomes apparent that the ProdBeg/ProdEnd values for box 1201 overlap those for box 1200. No variation on the script above will work, as it will inherently ignore any overlaps and only select the min/max. We need something which will produce the following result:
BoxNumber | ProdBeg | ProdEnd | DateProduced | Documents| Pages
1200 | ABC01 | ABC10 | 12/4/2013 | 1 | 10
1201 | ABC11 | ABC20 | 12/4/2013 | 1 | 10
1200 | ABC21 | ABC40 | 12/4/2013 | 2 | 20
I'm just not sure how we can group by box number without showing only distinct values (which can result in overlaps for ProdBeg/ProdEnd). Any suggestions would be greatly appreciated! The environment version is SQL 2008 R2 (SP1).
Yuch. This would be helped at least if you had lead()/lag() as in SQL Server 2012. But it is doable.
The idea is the following:
Add a variable that is the number part of the code (the last two digits).
Calculate the next number in the sequence.
Calculate a flag if there is a gap to the next number. This is the start of a "group".
Calculate the cumulative sum of the "start of a group" flag. This is a group id.
Do the aggregation.
The following query follows this logic. I didn't include the date produced. This seems redundant with the number, unless a box can appear on multiple days. (Adding the date produced is just a matter of adding the condition to the where clauses.) The resulting query is:
with bp as (
select t.*,
cast(right(prodbeg, 2) as int) as pbeg,
cast(right(prodend, 2) as int) as pend
from mytable t
),
bp1 as (
select bp.*,
(select top 1 pbeg
from bp bp2
where bp2.pbeg < bp.pbeg and pb2.BoxNumber = pb.BoxNumber
order by bp2.pbeg desc
) as prevpend
from bp
),
bp2 as (
select bp1.*,
(select sum(case when prevpend = pbeg - 1 then 0 else 1 end)
from bp1 bp1a
where bp1a.pbeg < bp1.pbeg and pb1a.BoxNumber = pb1.BoxNumber
) as groupid
from bp1
)
select BoxNumber, MIN(ProdBeg) AS ProdBeg, MAX(ProdEnd) AS ProdEnd, DateProduced,
COUNT(ArtifactID) AS 'Documents', SUM(NumPages) AS 'Pages'
FROM bp2
GROUP BY BoxNumber, groupid
ORDER BY ProdBeg, ProdEnd;

Finding a date from a SUM

I am having trouble finding what date my customers hit a certain threshold in how much money they make.
customer_id | Amount | created_at
---------------------------
1134 | 10 | 01.01.2010
1134 | 15 | 02.01.2010
1134 | 5 | 03.24.2010
1235 | 10 | 01.03.2010
1235 | 15 | 01.03.2010
1235 | 30 | 01.03.2010
1756 | 50 | 01.05.2010
1756 | 100 | 01.25.2010
To determine how much total amount they made I run a simple query like this:
SELECT customer_id, SUM(amount)
FROM table GROUP BY customer_id
But I need to be able to find for e.g. the date a customer hits $100 in total amount.
Any help is greatly appreciated. Thanks!
Jesse,
I believe you are looking for a version of "running total" calculation.
Take a look at this post calculate-a-running-total.
There is number of useful links there.
This article have a lot of code that you could reuse as well: http://www.sqlteam.com/article/calculating-running-totals.
Something like having clause
SELECT customer_id, SUM(amount) as Total FROM table GROUP BY customer_id having Total > 100
I'm not sure if MySQL supports subqueries, so take this with a grain of salt:
SELECT customer_id
, MIN(created_at) AS FirstDate
FROM ( SELECT customer_id
, created_at
, ( SELECT SUM(amount)
FROM [Table] t
WHERE t.CustomerID = [Table].CustomerID
AND t.created_at <= [Table].created_at
) AS RunTot
FROM [Table]
) x
WHERE x.RunTot >= 100
GROUP BY customer_id