updating transactions with preaggregated data - sql

I have a transactions table. I need to update the table with a pre-aggregated value from another table and then roll these down to varying levels of granularity.
However, the final output is incorrect. Hoping someone here can help me figure out how to go about this.
Table A:
TransID BankName Location Region SaleType MonthlyPayment Weight
1 BOA Boston East F 3000 3
2 Mellon Pittsburgh East C 1000 3
3 BOA Boston East C 2000 2
4 BOA Boston East 1000 2
Table B
BanKname Location Region Sales
BOA Boston East 500
Mellon Pittsburgh East 1000
Desired Output structure
BankName Location Region SaleType AvgSales AvgMonthlyPayment
The issue is that when I update and then compute the weighted average, each of the Boston transactions gets the full 500 in sales. When added up, total sales should be 1500 but is now 2500.
If I update Table A with the Sales value from Table B, sales gets repeated for each saletype, so it throws off the final average sales.
The update is this (I added a new column, sales, to A):
update a
set a.sales = b.sales
from tableA a join tableB b on a.bankname=b.bankname and
a.location=b.location and a.region = b.region
The weighted average from A is calculated like this:
select bankname,location,region,saletype,
sum(case when sales is not null then sales*weight else 0
end)/sum(weight) as avgsales
, sum(case when monthlypayment is not null then monthlypayment*weight
else 0 end)/sum(weight) as avgmonthlyp
from tableA
group by bankname,location,region,saletype
For each saletype, sales is updated with the full value, which inflates the final figure by that factor.
How can I update sales so that BOA only gets 500 and Mellon only gets 1000, and total sales come to 1500?
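One possible way to keep the totals intact is to allocate each bank's Sales across its transactions instead of copying the full value onto every row. The sketch below splits by Weight, which is only an assumption about the intended rule (an even split would preserve the 1500 total just as well). It reuses tableA, tableB and the added sales column from the question, and assumes SQL Server syntax, a unique TransID, a decimal sales column and a non-zero weight total per bank.

```sql
-- Sketch only: split each bank's Sales across its transactions by Weight,
-- so the allocated values add back up to the tableB figure (up to rounding).
-- Assumes a.sales is a decimal column and TransID is unique.
UPDATE a
SET a.sales = x.allocated_sales
FROM tableA a
JOIN (
    SELECT a2.TransID,
           b.sales * 1.0 * a2.weight
             / SUM(a2.weight) OVER (PARTITION BY a2.bankname, a2.location, a2.region)
             AS allocated_sales
    FROM tableA a2
    JOIN tableB b
      ON  a2.bankname = b.bankname
      AND a2.location = b.location
      AND a2.region   = b.region
) x ON x.TransID = a.TransID;

-- Check: totals per bank should match tableB (500 for BOA, 1000 for Mellon).
SELECT bankname, SUM(sales) AS total_sales
FROM tableA
GROUP BY bankname;
```

With the sample data, the three Boston rows (weights 3, 2, 2) would receive 3/7, 2/7 and 2/7 of 500. Whether the later weighted average should still multiply sales by weight depends on what AvgSales is meant to represent.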

Related

Calculate the number of products responsible for 50% of my sales

I have a shop that sells products in different countries.
I end up with a sales table like this (with many more months):
Month | Country | Product | Sales
01-2022 | UK | Tomato | 10
01-2022 | UK | Banana | 4
01-2022 | UK | Garlic | 1
01-2022 | FR | Tomato | 1
01-2022 | FR | Banana | 2
01-2022 | FR | Garlic | 1
I would like to know the number of products responsible for 50% of the sales per month and country. Something like this.
Month | Country | Nb products accountable for 50% sales
01-2022 | UK | 1
02-2022 | UK | 3
03-2022 | UK | 2
01-2022 | FR | 1
02-2022 | FR | 4
03-2022 | FR | 3
The objective is to find the percentage of my catalogue responsible for the majority of sales. Example: 10% of my catalogue represents 50% of sales.
I have tried to solve the problem with multiple window functions and have already searched the open topics, without success.
I finally found a solution by tweaking window functions.
,t1 AS (
SELECT
*
,SUM(sales) OVER (PARTITION BY country_group, order_date ORDER BY sales DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
,0.5*SUM(sales) OVER (PARTITION BY country_group, order_date) AS total_sales_x_50perc
FROM t0
)
SELECT
order_date
,country_group
-- a product counts while the running total before it is still below the 50% mark,
-- so the product that crosses the mark is included (UK 01-2022: Tomato alone gives 1)
,COUNT(DISTINCT CASE WHEN running_total - sales < total_sales_x_50perc THEN product ELSE NULL END) AS nb_products
,COUNT(DISTINCT product) AS total_nb_products
-- 1.0* avoids integer division in dialects where COUNT/COUNT truncates
,1.0*COUNT(DISTINCT CASE WHEN running_total - sales < total_sales_x_50perc THEN product ELSE NULL END)/COUNT(DISTINCT product) AS perc
FROM t1
GROUP BY 1,2
ORDER BY 1

What is the most appropriate logic to add rows in table

I have the following table structure (simplified to make the essential things clear), which lists the top 3 bank customers in each loan category and branch of the bank. The SNO column is the customer's rank, with values up to 3.
Loan Category | SNO | Branch | Customer Name | Amount
Home Loan | 1 | abc | Piyush | 10000
Home Loan | 2 | abc | Shyam | 5000
Home Loan | 3 | abc | Kamal | 2000
Home Loan | 1 | xyz | Xman | 50000
Home Loan | 2 | xyz | Shyam | 20000
Auto Loan | 1 | abc | Birendra | 10000
Personal Loan | 1 | xyz | Gyan | 5000
Personal Loan | 2 | xyz | Prakash | 2000
I am trying to build another table such that, if there are fewer than 3 customers for a loan category and branch, dummy rows are inserted for that branch and category, with customer name and amount as NULL.
Essentially, I am trying to get the following table.
Loan Category | SNO | Branch | Customer Name | Amount
Home Loan | 1 | abc | Piyush | 10000
Home Loan | 2 | abc | Shyam | 5000
Home Loan | 3 | abc | Kamal | 2000
Home Loan | 1 | xyz | Xman | 50000
Home Loan | 2 | xyz | Shyam | 20000
Home Loan | 3 | xyz | added row |
Auto Loan | 1 | abc | Birendra | 10000
Auto Loan | 2 | abc | added row |
Auto Loan | 3 | abc | added row |
Auto Loan | 1 | xyz | added row |
Auto Loan | 2 | xyz | added row |
Auto Loan | 3 | xyz | added row |
Personal Loan | 1 | xyz | Gyan | 5000
Personal Loan | 2 | xyz | Prakash | 2000
Personal Loan | 3 | xyz | added row |
Personal Loan | 1 | abc | added row |
Personal Loan | 2 | abc | added row |
Personal Loan | 3 | abc | added row |
I have solved this problem with a loop that iterates over every category and branch and inserts dummy rows if max(sno) < 3 for that category/branch. However, I am looking for logic that does not iterate over every category and branch. In my actual table there are thousands of branch values and more than 100 categories, so iterating over every combination of category and branch is very expensive in terms of performance.
I need a good approach, preferably using SQL constructs only and without any loop.
OK. You must have one or more tables where branches and categories are listed. Let's call them your branch and category tables. You must also have some query that produced the result shown in the question. Let's call it your_query.
You need to generate 3 records per branch per category.
Select c.category as loan_category,
L.lvl as sno,
B.branch,
Q.customername,
Q.amount
From category c
Cross join branch b
Cross Join (select level lvl from dual connect by level <= 3) l
Left Join your_query q on q.branch = b.branch
and q.category = c.category
and l.lvl = q.sno
Order by c.category, B.branch, L.lvl
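If there are no separate branch and category tables, one option is to derive both lists from the data itself. The sketch below assumes Oracle (as in the query above) and uses hypothetical names: top_customers for the table in the question, with columns loan_category, sno, branch, customer_name and amount. A branch or category that has no rows at all in that table would not appear in the result.

```sql
-- Sketch only: top_customers and its column names are hypothetical.
SELECT c.loan_category,
       l.lvl AS sno,
       b.branch,
       t.customer_name,   -- NULL for the generated (dummy) rows
       t.amount           -- NULL for the generated (dummy) rows
FROM (SELECT DISTINCT loan_category FROM top_customers) c
CROSS JOIN (SELECT DISTINCT branch FROM top_customers) b
CROSS JOIN (SELECT LEVEL AS lvl FROM dual CONNECT BY LEVEL <= 3) l
LEFT JOIN top_customers t
       ON  t.loan_category = c.loan_category
       AND t.branch        = b.branch
       AND t.sno           = l.lvl
ORDER BY c.loan_category, b.branch, l.lvl;
```

For the sample data this yields 3 categories x 2 branches x 3 ranks = 18 rows, the same number of rows as the desired table.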

SQL - Splitting / Prorating a metric

I have a table where a household ID can have multiple customer IDs, but revenue is linked to the household ID. So when looking at the table with both household ID and customer ID, revenue is counted multiple times. Sample below.
HHID 123 has 3 customers; total revenue is only $900 for the household, but if I sum grouped by HHID, it's $2700. Similarly, HHID 456 has 2 customers and HHID 789 has 1.
How do I create a column that splits this revenue per customer ID? I can do this in multiple steps, but I'm wondering if there is a way to do it in one query.
HH ID CUST ID REVENUE NEW_COL
123 5655 $900 $300
123 6678 $900 $300
123 8893 $900 $300
789 8988 $350 $350
456 2343 $400 $200
456 4555 $400 $200
I need the table to have both HH ID and CUST ID since some metrics are based on HH and some based on CUST (such as age). So I need to add a column like the above to the table so I can use it to sum revenue instead of the actual revenue column.
I'm on Teradata.
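A minimal single-query sketch of the split described above is to divide the household revenue by the number of customer rows in that household, using a window count. The names hh_revenue, hh_id, cust_id and revenue are hypothetical, and the sketch assumes one row per household/customer pair and a numeric revenue column.

```sql
-- Sketch only: hh_revenue, hh_id, cust_id and revenue are hypothetical names.
SELECT hh_id,
       cust_id,
       revenue,
       -- household revenue divided evenly across that household's customer rows
       CAST(revenue AS DECIMAL(18,2))
         / COUNT(*) OVER (PARTITION BY hh_id) AS new_col
FROM hh_revenue;
```

With the sample rows this would give 300 per customer for household 123, 200 for 456 and 350 for 789. When the revenue does not divide evenly, the rounded shares may not add back up to the exact household total.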

SQL Server query for hierarchical group wise sum for the chart of accounts

I have created a query for displaying account name and balance as follows.
Table Structure is as follows.
Level Account Balance AccountCode ParentAccountCode
1 Revenue 0 41 NULL
2 Direct Income 0 411 41
3 Sales 0 4111 411
4 Sales 0 41111 4111
5 In Store Sales 100 411111 41111
5 Online Sales 200 411112 41111
2 Indirect Income 0 412 41
3 Interest 0 4121 412
4 Bank Interest 0 41211 4121
5 Bank Interest A 400 412111 41211
5 Bank Interest B 700 412112 41211
3 Other Income 0 4122 412
4 Other Income 0 41221 4122
5 Other Income 900 412211 41221
All the above fields are from the same table, but only level 5 accounts have a balance.
I want to write a SQL query that rolls the balances of child accounts up into their parent accounts, hierarchically and level by level.
Expected result is as follows
Level Account Balance
1 Revenue 2300
2 Direct Income 300
3 Sales 300
4 Sales 300
5 In Store Sales 100
5 Online Sales 200
2 Indirect Income 2000
3 Interest 1100
4 Bank Interest 1100
5 Bank Interest A 400
5 Bank Interest B 700
3 Other Income 900
4 Other Income 900
5 Other Income 900
Thanks.
Edited to match your structure:
with AccountBalanceHierarchy as
(
-- root level
select AccountCode
, ParentAccountCode
, Account
, 1 as [Level]
, cast('-' + cast (AccountCode as nvarchar(10)) + '-' as nvarchar(100)) as Hierarchy -- build hierarchy in format -L0-L1-L2-...-Ln--
, Balance
from AccountBalance
where ParentAccountCode is null
union all
-- recursive join on parent, building hierarchy
select CurrentLevel.AccountCode
, CurrentLevel.ParentAccountCode
, CurrentLevel.Account
, ParentLevel.[Level]+1 as [Level]
, cast(ParentLevel.Hierarchy + cast (CurrentLevel.AccountCode as nvarchar(10))+ '-' as nvarchar(100)) as Hierarchy
, CurrentLevel.Balance
from AccountBalance CurrentLevel
join AccountBalanceHierarchy ParentLevel on CurrentLevel.ParentAccountCode = ParentLevel.AccountCode
)
select CurrentHierarchyLevel.[Level]
, replicate(' ', CurrentHierarchyLevel.[Level]) + CurrentHierarchyLevel.Account as Account
, sum(case when CurrentHierarchyLevel.Hierarchy = substring(ChildLevel.Hierarchy, 1, len(CurrentHierarchyLevel.Hierarchy)) then ChildLevel.Balance else 0 end) as Balance
from AccountBalanceHierarchy as CurrentHierarchyLevel
cross join AccountBalanceHierarchy as ChildLevel
group by CurrentHierarchyLevel.[Level], CurrentHierarchyLevel.Account,CurrentHierarchyLevel.Hierarchy
order by CurrentHierarchyLevel.Hierarchy
Explanation:
In the hierarchy CTE, we build a hierarchy of your nodes starting with the root level (where ParentAccountCode is null), and then we join the other levels through ParentAccountCode = AccountCode while increasing the Level.
In the CTE we also build a flat path for each node, in the form -L0-L1-...-Ln-. The root will have the path -41-, its children with codes 411 and 412 will have -41-411- and -41-412-, and 411's child with code 4111 will have -41-411-4111-. Your AccountCodes actually make this unnecessary, because each code already starts with its parent's code, but I have a feeling they might not be the true IDs, since you edited your question a couple of times. Building the path as in the code makes the solution independent of the IDs given to the accounts.
Finally, we select from the CTE and get the balance for each node by cross joining all the nodes and matching each candidate child's path against the current node's path. We know that every descendant of a given node has a path that begins with that node's path. For example, all descendants of the root -41- have -41- at the beginning (e.g. -41-411-, -41-411-4111-, -41-412-), and all descendants of -41-411-4111- begin with -41-411-4111- (e.g. -41-411-4111-41111-). This is done in the SUM with a CASE expression: if the beginning of the child's path matches (the substring of the child's path equals the current node's path), the child's Balance is added to the sum.
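To see the prefix test in isolation, here is a small sketch that applies the same SUBSTRING/LEN comparison to literal paths. It assumes SQL Server 2008 or later (for VALUES in the FROM clause); the aliases ParentPath, ChildPath and Result are made up for the illustration.

```sql
-- Sketch only: the prefix test from the SUM, applied to literal paths.
SELECT p.ParentPath,
       c.ChildPath,
       CASE WHEN p.ParentPath = SUBSTRING(c.ChildPath, 1, LEN(p.ParentPath))
            THEN 'counted' ELSE 'ignored' END AS Result
FROM (VALUES ('-41-411-')) AS p(ParentPath)
CROSS JOIN (VALUES ('-41-411-4111-'),   -- descendant of 411: counted
                   ('-41-412-4121-'),   -- under 412, not 411: ignored
                   ('-41-411-')         -- the node itself: counted
           ) AS c(ChildPath);
```

The node's own path matches as well, which is why a level 5 account still shows its own balance in the result.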
Here's the sqlfiddle with schema and query http://sqlfiddle.com/#!6/09ec8/1 and resulting data which matches your expected result:
Level Account Balance
----------- ------------------------------------------ ---------------------
1 Revenue 2300,00
2 Direct Income 300,00
3 Sales 300,00
4 Sales 300,00
5 In Store Sales 100,00
5 Online Sales 200,00
2 Indirect Income 2000,00
3 Interest 1100,00
4 Bank Interest 1100,00
5 Bank Interest A 400,00
5 Bank Interest B 700,00
3 Other Income 900,00
4 Other Income 900,00
5 Other Income 900,00

Joining to another table only on the first occurrence of a field

(Note: I have tried to simplify the below to make it easier for me and for anyone else to understand; the tables I reference below are in fact sub-queries joining a lot of different data together from different sources.)
I have a table of purchased items:
Items
ItemSaleID CustomerID ItemCode
1 100 A
2 100 B
3 100 C
4 200 A
5 200 C
I also have transaction header and detail tables coming from a till system:
TranDetail
TranDetailID TranHeaderID ItemSaleID Cost
11 51 1 $10
12 51 2 $10
13 51 3 $10
14 52 4 $20
15 52 5 $10
TranHeader
TranHeaderID CustomerID Payment Time
51 100 $100 11:00
52 200 $50 12:00
53 100 $20 13:00
I want to get to a point where I have a table like:
ItemSaleID CustomerID ItemCode Cost Payment Time
1 100 A $10 $120 11:00
2 100 B $10 11:00
3 100 C $10 11:00
4 200 D $20 $50 12:00
5 200 E $10 12:00
I have a query which produces the results, but when I add in the ROW_NUMBER() CASE expression the run time goes from 2 minutes to 30+ minutes.
The query is further complicated because I need to supply the earliest date relating to the list of transactions and the total price paid (there could be many transactions throughout the day, for upgrades etc.).
Query below:
SELECT I.ItemSaleID
, I.CustomerID
, I.ItemCode
, Cost
, CASE WHEN ROW_NUMBER() OVER (PARTITION BY TRN.TranHeaderID ORDER BY I.ItemSaleID) = 1
THEN TRN.Payment ELSE NULL END AS Payment
FROM Items I
OUTER APPLY (
-- earliest transaction header for this customer, with the customer's total payment
SELECT TOP 1 H.TranHeaderID, SUB.Payment, H.Time
FROM TranHeader H
INNER JOIN TranDetail D ON H.TranHeaderID = D.TranHeaderID
OUTER APPLY (SELECT SUM(H2.Payment) AS Payment
FROM TranHeader H2
WHERE H2.CustomerID = I.CustomerID -- Items is aliased as I
) SUB
WHERE H.CustomerID = I.CustomerID -- CustomerID is on TranHeader in the tables shown
ORDER BY H.Time
) TRN
WHERE ...
Is there a way that I can show the payment only on the first occurrence of each customer ID whilst maintaining performance?
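One direction that might help is to aggregate the payments once per customer and join to that result, rather than evaluating correlated sub-queries for every item row. The sketch below is written against the simplified tables shown above and assumes SQL Server, a numeric Payment column and a sortable Time column. TotalPayment and FirstTime are names made up for the illustration. Whether it is actually faster depends on the real sub-queries and indexes behind these tables.

```sql
-- Sketch only: aggregate TranHeader once per customer, then join,
-- instead of running a correlated sub-query for every item row.
SELECT I.ItemSaleID,
       I.CustomerID,
       I.ItemCode,
       D.Cost,
       -- show the customer's total payment on their first item row only
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY I.CustomerID
                                    ORDER BY I.ItemSaleID) = 1
            THEN P.TotalPayment END AS Payment,
       P.FirstTime AS [Time]
FROM Items I
LEFT JOIN TranDetail D
       ON D.ItemSaleID = I.ItemSaleID
LEFT JOIN (
    SELECT CustomerID,
           SUM(Payment) AS TotalPayment,   -- total paid across all headers
           MIN([Time])  AS FirstTime       -- earliest transaction time
    FROM TranHeader
    GROUP BY CustomerID
) P ON P.CustomerID = I.CustomerID
ORDER BY I.ItemSaleID;
```

For the sample data, customer 100 would show a payment of 120 (100 + 20) on ItemSaleID 1 only, with 11:00 as the time on all three rows, and customer 200 would show 50 on ItemSaleID 4.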