Calculating and displaying customer lifetime value histogram with BigQuery and Data Studio - google-bigquery

Consider a table in Google BigQuery containing purchase records for customers. For the sake of simplicity, let's focus on the following properties:
customer_id, product_id, amount
I'd like to create a Google Data Studio report from the above data set showing a customer lifetime value histogram. The customer lifetime value is the sum of amount for any given customer. The histogram would show how many customers fall into a certain bucket by their total amount - I would define the buckets like 0-10, 10-20, 20-30 etc. value ranges.
Like this:
Finally, I'd also like to filter the histogram by product_id. When the filter is active, the histogram would show the totals for customers who - at least once - purchased the given product.
As of this moment, I think this is not possible to implement in Data Studio, but I hope I am wrong.
Things I've tried so far:
Displaying an average customer lifetime value for the whole dataset is easy, via a calculated field in Data Studio as SUM(amount) / COUNT_DISTINCT(customer_id)
For creating a histogram, I don't see any way to do it purely in Data Studio (based on the above data set). I think I need to create a view of the original table, consisting of a single row for each customer with the total amount. The bucket assignment could be implemented either in BigQuery or in Data Studio with CASE ... WHEN.
However, for the final step, i.e. creating a product filter that filters the histogram for those customers who purchased the given product, I have no clue how to approach this.
Any thoughts?

I was able to do a similar reproduction to what you describe but it's not straightforward so I'll try to detail everything. The main idea is to have two data sources from the same table: one contains customer_id and product_id so that we can filter it while the other one contains customer_id and the already calculated amount_bucket field. This way we can join it (blend data) on customer_id and filter according to product_id which won't change the amount_bucket calculations.
I used the following script to create some data in BigQuery:
CREATE OR REPLACE TABLE data_studio.histogram
(
customer_id STRING,
product_id STRING,
amount INT64
);
INSERT INTO data_studio.histogram (customer_id, product_id, amount)
VALUES ('John', 'Game', 60),
('John', 'TV', 800),
('John', 'Console', 300),
('Paul', 'Sofa', 1200),
('George', 'TV', 750),
('Ringo', 'Movie', 20),
('Ringo', 'Console', 250)
;
Then I connect directly to the BigQuery table and get the following fields. Data source is called histogram:
We add our second data source (BigQuery) using a custom query:
SELECT
customer_id,
CASE
WHEN SUM(amount) < 500 THEN '0-500'
WHEN SUM(amount) < 1000 THEN '500-1000'
WHEN SUM(amount) < 1500 THEN '1000-1500'
ELSE '1500+'
END
AS amount_bucket
FROM
data_studio.histogram
GROUP BY
customer_id
With only the latter we could already do a basic histogram with the following configuration:
Dimension is amount_bucket, metric is Record count. I made a bucket_order custom field to sort it as lexicographically '1000-1500' comes before '500-1000':
CASE
WHEN amount_bucket = '0-500' THEN 0
WHEN amount_bucket = '500-1000' THEN 1
WHEN amount_bucket = '1000-1500' THEN 2
ELSE 3
END
Now we add the product_id filter on top and a new chart with the following configuration:
Note that metric is CTD (Count Distinct) of customer_id and the Blended data data source is implemented as:
An example where I filter by TV so only George and John appear but the other products are still counted for the total amount calculation:
I hope it works for you.
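The blending idea above can also be sketched outside Data Studio. The following is a minimal plain-Python illustration, not the Data Studio implementation: it reuses the sample rows and bucket thresholds from this answer, while the function names amount_bucket and histogram are made up for the example. Totals are computed over all purchases first, and the product filter only decides which customers are counted.

```python
from collections import defaultdict

# Sample rows mirroring the INSERT statement above: (customer_id, product_id, amount)
PURCHASES = [
    ("John", "Game", 60),
    ("John", "TV", 800),
    ("John", "Console", 300),
    ("Paul", "Sofa", 1200),
    ("George", "TV", 750),
    ("Ringo", "Movie", 20),
    ("Ringo", "Console", 250),
]


def amount_bucket(total):
    """Same thresholds as the CASE expression in the custom query."""
    if total < 500:
        return "0-500"
    if total < 1000:
        return "500-1000"
    if total < 1500:
        return "1000-1500"
    return "1500+"


def histogram(purchases, product_id=None):
    """Count distinct customers per bucket, optionally keeping only
    customers who bought product_id at least once."""
    totals = defaultdict(int)   # lifetime value per customer (never filtered)
    buyers = defaultdict(set)   # customers per product (used for the filter)
    for customer, product, amount in purchases:
        totals[customer] += amount
        buyers[product].add(customer)
    selected = totals.keys() if product_id is None else buyers.get(product_id, set())
    counts = defaultdict(int)
    for customer in selected:
        counts[amount_bucket(totals[customer])] += 1
    return dict(counts)


print(histogram(PURCHASES))        # all customers
print(histogram(PURCHASES, "TV"))  # only customers who bought a TV
```

Filtering by TV keeps John and George, but John's bucket still reflects his Game and Console purchases, which is the behaviour the blended data source is meant to give.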

Related

How to calculate a bank's deposit growth from one call report to the next, as a percentage?

I downloaded the entire FDIC bank call reports dataset, and uploaded it to BigQuery.
The table I currently have looks like this:
What I am trying to accomplish is adding a column showing the deposit growth rate since the last quarter for each bank:
Note: The first reporting date for each bank (e.g. 19921231) will not have a "Quarterly Deposit Growth". Hence the two empty cells for the two banks.
I would like to know if a bank is increasing or decreasing its deposits each quarter/call report (viewed as a percentage).
e.g. "On their last call report (19921231), First National Bank had deposits of 456789 (in 1000's). In their next call report (19930331), First National Bank had deposits of 567890 (in 1000's). What is the percentage increase (or decrease) in deposits?"
This "_%_Change_in_Deposits" column would be displayed as a new column.
This is the code I have written so far:
select
SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp, SFRNLL.specgrp AS `Loan_Specialization`, SFRNLL.lnreres as `_1_to_4_Residential_Loans`, AL.dep as `Deposits`, AL.lnlsnet as `loans_and_leases`,
IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) as SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1992.All_Reports_19921231_1_4_Family_Residential_Net_Loans_and_Leases as SFRNLL
JOIN usa_fdic_call_reports_1992.All_Reports_19921231_Assets_and_Liabilities as AL
ON SFRNLL.cert = AL.cert
where SFRNLL.specgrp = 4 and IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
UNION ALL
select
SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp, SFRNLL.specgrp AS `Loan_Specialization`, SFRNLL.lnreres as `_1_to_4_Residential_Loans`, AL.dep as `Deposits`, AL.lnlsnet as `loans_and_leases`,
IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) as SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1993.All_Reports_19930331_1_4_Family_Residential_Net_Loans_and_Leases as SFRNLL
JOIN usa_fdic_call_reports_1993.All_Reports_19930331_Assets_and_Liabilities as AL
ON SFRNLL.cert = AL.cert
where SFRNLL.specgrp = 4 and IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
The table looks like this:
Additional notes:
I would also like to view the last column (SFR2TotalLoanRatio) as a percentage.
This code runs correctly, however, previously I was getting a "division by zero" error when attempting to run 50,000 rows (1992 to the present).
Addressing each of your questions individually.
First) To retrieve SFR2TotalLoanRatio as a percentage: I assume you want to see 9.88% instead of 0.0988 in your results. Currently, in BigQuery you can achieve this by casting the field to a STRING and then concatenating the % sign. Below is an example with sample data:
WITH data as (
SELECT 0.0123 as percentage UNION ALL
SELECT 0.0999 as percentage UNION ALL
SELECT 0.3456 as percentage
)
SELECT CONCAT(CAST(percentage*100 as String),"%") as formatted_percentage FROM data
And the output,
Row formatted_percentage
1 1.23%
2 9.99%
3 34.56%
Second) Regarding your question about the division by zero error: IEEE_DIVIDE(arg1, arg2) performs the division, with arg1 as the dividend and arg2 as the divisor. Therefore, I would advise you to explore your data in order to figure out which records have a divisor equal to zero. After gathering these results, you can decide what to do with them. If you decide to discard them, you can simply add AL.lnlsnet != 0 to the WHERE clause of each of your joined SELECTs. Alternatively, you can handle the records where lnlsnet = 0 with a CASE WHEN or IF expression.
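To make the zero-divisor handling concrete, here is a small plain-Python sketch of the "discard or treat specially" idea. The rows are hypothetical (only the column names come from the question), and safe_ratio and low_ratio_certs are illustrative names. In BigQuery itself, SAFE_DIVIDE plays a similar role by returning NULL for a zero divisor.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero or missing."""
    if not denominator:
        return None
    return numerator / denominator


# Hypothetical rows; only the column names come from the question.
ROWS = [
    {"cert": 1, "lnreres": 50, "lnlsnet": 1000},
    {"cert": 2, "lnreres": 10, "lnlsnet": 0},     # would divide by zero
    {"cert": 3, "lnreres": 400, "lnlsnet": 1000},
]


def low_ratio_certs(rows, limit=0.10):
    """Keep certs whose ratio is defined and at most limit."""
    kept = []
    for row in rows:
        ratio = safe_ratio(row["lnreres"], row["lnlsnet"])
        if ratio is not None and ratio <= limit:
            kept.append(row["cert"])
    return kept


print(low_ratio_certs(ROWS))
```

Rows with a zero divisor are dropped here; substituting a default value instead would be the equivalent of the CASE WHEN approach.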
UPDATE:
In order to add this piece of code to your query, you have to wrap your existing query in a temporary table (a WITH clause). Then, I will make two adjustments: first, a temporary function to calculate the percentage and format it with the % sign; second, retrieving the previous deposits figure in order to calculate the desired percentage. I am also assuming that cert is the individual id for each of the bank's clients. The modifications will be as follows:
# The following function MUST be the first statement in your query
CREATE TEMP FUNCTION percent(dep INT64, prev_dep INT64) AS (
CONCAT(CAST((dep - prev_dep) / prev_dep * 100 AS STRING), "%")
);
# Followed by the query you have written so far, as a temporary table; notice the comma I added after the closing parenthesis
WITH data AS (
# your query
),
# In this second part you select all the columns from data, and the LAG function retrieves the previous deposits figure for each cert, ordered by reporting date
data_2 AS (
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans, Deposits, loans_and_leases, SFR2TotalLoanRatio,
LAG(Deposits) OVER (PARTITION BY cert ORDER BY repdte) AS prev_dep FROM data
)
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans, Deposits, loans_and_leases, SFR2TotalLoanRatio, percent(Deposits, prev_dep) AS dept_growth_rate FROM data_2
Note that the built-in function LAG, partitioned by cert and ordered by repdte, retrieves the previous amount of deposits per client. For the first report of each cert there is no previous row, so prev_dep (and therefore the growth rate) is NULL.
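For readers who want to check the arithmetic outside BigQuery, here is a plain-Python sketch of the same "previous row per bank" logic. The deposit figures for the first bank are the ones quoted in the question; the cert numbers and the second bank's figures are made up for illustration, and deposit_growth is a hypothetical helper name.

```python
from itertools import groupby
from operator import itemgetter

# (repdte, cert, deposits); cert values and the second bank are hypothetical.
REPORTS = [
    (19930331, 101, 567890),
    (19921231, 101, 456789),
    (19921231, 202, 1000),
    (19930331, 202, 900),
]


def deposit_growth(reports):
    """Return (repdte, cert, deposits, growth_pct) rows, where growth_pct is
    the change versus the previous report of the same cert, or None when
    there is no usable previous value."""
    result = []
    ordered = sorted(reports, key=itemgetter(1, 0))  # by cert, then repdte
    for cert, group in groupby(ordered, key=itemgetter(1)):
        prev = None
        for repdte, _, deposits in group:
            # Skip the first report and guard against a zero previous value.
            growth = None if not prev else (deposits - prev) / prev * 100
            result.append((repdte, cert, deposits, growth))
            prev = deposits
    return result


for row in deposit_growth(REPORTS):
    print(row)
```

Sorting by cert and then by repdte plays the role of PARTITION BY cert ORDER BY repdte, and carrying prev through the loop plays the role of LAG.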

How to make a query that return data of rows related to each row in table

I have some tables for double-entry bookkeeping.
The table VoucherDetail contains the accounting entries for each voucher, and
the other tables are the accounts: Group/Ledger/Definitive.
Here are diagrams of the tables:
I'm trying to get the opposite side of an entry and show it in a custom column that matches the entry's debit/credit amount (see image 2).
I searched Google and found nothing. Here is the query I have so far (see image 1):
SELECT
dbo.Vouchers.VoucherId,
vd.VoucherDetailIndex AS ind,
vd.Debit,
vd.Credit,
vd.Description,
CONCAT ( ag.Name, '_', al.Name, '_', ad.Name ) AS names,
CONCAT ( ag.GroupId, '_', al.LedgerId, '_', ad.DefinitiveId ) AS ids
FROM dbo.Vouchers
JOIN dbo.VoucherDetails AS vd ON vd.Voucher_VoucherIndex = dbo.Vouchers.VoucherIndex
JOIN dbo.AccDefinitives AS ad ON vd.AccDefinitive_DefinitiveIndex = ad.DefinitiveIndex
JOIN dbo.AccLedgers AS al ON ad.AccLedger_LedgerIndex = al.LedgerIndex
JOIN dbo.AccGroups AS ag ON al.AccGroup_GroupIndex = ag.GroupIndex
Here is the result I'm getting:
The result I want is:
Here is an example to explain what I need:
EVENT:
We put $10 in the bank as our equity, so we need to create a voucher for this:
INSERT INTO Vouchers(VoucherIndex, VoucherId, VoucherDate, Description) VALUES
(1, 1, '2019-01-01', 'initial investment');
Now we need to add the entries for this event to VoucherDetail for voucher 1,
which will have 2 entries: one for Cash and one for Equity:
INSERT INTO VoucherDetails(VoucherDetailIndex, Debit, Credit, Description, AccDefinitive_DefinitiveIndex, AccLedger_LedgerIndex, Voucher_VoucherIndex, EntityOrder) VALUES
(1, 10, 0, 'Put Cash on Bank as initial Investment', 10101, 101, 1, 1),
(2, 0, 10, 'initial Investment', 50101, 501, 1, 2);
Now we run the first query I provided; here is the result:
Now that we have the usual result, let's get to the problem.
Imagine someone has filled these tables with 10,000 rows of data
and we need to find voucher no. 10, which has 20 entries in VoucherDetail.
We can get these entries with a simple query,
but we don't know which entry relates to which (as in the example above, where Cash with a $10 debit relates to Equity with a $10 credit).
To find out, we would have to spend time on it every time we look something up.
The query needs to search the whole table and find the opposite side of each row based on the row's Debit or Credit value.
This is the result I want, written out in Excel:
As you can see in the image above, two new columns are added:
Account in opposite side and Account ID in opposite side.
The first row refers to Equity, which is related to Cash, and
the second row refers to Cash, which is related to Equity.
As far as I can see, what you need to be able to do is join two VoucherDetail records that have the same Voucher_VoucherIndex value (let's call this VoucherID for brevity). However, the only two things these records have in common are their VoucherID and the fact that the Debit value of one equals the Credit value of the other, and vice versa.
In the comments you mentioned that multiple VoucherDetail rows with the same VoucherID can have the same Debit value (and I presume Credit value). If this wasn't the case, you could add something like this to your query:
JOIN dbo.VoucherDetails AS vd_opposite
ON vd.Voucher_VoucherIndex = vd_opposite.Voucher_VoucherIndex
AND (vd.Debit = vd_opposite.Credit OR vd.Credit = vd_opposite.Debit)
You can't do this though, because Debit/Credit and VoucherID together are not enough to be unique, so you might pick up extra rows in the join that you don't want.
Therefore, your only option is to add a new ID field to your table (maybe called SaleID or something) that definitively links the two rows that represent opposite sides of the same "sale" with a common ID. Then, the above JOIN would look like this:
JOIN dbo.VoucherDetails AS vd_opposite
ON vd.Voucher_VoucherIndex = vd_opposite.Voucher_VoucherIndex
AND vd.SaleID = vd_opposite.SaleID
AND vd.VoucherDetailIndex <> vd_opposite.VoucherDetailIndex
In addition to adding that JOIN, you would need to join the new vd_opposite table against all of the dbo.Acc* tables again to get access to the data you want, and obviously add the fields from those tables that you want in the results to your SELECT fields.
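The self-join on a pairing column can be tried out with an in-memory SQLite database. This is a sketch under assumptions, not the asker's schema: SaleID is the hypothetical new column suggested above, AccountName stands in for the names that would come from the dbo.Acc* tables, and the extra inequality on VoucherDetailIndex is there so a row does not match itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VoucherDetails (
    VoucherDetailIndex   INTEGER PRIMARY KEY,
    Voucher_VoucherIndex INTEGER,
    SaleID               INTEGER,  -- hypothetical pairing column
    AccountName          TEXT,     -- stand-in for the joined Acc* names
    Debit                REAL,
    Credit               REAL
);
-- Two pairs in the same voucher with identical amounts, so Debit/Credit
-- alone could not tell them apart.
INSERT INTO VoucherDetails VALUES
    (1, 1, 1, 'Cash',   10, 0),
    (2, 1, 1, 'Equity', 0, 10),
    (3, 1, 2, 'Bank',   10, 0),
    (4, 1, 2, 'Loan',   0, 10);
""")

rows = conn.execute("""
SELECT vd.VoucherDetailIndex, vd.AccountName, vd_opposite.AccountName
FROM VoucherDetails AS vd
JOIN VoucherDetails AS vd_opposite
  ON vd.Voucher_VoucherIndex = vd_opposite.Voucher_VoucherIndex
 AND vd.SaleID = vd_opposite.SaleID
 AND vd.VoucherDetailIndex <> vd_opposite.VoucherDetailIndex
ORDER BY vd.VoucherDetailIndex
""").fetchall()

for row in rows:
    print(row)
```

Each entry comes back with exactly one opposite account, even though all four rows carry the same amount.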

In a pivot table: how to ignore dimension in an expression using a variable that depends on that dimension

I'm trying to rank (A, B, C) a list of customers according to their profitability, which is calculated as the amount of each sale multiplied by the product profitability (each product has a profitability value assigned). Hence, Profit = SaleAmount*ProductProfitability
To rank every customer, I have a pivot table with the customer id (CustID) as dimension and two expressions:
1) = SaleAmount*ProductProfitability
2) = if(SaleAmount*ProductProfitability > $(vPercentile75Profit),'A', if(SaleAmount*ProductProfitability > $(vPercentile25Profit),'B','C'))
Expression 2) works correctly if I fix the values of vPercentile75Profit and vPercentile25Profit, but obviously I need this to be dynamic.
For that I've defined those variables as (same for both, just switching 0.75 with 0.25):
vPercentile75Profit =Fractile(aggr(sum({$<ProductProfitability = {'>0'} >} SaleAmount*ProductProfitability/100),CustID), 0.75)
If I understand correctly, this calculates a list of each customer's profitability and then takes the 75th percentile of that list (which is a single value). This works great if I show the value in a text box, for example; however, if I use it in my table, it computes a different percentile for each customer (since CustID is in the dimension).
How can I bypass this? The percentiles must be the same for each customer, but I cannot find the way.
Thanks in advance, any help will be greatly appreciated!
Nothing works better to find the answer than asking your question to others. It was as simple as adding TOTAL to the variable definition:
vPercentile75Profit =Fractile(TOTAL aggr(sum({$<ProductProfitability = {'>0'} >} SaleAmount*ProductProfitability/100),CustID), 0.75)
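What TOTAL achieves can be illustrated in plain Python: the percentiles are computed once over all customers and then reused for every row. This is only a sketch with made-up sales data; the fractile helper uses linear interpolation between closest ranks, which is an assumption and may not match the chart engine's Fractile exactly, and the set-analysis filter and the /100 scaling from the variable are left out.

```python
from collections import defaultdict

# Hypothetical sales: (CustID, SaleAmount, ProductProfitability)
SALES = [
    ("C1", 10, 1),
    ("C2", 5, 2),
    ("C2", 10, 1),
    ("C3", 30, 1),
    ("C4", 20, 2),
    ("C5", 25, 2),
]


def fractile(values, q):
    """Percentile with linear interpolation between closest ranks (assumed rule)."""
    data = sorted(values)
    pos = (len(data) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def rank_customers(sales):
    """Label each customer A, B or C against thresholds shared by all customers."""
    profit = defaultdict(float)
    for cust, amount, profitability in sales:
        profit[cust] += amount * profitability
    # Computed once over every customer, like the TOTAL qualifier.
    p75 = fractile(profit.values(), 0.75)
    p25 = fractile(profit.values(), 0.25)
    return {
        cust: "A" if value > p75 else "B" if value > p25 else "C"
        for cust, value in profit.items()
    }


print(rank_customers(SALES))
```

The key point is that p75 and p25 are evaluated outside the per-customer loop, so every customer is compared against the same two thresholds.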

How to calculate new value for field in Access

I've got a few problems with a database I have created.
I want to calculate a Total Price (Sandwich Quantity multiplied by Sandwich Price). I had it working before, but I had to delete Sandwich Price from the OrderDetailsT table, where it originally was. I'm now having issues with this calculation, as I cannot do it in the OrderDetailsT table (Sandwich Price isn't there).
How can I apply the Discount to the Total Price if the Total Price is more than $50 for instance? After the Discount has been applied to the Total Price field, I would also like to store it in the NewPriceAfterDiscount field.
Here is an image detailing my situation:
You have multiple questions in one:
But first of all: as the image shows, why do you have a left join between OrderDetails and Sandwich? In an order calculation you don't need sandwiches that were not ordered.
For the total price calculation:
Add a new column to the query grid (assuming the discount is a percentage stored as a number between 0 and 1):
[SandwichT].[SandwichPrice] * [OrderDetailsT].[SandwichQuantity] * (1 - [OrderDetailsT].[Discount])
To store the total price: you can use the above formula, but in an update query.
If you plan to show the prices in a form or in a report:
you can do the calculations on the fly (and not store the total price),
or you can update the total price in one query and then build another query as the data source of the form/report.
Another possibility (my recommendation) is to store the total in the input form.
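The calculation itself, including the "only above $50" condition from the question, can be sketched in plain Python. The function name order_total and the example values are made up, and the discount is assumed to be stored as a fraction between 0 and 1, as above.

```python
def order_total(price, quantity, discount, threshold=50.0):
    """Return (total_price, new_price_after_discount).

    The discount (a fraction between 0 and 1) is applied only when the
    total price is above the threshold."""
    total = price * quantity
    if total > threshold:
        return total, total * (1 - discount)
    return total, total


print(order_total(6.0, 10, 0.25))  # total above the threshold, discount applied
print(order_total(6.0, 5, 0.25))   # total below the threshold, no discount
```

In Access the same condition could be expressed with IIf inside the query column or the update query; the Python version is only meant to pin down the arithmetic.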

Pivot Table Report

I am trying to create a pivot table with data from a SQL database. The pivot table is basically a process capability report that requires certain information. I have created an Excel file that sort of does what we are trying to accomplish, but I don't think my calculations are quite accurate. I know after doing some research that there are Pivot Tables within SQL but I don't know how to get them to work.
The table that my data is stored in has thousands of records. Each record has the following information: DATE, TIME, PRODUCT_NO, SEQ, DATECODE, DATECODE_IDH, PRODUCT, LINE, SHIFT, SIZE, OPERATOR, SAMPLE_SIZE, WEIGHT (1-12), WEIGHTXR, LSL_WT, TAR_WT, USL_WT, LABEL, MAV, LINE_SIZE
For the report, I need to group the data by product and by line. Since the product isn't consistent, each product can be described by TAR_WT, so the grouping will be a combination of TAR_WT and LINE_SIZE. I need to count how many instances of that product were measured, which will be the number of measurements (each individual weight, at 12 weights per record). I also need to find the minimum, maximum, and average of all of the weights per product (again, 12 weights for every record). After those values are obtained, I have to calculate the Standard Deviation, Cp, and Tz of the values (statistical calculations) and report all the information.
Date | Time | Product No | Seq | DateCode | Internal DateCode&ProductNo | Product Description | Line | Size | Weight1 | Weight2 | Weight3 | Weight4 | WeightXR | LSL_WT | TAR_WT | USL_WT | LABEL | MAV | LINE_SIZE
8/3/11 | 0:37:54 | 1234567 | 23 | DateCode | Internal DateCode&ProductNo | Product Description | L-1A | 50 | 1575 | 1566 | 1569.5 | 1575.5 | 1573.4 | 1550.809 | 1574.623 | 1598.437 | 1564.623 | 1525.507 | L-1A_50