BigQuery Error When Using CAST and Determining Decimals - sql

I have linked a BigQuery project to my Google Ads account. Within that, we have a campaignBasicStats table.
I want to pull the cost of a campaign from my Google Ads account into a BigQuery workspace to apply some additional logic.
The cost column is coming through as an INTEGER and is described like this:
INTEGER NULLABLE
The sum of your cost-per-click (CPC) and cost-per-thousand impressions (CPM) costs during this period. Values can be one of: a) a money amount in micros, b) "auto: x" or "auto" if this field is a bid and AdWords is automatically setting the bid via the chosen bidding strategy, or c) "--" if this field is a bid and no bid applies to the row.
If I query the table, the cost comes back as values like these:
2590000.0
965145.0
In Google Ads, the two costs for these campaigns are £25.90 and £96.51
So I have this code in my BigQuery workspace.
SELECT CAST(Cost AS FLOAT64)
FROM `db_table`
WHERE COST > 0
LIMIT 1000
The column returns these numbers:
2590000.0
965145.0
However, I need the numbers to be in currency terms: for example, the first result, 2590000.0, should be 25.90 and the second should be 96.51.
I changed my code to this:
SELECT CAST(Cost AS FLOAT64(4,2))
FROM `db_table`
WHERE COST > 0
LIMIT 1000
And now I get this error:
FLOAT64 does not support type parameters at [1:28]
Is there something I'm missing? How do I convert to a decimal and specify where I want the decimal point to be in BigQuery?
Thanks,

It appears you are using a Google Ads Data Transfer operation as detailed here.
In this case, it's important to note the Description of the Cost column in p_CampaignBasicStats:
The sum of your cost-per-click (CPC) and cost-per-thousand impressions
(CPM) costs during this period. Values can be one of: a) a money
amount in micros, b) "auto: x" or "auto" if this field is a bid and
AdWords is automatically setting the bid via the chosen bidding
strategy, or c) "--" if this field is a bid and no bid applies to the
row.
One micro is one-millionth of the fundamental currency unit, so we need to transform the amount like this: cost / 1000000
Then we simply need to ROUND to get the appropriate precision. If you prefer to always round up, see my answer regarding the correct way to do that here.
First, we'll set up an example table with the example values you've given:
CREATE TEMP TABLE ex_db_table ( Cost INTEGER );

INSERT INTO ex_db_table VALUES ( 2590000 );
INSERT INTO ex_db_table VALUES ( 965145 );
Then we'll select the data in your preferred unit:
SELECT
  ROUND(Cost / 1000000, 2) AS currency_cost
FROM
  ex_db_table;
Of note, the math in your question is incorrect, as the actual values of your Cost examples equate to 2.59 and 0.97 (2590000 / 1000000 = 2.59 and 965145 / 1000000 ≈ 0.97), not 25.90 and 96.51.
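Applied back to the query in the question (a sketch that keeps the `db_table` placeholder and the WHERE filter exactly as written there), the conversion would look something like this:
SELECT
  ROUND(Cost / 1000000, 2) AS currency_cost
FROM
  `db_table`
WHERE
  Cost > 0
LIMIT 1000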

Related

How to sum just the decimal values from a string in SQL Server?

I'm working with a system that sells products using kiosks; these kiosks receive both bills and coins. The payments made in the kiosks are registered in the database like this:
Product cost $5
The customer inserts two $2 bills and the rest in coins: 1 coin of 0.50, 2 coins of 0.20, and 1 coin of 0.10.
In the database this is stored as an nvarchar value:
(2,2,0.50,0.20,0.20,0.10)
It is a record of all the bills and coins inserted in the kiosk, kept in just one field.
I need to get and sum only the coins from this field. How do I do that in SQL Server?
I know that it would be easier if the table separated the bills from the coins, but I have no control over how these records are created.
The table definition is:
KioskID INT
Payments nvarchar(250)
PaymentCommited INT
In the example I gave above, I want to sum only the coins, and the expected value from the Payments field would be 1.
I would be grateful for any help.
Issues with your data aside, at a guess you want to sum all the decimal values in the provided sample string.
You could use string_split to get each value as a row and then sum:
declare @dodgydata varchar(100) = '2,2,0.50,0.20,0.20,0.10';

select Sum(Try_Convert(decimal(5,2), value)) as CoinTotal
from (select @dodgydata dd) d
cross apply String_Split(dd, ',')
where value like '%.%';
Note of course that this can only work if coin denominations can't be confused with note denominations; otherwise, with data of '1,1', is that two notes, two coins, or one of each?
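To apply the same idea to the table described in the question, a sketch might look like the following. The table name KioskPayments is a placeholder (the question only lists the columns), and the REPLACE calls assume the parentheses shown in the sample value are actually stored in the Payments field:
select p.KioskID,
       Sum(Try_Convert(decimal(5,2), s.value)) as CoinTotal
from KioskPayments p                                   -- placeholder table name
cross apply String_Split(
       Replace(Replace(p.Payments, '(', ''), ')', ''), -- strip stored parentheses, if any
       ',') s
where s.value like '%.%'                               -- values with a decimal point are treated as coins
group by p.KioskID;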

How to calculate a bank's deposit growth from one call report to the next, as a percentage?

I downloaded the entire FDIC bank call reports dataset, and uploaded it to BigQuery.
The table I currently have looks like this:
What I am trying to accomplish is adding a column showing the deposit growth rate since the last quarter for each bank:
Note: The first reporting date for each bank (e.g. 19921231) will not have a "Quarterly Deposit Growth"; hence the two empty cells for the two banks.
I would like to know if a bank is increasing or decreasing its deposits each quarter/call report (viewed as a percentage).
e.g. "On their last call report (19921231), First National Bank had deposits of 456789 (in 1000's). On their next call report (19930331), First National Bank had deposits of 567890 (in 1000's). What is the percentage increase (or decrease) in deposits?"
This "_%_Change_in_Deposits" column would be displayed as a new column.
This is the code I have written so far:
select
SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp, SFRNLL.specgrp AS `Loan_Specialization`, SFRNLL.lnreres as `_1_to_4_Residential_Loans`, AL.dep as `Deposits`, AL.lnlsnet as `loans_and_leases`,
IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) as SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1992.All_Reports_19921231_1_4_Family_Residential_Net_Loans_and_Leases as SFRNLL
JOIN usa_fdic_call_reports_1992.All_Reports_19921231_Assets_and_Liabilities as AL
ON SFRNLL.cert = AL.cert
where SFRNLL.specgrp = 4 and IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
UNION ALL
select
SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp, SFRNLL.specgrp AS `Loan_Specialization`, SFRNLL.lnreres as `_1_to_4_Residential_Loans`, AL.dep as `Deposits`, AL.lnlsnet as `loans_and_leases`,
IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) as SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1993.All_Reports_19930331_1_4_Family_Residential_Net_Loans_and_Leases as SFRNLL
JOIN usa_fdic_call_reports_1993.All_Reports_19930331_Assets_and_Liabilities as AL
ON SFRNLL.cert = AL.cert
where SFRNLL.specgrp = 4 and IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
The table looks like this:
Additional notes:
I would also like to view the last column (SFR2TotalLoansRatio) as a percentage.
This code runs correctly; however, I was previously getting a "division by zero" error when attempting to run it over 50,000 rows (1992 to the present).
Addressing each of your questions individually:
First) Retrieving SFR2TotalLoanRatio as a percentage: I assume you want to see 9.88% instead of 0.0988 in your results. Currently, in BigQuery you can achieve this by casting the field to a STRING and then concatenating the % sign. Below is an example with sample data:
WITH data as (
SELECT 0.0123 as percentage UNION ALL
SELECT 0.0999 as percentage UNION ALL
SELECT 0.3456 as percentage
)
SELECT CONCAT(CAST(percentage*100 as String),"%") as formatted_percentage FROM data
And the output:
Row  formatted_percentage
1    1.23%
2    9.99%
3    34.56%
Second) Regarding your question about the division-by-zero error: I am assuming IEEE_DIVIDE(arg1, arg2) is a function that performs the division, in which arg1 is the dividend and arg2 is the divisor. Therefore, I would advise you to explore your data in order to figure out which records have a divisor equal to zero. After gathering those results, you can determine what to do with them. In case you decide to discard them, you can simply add AL.lnlsnet != 0 to the WHERE clause of each part of your query. On the other hand, you can also modify the records where lnlsnet = 0 using a CASE WHEN or IF statement.
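As a minimal sketch of that second option (with made-up numbers, not your call-report data), NULLIF, as an alternative to the CASE WHEN approach, turns a zero divisor into NULL so the ratio comes back as NULL instead of failing:
WITH sample AS (
  SELECT 100 AS lnreres, 400 AS lnlsnet UNION ALL
  SELECT 100 AS lnreres, 0   AS lnlsnet
)
SELECT
  lnreres,
  lnlsnet,
  lnreres / NULLIF(lnlsnet, 0) AS ratio   -- NULL for the zero-divisor row
FROM sample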
UPDATE:
In order to add this piece of code to your query, you have to wrap your existing query in a WITH clause (a temporary named result set). Then I will make two adjustments: first, a temporary function to calculate the percentage and format it with the % sign; second, retrieving the previous number of deposits in order to calculate the desired percentage. I am also assuming that cert is the individual ID for each bank. The modifications are as follows:
#the following function MUST be the first thing within your query
CREATE TEMP FUNCTION percent(dep INT64, prev_dep INT64) AS (
Concat(Cast((dep-prev_dep)/prev_dep*100 AS STRING), "%")
);
#followed by the query you have created so far as a temporary table; notice the comma added after the closing parenthesis
WITH data AS(
#your query
),
#within this second part you select all the columns from data, and the LAG function is used to retrieve the previous number of deposits for each bank
data_2 as (
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans,Deposits, loans_and_leases, SFR2TotalLoanRatio,
CASE WHEN cert = lag(cert) OVER (PARTITION BY cert ORDER BY repdte) THEN lag(Deposits) OVER (PARTITION BY cert ORDER BY repdte) ELSE NULL END AS prev_dep FROM data
)
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans,Deposits, loans_and_leases, SFR2TotalLoanRatio, percent(Deposits,prev_dep) as dept_growth_rate FROM data_2
Note that the built-in function LAG is used together with CASE WHEN in order to retrieve the previous amount of deposits per bank.
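As a self-contained sketch of the LAG pattern above (using made-up cert, repdte, and deposit values rather than real call-report data), you can verify the previous-quarter lookup and the growth calculation like this:
WITH data AS (
  SELECT 1234 AS cert, '19921231' AS repdte, 456789 AS Deposits UNION ALL
  SELECT 1234 AS cert, '19930331' AS repdte, 567890 AS Deposits
),
data_2 AS (
  SELECT
    cert,
    repdte,
    Deposits,
    LAG(Deposits) OVER (PARTITION BY cert ORDER BY repdte) AS prev_dep
  FROM data
)
SELECT
  cert,
  repdte,
  Deposits,
  ROUND((Deposits - prev_dep) / prev_dep * 100, 2) AS pct_change_in_deposits  -- NULL for the first quarter
FROM data_2;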

"Running Product" aggregate/ windowed function in PostgreSql?

I am trying to normalize End-of-Day stock prices in PostgreSql.
Let's say I have a stock table defined as such:
create table eod (
date date not null,
stock_id int not null,
eod_split decimal(16,8) not null,
close decimal(12,6) not null,
constraint pk_eod primary key (date, stock_id)
);
Data in this table might look like this:
"date","stock_id","eod_split","close"
"2014-06-13",14010920,"1.00000000","182.560000"
"2014-06-13",14010911,"1.00000000","91.280000"
"2014-06-13",14010923,"1.00000000","41.230000"
"2014-06-12",14010911,"1.00000000","92.290000"
"2014-06-12",14010920,"1.00000000","181.220000"
"2014-06-12",14010923,"1.00000000","40.580000"
"2014-06-11",14010920,"1.00000000","182.250000"
"2014-06-11",14010911,"1.00000000","93.860000"
"2014-06-11",14010923,"1.00000000","40.860000"
"2014-06-10",14010911,"1.00000000","94.250000"
"2014-06-10",14010923,"1.00000000","41.110000"
"2014-06-10",14010920,"1.00000000","184.290000"
"2014-06-09",14010920,"1.00000000","186.220000"
"2014-06-09",14010911,"7.00000000","93.700000"
"2014-06-09",14010923,"1.00000000","41.270000"
"2014-06-06",14010923,"1.00000000","41.480000"
"2014-06-06",14010911,"1.00000000","645.570000"
"2014-06-06",14010920,"1.00000000","186.370000"
"2014-06-05",14010920,"1.00000000","185.980000"
"2014-06-05",14010911,"1.00000000","647.350000"
"2014-06-05",14010923,"1.00000000","41.210000"
...
"2005-03-04",14010920,"1.00000000","92.370000"
"2005-03-04",14010911,"1.00000000","42.810000"
"2005-03-04",14010923,"1.00000000","25.170000"
"2005-03-03",14010923,"1.00000000","25.170000"
"2005-03-03",14010911,"1.00000000","41.790000"
"2005-03-03",14010920,"1.00000000","92.410000"
"2005-03-02",14010920,"1.00000000","92.920000"
"2005-03-02",14010923,"1.00000000","25.260000"
"2005-03-02",14010911,"1.00000000","44.121000"
"2005-03-01",14010920,"1.00000000","93.300000"
"2005-03-01",14010923,"1.00000000","25.280000"
"2005-03-01",14010911,"1.00000000","44.500000"
"2005-02-28",14010923,"1.00000000","25.160000"
"2005-02-28",14010911,"2.00000000","44.860000"
"2005-02-28",14010920,"1.00000000","92.580000"
"2005-02-25",14010923,"1.00000000","25.250000"
"2005-02-25",14010920,"1.00000000","92.800000"
"2005-02-25",14010911,"1.00000000","88.990000"
"2005-02-24",14010923,"1.00000000","25.370000"
"2005-02-24",14010920,"1.00000000","92.640000"
"2005-02-24",14010911,"1.00000000","88.930000"
"2005-02-23",14010923,"1.00000000","25.200000"
"2005-02-23",14010911,"1.00000000","88.230000"
"2005-02-23",14010920,"1.00000000","92.100000"
...
"2003-02-24",14010920,"1.00000000","78.560000"
"2003-02-24",14010911,"1.00000000","14.740000"
"2003-02-24",14010923,"1.00000000","24.070000"
"2003-02-21",14010920,"1.00000000","79.950000"
"2003-02-21",14010923,"1.00000000","24.630000"
"2003-02-21",14010911,"1.00000000","15.000000"
"2003-02-20",14010911,"1.00000000","14.770000"
"2003-02-20",14010920,"1.00000000","79.150000"
"2003-02-20",14010923,"1.00000000","24.140000"
"2003-02-19",14010920,"1.00000000","79.510000"
"2003-02-19",14010911,"1.00000000","14.850000"
"2003-02-19",14010923,"1.00000000","24.530000"
"2003-02-18",14010923,"2.00000000","24.960000"
"2003-02-18",14010911,"1.00000000","15.270000"
"2003-02-18",14010920,"1.00000000","79.330000"
"2003-02-14",14010911,"1.00000000","14.670000"
"2003-02-14",14010920,"1.00000000","77.450000"
"2003-02-14",14010923,"1.00000000","48.300000"
"2003-02-13",14010920,"1.00000000","75.860000"
"2003-02-13",14010911,"1.00000000","14.540000"
"2003-02-13",14010923,"1.00000000","46.990000"
Note the "eod_split" column. When a split value other than 1 is recorded, it basically means that the stock's shares split by that factor. In other words, when the split is 2.0, the number of outstanding shares doubled, but the value of each individual share is halved from that point on. If the stock was worth $100 per share, it's now worth $50 per share.
If you graph this with raw numbers, this sort of thing is truly ugly. Sharp cliffs show up, when the overall value of the company did not significantly change... and when you have multiple splits, you end up with a graph that does not properly reflect the trending of the company, often by a large margin. In the above example, where there was a 2:1 split, your close prices for a stock would look something like 100, 100, 100, 50, 50, 50.
I want to use this table to create a "normalized" price, in a reasonably efficient manner (there's quite a few records to chunk through). Continuing the sample, this would show the stock prices at 50, 50, 50, 50, 50, 50. If there were multiple splits, the data should still be consistent and smooth, if we ignored actual market value changes.
My idea is, if I can create a CTE of a "running product" aggregate of the split value, going back in time, I can define date ranges per stock and what the modifier value to apply to the closing cost should be, then join that back to the eod table and select into a new table the adjusted close value for each stock.
...the problem is, I cannot wrap my head around how to do that in anything other than a whole bunch of temp tables and multi-step processes. I do not know of any built-in functionality to make this easier, either.
Can someone show me how I can generate the normalized data?
You don't need a CTE. You just need a cumulative product. Postgres doesn't have one built in. But, arithmetic to the rescue!
select eod.*,
       exp(sum(ln(eod_split)) over (partition by stock_id order by date)) as cume_split,
       close * exp(sum(ln(eod_split)) over (partition by stock_id order by date)) as normalized_price
from eod;
Hilarious: looking for this solution, I found that an associate had already asked about it. Here is the basic algebra behind this ingenious solution: https://blog.prepscholar.com/natural-log-rules
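The identity in play is simply that a product of positive factors can be rewritten as the exponential of a sum of logarithms:
exp(ln(x_1) + ln(x_2) + ... + ln(x_n)) = x_1 * x_2 * ... * x_n, for all x_i > 0
so a windowed SUM of ln(eod_split) wrapped in exp() behaves as a running product of the split factors (all split factors are positive, so the logarithm is defined). A quick sanity check in PostgreSQL, just to confirm the identity:
select exp(ln(2.0) + ln(3.0) + ln(5.0)) as product_via_logs;  -- returns approximately 30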

Select Average of Top 25% of Values in SQL

I'm currently writing a stored procedure for my client to populate some tables that will be used to generate SSRS reports later on. Some of the data is based on specific stock formulas that are run on each of their clients' quarterly data (sent to them by their clients). The other part of the data is generated by comparing those results against those from other, similar sized clients. One of the things that they want tracked in their reports is the average of the top 25% of formula results for that particular comparison group.
To give a better picture of it, imagine the following fields that I have in a temp table:
FormulaID int
Value decimal (18,6)
I want to do the following: Given a specific FormulaID return the average of the top 25% of Value.
I know how to take an average in SQL, but I don't know how to do it against only the top 25% of a specific group.
How would I write this query?
I guess you can do something like this...
SELECT AVG(Q.ColA) Avg25Prec
FROM (
    SELECT TOP 25 PERCENT ColA
    FROM Table_Name
    ORDER BY SomeColumn
) Q
Here's what I did, given the table shown above:
select AVG(t.Value)
from (select top 25 percent Value
      from #TempGroupTable
      where FormulaID = @PassedInFormulaID
      order by Value desc) as t
The desc must be there because TOP ... PERCENT does not do any comparing on its own; it simply grabs the first x records, where x is 25% of the number of records in the query. The order by Value desc line therefore makes those first records the 25% with the highest Value, which are then averaged.
As a side note, this also means that if you wanted to grab the bottom 25% instead, or if your formula results work like a golf score (i.e. lowest is best), all you would need to do is remove the desc and you would be good to go.
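As an alternative sketch (not taken from either answer above, and only approximately equivalent when the row count isn't evenly divisible by 4), NTILE can bucket the values into quartiles so you can average the top bucket; #TempGroupTable and @PassedInFormulaID are the names used above:
select AVG(q.Value) as Top25PctAvg
from (select Value,
             NTILE(4) over (order by Value desc) as Quartile   -- 1 = top quartile by Value
      from #TempGroupTable
      where FormulaID = @PassedInFormulaID) as q
where q.Quartile = 1;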

Calculating a Ratio using Column A & Column B - in Powerpivot/MDX/DAX, not in SQL

I have a query to pull clickthrough for a funnel, where if a user hit a page it is recorded as "1", else NULL --
SELECT datestamp
,COUNT(visits) as Visits
,count([QE001]) as firstcount
,count([QE002]) as secondcount
,count([QE004]) as thirdcount
,count([QE006]) as finalcount
,user_type
,user_loc
FROM
dbname.dbo.loggingtable
GROUP BY datestamp, user_type, user_loc
I want to have a column for each ratio, e.g. firstcount/Visits, secondcount/firstcount, etc. as well as a total (finalcount/Visits).
I know this can be done
in an Excel PivotTable by adding a "calculated field"
in SQL by grouping
in PowerPivot by adding a CalculatedColumn, e.g.
=IFERROR(QueryName[finalcount]/QueryName[Visits],0)
BUT I need to give the report consumer the option of slicing by just user_type or just user_loc, etc., and Excel will tend to ADD the proportions, which won't work because
SUM(A/B) != SUM(A)/SUM(B)
Is there a way in DAX/MDX/PowerPivot to add a calculated column/measure, so that it will be calculated as SUM(finalcount)/SUM(Visits), for any user-defined subset of the data (daterange, user type, location, etc.)?
Yes, via calculated measures. Calculated columns are for creating values that you want to see on rows, columns, or the report header; calculated measures are for creating values that you want to see in the values section of a pivot table, and they can be sliced and diced by the columns in the model.
The easiest way would be to create 3 calculated "measures" in the calculation area of the powerpivot sheet.
TotalVisits:=SUM(QueryName[visits])
TotalFinalCount:=SUM(QueryName[finalcount])
TotalFinalCount2VisitsRatio:=[TotalFinalCount]/[TotalVisits]
You can then slice the calculated measure [TotalFinalCount2VisitsRatio] by user_type or just user_loc (or whatever) and the value will be calculated correctly. The difference here is that you are explicitly telling the xVelocity engine to SUM-then-DIVIDE. If you create the calculated column, then the engine thinks you want to DIVIDE-then-SUM.
Also, you don't have to break the measure down into 3 separate measures; it's just good practice. If you're interested in learning more, I'd recommend this book; the author is the PowerPivot/DAX guru and the book is very straightforward.