BigQuery Calculating Battery State of Charge - sql

I'm trying to calculate the battery's State of Charge (SoC) given energy usage and PV energy production. I'm using BigQuery and can't solve the puzzle correctly.
When energy is produced (negative value), it can be stored in the battery, so it should increase the SoC of the battery. When energy is consumed (positive value), it should reduce the SoC of the battery. The SoC of the battery can't go below 0 or above 200. See the table below.
Energy in/out    Battery SoC
4.57             0
2.5              0
0.29             0
-2.29            2.29
-6.00            8.29
3.29             5
6.65             0
-2.15            2.15
The "Energy in/out" is given (calculated based on other values). The SoC has to be a rolling value changing from row to row showing SoC at a given moment (15 mins intervals).
I tried using the following and few other functions with no luck.
SUM(energy_usage) OVER (ORDER BY usage_at)
Yet I can't incorporate the condition that the SoC can't be less than 0 or more than 200.
The same setup and formula in Excel are shown in the attached image.
How can this be represented in BigQuery query language?

You might consider the approach below.
CREATE TEMP FUNCTION value_cap(v FLOAT64) AS (
  -- clamp the running SoC to the [0, 200] range
  LEAST(GREATEST(v, 0), 200)
);
WITH RECURSIVE sample_table AS (
  SELECT * FROM UNNEST([4.57, 2.5, 0.29, -2.29, -6.00, 3.29, 6.65, -2.15]) in_out WITH OFFSET AS usage_at
),
SoC AS (
  -- anchor row: SoC after the first interval
  SELECT *, value_cap(-in_out) soc FROM sample_table WHERE usage_at = 0
  UNION ALL
  -- each subsequent row applies its energy in/out to the previous row's (already capped) SoC
  SELECT t.in_out, t.usage_at, ROUND(value_cap(s.soc - t.in_out), 2)
  FROM SoC s JOIN sample_table t ON s.usage_at + 1 = t.usage_at
)
SELECT * FROM SoC ORDER BY usage_at;
Query results
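For reference, against the inlined sample values the query should return the same SoC values as the expected table in the question (rounded to two decimals):
in_out   usage_at   soc
4.57     0          0.0
2.5      1          0.0
0.29     2          0.0
-2.29    3          2.29
-6.0     4          8.29
3.29     5          5.0
6.65     6          0.0
-2.15    7          2.15
For your real table, the sample_table CTE would presumably be replaced by a SELECT over your table, using something like ROW_NUMBER() OVER (ORDER BY usage_at) - 1 as the join key, since the recursion steps row by row rather than by timestamp.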

Related

Condition to SUM previous values - SQLite

I have the following SQLite database:
Explaining Brazilian income tax:
a) if I have a loss, I don't need to pay any tax (e.g., January);
b) negative results can be subtracted from the next month's positive outcome (e.g., in February, instead of paying the full tax on $5942, the tax can be applied only to 5942 - 3200 = 2742);
c) if previous negative results are not sufficient to cover the next positive outcome, I have to pay tax (e.g., in September I could compensate from June and July, but I had to aggregate from August: total tax = -5000 - 2185 + 5000 + 3000 = 815).
My goal would be to build the following table:
I couldn't figure out a way to solve this problem. Any help?
Thanks
You need to use recursive CTEs here. If you are not familiar with this feature you might check out my tutorial, the official documentation referenced in that tutorial, as well as any number of other tutorials available on the Internet.
First, I generate temporary row numbers using the row_number Window function in the source CTE block below (replace "RESULTS" with your table name).
Then I use recursive CTE (losses) to calculate residual loss from the previous months, which can be used to reduce your taxes. (This part might be tricky to understand if you are not familiar with recursive CTEs.) Finally, I calculate the total taxable amount adjusted for previous remaining loss if necessary.
WITH RECURSIVE
source AS (
  SELECT row_number() OVER (ORDER BY ym) AS rid, *
  FROM RESULTS
),
losses AS (
  SELECT s.*, 0 AS res_loss
  FROM source AS s
  WHERE rid = 1
  UNION ALL
  SELECT s.*, iif(l.res_loss + l.profitloss < 0, l.res_loss + l.profitloss, 0) AS res_loss
  FROM source AS s, losses AS l
  WHERE s.rid = l.rid + 1
)
SELECT ym, profitloss, iif(profitloss + res_loss > 0, profitloss + res_loss, 0) AS tax
FROM losses
ORDER BY ym;
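If you want to try it quickly, a minimal test setup could look like this. The table and column names match the query above; the January/February figures follow the example in the question (assuming the January loss was 3200), and the March row is made up. Note that iif() needs a reasonably recent SQLite (3.32+); on older versions CASE WHEN works instead.
-- hypothetical sample data for testing the recursive query above
CREATE TABLE RESULTS (ym TEXT, profitloss INTEGER);
INSERT INTO RESULTS (ym, profitloss) VALUES
  ('2023-01', -3200),   -- loss: no tax due
  ('2023-02',  5942),   -- taxed only on 5942 - 3200 = 2742
  ('2023-03',  1000);   -- residual loss already used up, taxed on the full 1000
-- expected output of the query:
-- ym       | profitloss | tax
-- 2023-01  | -3200      | 0
-- 2023-02  |  5942      | 2742
-- 2023-03  |  1000      | 1000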

How to interpret STREAMING_TIMELINE_BY_* views in Google Big Query for Streaming Insert cost analysis

GBQ (Google BigQuery) provides views for streaming insert metadata, see STREAMING_TIMELINE_BY_*. I would like to use this data to understand the billing for "Streaming Inserts". However, the numbers don't add up and I'd like to understand if I made a mistake somewhere.
One of the data points in the streaming insert meta data view is the total_input_bytes:
total_input_bytes (INTEGER): Total number of bytes from all rows within the 1 minute interval.
In addition, the Pricing for data ingestion says:
Streaming inserts (tabledata.insertAll)
$0.010 per 200 MB
You are charged for rows that are successfully inserted. Individual rows are calculated using a 1 KB minimum size.
So getting the costs for streaming inserts per day should be possible via:
0.01 / 200 * (SUM(total_input_bytes) / 1024 / 1024)
Here 0.01 / 200 is the cost per MB and SUM(total_input_bytes) / 1024 / 1024 is the total volume in MB.
This should be a lower bound, since we disregard the fact that rows smaller than 1 KB are rounded up to 1 KB.
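As a quick sanity check on the formula: streaming about 1 GiB in a day (1024 MB) would work out to roughly 1024 / 200 * $0.01 ≈ $0.05.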
Full query:
SELECT
  project_id,
  dataset_id,
  table_id,
  SUM(total_rows) AS num_rows,
  ROUND(SUM(total_input_bytes) / 1024 / 1024, 2) AS num_bytes_in_mb,
  # 0.01$ per 200MB
  # see https://cloud.google.com/bigquery/pricing#data_ingestion_pricing
  ROUND(0.01 * (SUM(total_input_bytes) / 1024 / 1024) / 200, 2) AS cost_in_dollar,
  SUM(total_requests) AS num_requests
FROM
  `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
WHERE
  start_timestamp BETWEEN "2021-04-10" AND "2021-04-14"
  AND error_code IS NULL
GROUP BY 1, 2, 3
ORDER BY table_id ASC
However, the results are not reflected in our actual billing report. The billing shows less than half the costs of what I would expect. Now I'm wondering whether the costs can even be calculated like this.
Your query is rounding every line that is less than 0.49kb to 0kb. This should explain why you are calculating less costs.
Try inserting a CASE statement that will handle these values:
SELECT
  project_id,
  dataset_id,
  table_id,
  SUM(total_rows) AS num_rows,
  CASE WHEN SUM(total_input_bytes) / 1024 / 1024 < 0.001 THEN 0.001
       ELSE ROUND(SUM(total_input_bytes) / 1024 / 1024, 2) END AS num_bytes_in_mb,
  # 0.01$ per 200MB
  # see https://cloud.google.com/bigquery/pricing#data_ingestion_pricing
  CASE WHEN SUM(total_input_bytes) / 1024 / 1024 < 0.001 THEN 0.001
       ELSE ROUND(0.01 * (SUM(total_input_bytes) / 1024 / 1024) / 200, 2) END AS cost_in_dollar,
  SUM(total_requests) AS num_requests
FROM
  `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
WHERE
  start_timestamp BETWEEN "2021-04-10" AND "2021-04-14"
  AND error_code IS NULL
GROUP BY 1, 2, 3
ORDER BY table_id ASC
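Separately, if you also want to fold in the 1 KB per-row minimum quoted from the pricing page, a rough sketch (not part of the query above) is to take the larger of the reported bytes and total_rows * 1024 per interval before summing. This only approximates the per-row minimum, since the view exposes per-minute aggregates rather than individual rows, and it raises the estimate rather than explaining the gap with the billing report, but it tightens the lower bound:
SELECT
  project_id,
  dataset_id,
  table_id,
  SUM(total_rows) AS num_rows,
  -- apply the 1 KB minimum at the interval level; the true per-row minimum can only be approximated here
  ROUND(SUM(GREATEST(total_input_bytes, total_rows * 1024)) / 1024 / 1024, 2) AS est_billable_mb,
  ROUND(0.01 * SUM(GREATEST(total_input_bytes, total_rows * 1024)) / 1024 / 1024 / 200, 4) AS est_cost_in_dollar
FROM
  `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
WHERE
  start_timestamp BETWEEN "2021-04-10" AND "2021-04-14"
  AND error_code IS NULL
GROUP BY 1, 2, 3
ORDER BY table_id ASC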

DAX weighted average based on another dimension

I have the following data:
DATE COUNTRY ITEM Value
2005-01-01 UK op_rate 30%
2005-01-01 UK proc 1000
2005-01-01 UK export 750
2005-01-01 ITA op_rate 45%
2005-01-01 ITA proc 500
2005-01-01 ITA export 350
Basically, the data is in a normalized (long) format, which includes both ratios (the op_rate) and volume items such as exported volumes ("export") and processed volumes ("proc").
I need to aggregate by SUM for "proc" and "export", but not for the "op_rate", for which I need a weighted average by "proc".
In this case the aggregated op_rate would be:
(0.45*500 + 0.30*1000) / (500 + 1000) = 0.35 // instead of a 0.75 SUM or a 0.375 AVERAGE
All the examples I can find for weighted averages are across measures, but none covers weighting by another dimension.
Any help most welcome!
I understand that you are reluctant to change your model. The problem you have here is that you are trying to consume a highly normalised table and use it for analysis with an OLAP tool. OLAP tools prefer fact/dimension star schemas, and Tabular/Power BI is no different. I suspect that this is going to keep causing problems with future requirements too. Taking the hit on restructuring now is the best time to do it, as it will only get more difficult the longer you leave it.
This isn't to say that you can't do what you want with the tools as they are, but the resulting DAX will be less efficient and the storage required will be sub-optimal.
So with that caveat/lecture given (!) here is how you can do it.
op_rate_agg =
VAR pivoted =
    ADDCOLUMNS (
        SUMMARIZE ( 'Query1', Query1[COUNTRY], Query1[DATE] ),
        "op_rate", CALCULATE ( AVERAGE ( Query1[Value] ), Query1[ITEM] = "op_rate" ),
        "proc", CALCULATE ( SUM ( Query1[Value] ), Query1[ITEM] = "proc" )
    )
RETURN
    DIVIDE ( SUMX ( pivoted, [op_rate] * [proc] ), SUMX ( pivoted, [proc] ) )
It is really inefficient, as it has to build your pivoted set on every execution, and you will see that the query plan does a lot more work than it would if you persisted this as a proper fact table. If your model is large you will likely have performance issues with this measure and any measure that references it.
@RADO is correct. You should definitely pivot your ITEM column to get this.
Then a weighted average on op_rate can be written simply as
= DIVIDE(
SUMX(Table3, Table3[op_rate] * Table3[proc]),
SUMX(Table3, Table3[proc]))

SQL Percentile Calculation - sql

I have the following query, which even without a ton of data (~3k rows) is still a bit slow to execute, and the logic is a bit over my head - was hoping to get some help optimizing the query or even an alternate methodology:
Select companypartnumber,
       (PartTotal + IsNull(Cum_Lower_Ranks, 0)) / Sum(PartTotal) over() * 100 as Cum_PC_Of_Total
FROM PartSalesRankings PSRMain
Left join
(
    Select PSRTop.Item_Rank, Sum(PSRBelow.PartTotal) as Cum_Lower_Ranks
    from partSalesRankings PSRTop
    Left join PartSalesRankings PSRBelow on PSRBelow.Item_Rank < PSRTop.Item_Rank
    Group by PSRTop.Item_Rank
) as PSRLowerCums on PSRLowerCums.Item_Rank = PSRMain.Item_Rank
The PartSalesRankings table simply consists of CompanyPartNumber (bigint), which is a part number designation, PartTotal (decimal(38,5)), which is the total sales, and Item_Rank (bigint), which is the rank of the item based on total sales.
I'm trying to end up with my parts in categories based on their percentile - so an "A" item would be the top 5%, a "B" item would be the next 15%, and "C" items would be the lower 80th percentile. The view I created works fine; it just takes almost three seconds to execute, which for my purposes is quite slow. I narrowed the bottleneck down to the above query - any help would be greatly appreciated.
The problem you are having is the calculation of the cumulative sum of PartTotal. If you are using SQL Server 2012, you can do something like:
select (case when ratio <= 0.05 then 'A'
             when ratio <= 0.20 then 'B'
             else 'C'
        end),
       t.*
from (select psr.companypartnumber,
             -- cumulative share of total sales, accumulated from the best-selling parts down
             (sum(PartTotal) over (order by PartTotal desc) * 1.0 / sum(PartTotal) over ()) as ratio
      from PartSalesRankings psr
     ) t
SQL Server 2012 also has percentile functions and other window functions not available in earlier versions.
In earlier versions, the question is how to get the cumulative sum efficiently. Your query is probably as good as anything that can be done in one query. Can the cumulative sum be calculated when partSalesRankings is created? Can you use temporary tables?
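For pre-2012 versions, one way to act on the temp-table suggestion is to materialise the cumulative totals once and classify from there. A rough sketch, assuming Item_Rank = 1 is the best-selling part (#cum, abc_class and GrandTotal are just illustrative names); the triangular join still runs, but only when the temp table is rebuilt rather than on every query against the view:
-- build the cumulative totals once
SELECT r1.companypartnumber,
       r1.PartTotal,
       r1.Item_Rank,
       SUM(r2.PartTotal) AS CumTotal   -- this part's sales plus all better-ranked parts
INTO #cum
FROM PartSalesRankings r1
JOIN PartSalesRankings r2 ON r2.Item_Rank <= r1.Item_Rank
GROUP BY r1.companypartnumber, r1.PartTotal, r1.Item_Rank;

-- classify against the grand total
SELECT c.companypartnumber,
       CASE WHEN 100.0 * c.CumTotal / t.GrandTotal <= 5  THEN 'A'
            WHEN 100.0 * c.CumTotal / t.GrandTotal <= 20 THEN 'B'
            ELSE 'C'
       END AS abc_class
FROM #cum c
CROSS JOIN (SELECT SUM(PartTotal) AS GrandTotal FROM PartSalesRankings) t;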

MySQL - MAX() returns wrong result

I tried this query on MySQL server (5.1.41)...
SELECT max(volume), dateofclose, symbol, volume, close, market FROM daily group by market
I got this result:
max(volume) dateofclose symbol volume close market
287031500 2010-07-20 AA.P 500 66.41 AMEX
242233000 2010-07-20 AACC 16200 3.98 NASDAQ
1073538000 2010-07-20 A 4361000 27.52 NYSE
2147483647 2010-07-20 AAAE.OB 400 0.01 OTCBB
437462400 2010-07-20 AAB.TO 31400 0.37 TSX
61106320 2010-07-20 AA.V 0 0.24 TSXV
As you can see, the maximum volume is VERY different from the 'real' value of the volume column?!?
The volume column is defined as int(11) and I've got 2 million rows in this table, which is nowhere near the MyISAM storage limits, so I can't believe that is the problem. What is also strange is that the data all shows the same date (dateofclose). If I force a specific date with a WHERE clause, the same symbol comes out with a different max(volume) result. This is pretty weird...
Need some help here!
UPDATE :
Here's my edited "working" request:
SELECT a.* FROM daily a
INNER JOIN (
SELECT market, MAX(volume) AS max_volume
FROM daily
WHERE dateofclose = '20101108'
GROUP BY market
) b ON
a.market = b.market AND
a.volume = b.max_volume
So this gives me, by market, the stock with the highest volume (for Nov 8, 2010).
As you can see, the maximum volume is VERY different from the 'real' value of the volume column?!?
This is because MySQL, rather bizarrely, doesn't GROUP things in a sensible way.
Selecting MAX(column) will get you the maximum value for that column, but selecting other columns (or the column itself) will not necessarily give you the row that the MAX() value came from. You essentially get an arbitrary (and usually useless) row back.
Here's a thread with some workarounds using subqueries:
How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?
This is a subset of the "greatest n per group" problem. (There is a tag with that name but I am a new user so I can't retag).
This is usually best handled with an analytic function, but can also be written with a join to a sub-query using the same table. In the sub-query you identify the max value, then join to the original table on the keys to find the row that matches the max.
Assuming that {dateofclose, symbol, market} is the grain at which you want the maximum volume, try:
select a.*, b.max_volume
from daily a
join
(
    select dateofclose, symbol, market, max(volume) as max_volume
    from daily
    group by dateofclose, symbol, market
) b
on a.dateofclose = b.dateofclose
and a.symbol = b.symbol
and a.market = b.market
Also see this post for reference.
Did you try adjusting your query to include Symbol in the grouping?
SELECT max(volume), dateofclose, symbol, volume, close, market
FROM daily
GROUP BY market, symbol