BigQuery Scenario for usage - sql

I am working on a use case where there is one part of the scenario I am not able to achieve: I need to calculate new_tarif_allownace based on the prepaid mobile data usage of the subscriber.
subscriber_no  bill_start_date  bill_end_date  gift_given  gift_received  ROW_NO  Data_allowed  new_tarif_allownace
111            01-Jan-20        05-Jan-20      0           0              1       1000          1000
111            01-Jan-20        05-Jan-20      100         0              2       1000          900
111            01-Jan-20        05-Jan-20      0           0              3       1000          900
111            01-Jan-20        05-Jan-20      200         0              4       1000          700
111            01-Jan-20        05-Jan-20      0           50             5       1000          750
111            01-Jan-20        05-Jan-20      100         300            6       1000          950
111            01-Jan-20        05-Jan-20      0           700            7       1000          1650
222            01-Feb-20        05-Feb-20      0           0              1       2000          2000
222            01-Feb-20        05-Feb-20      100         0              2       2000          1900
Please find the details:
Given fields: subscriber_no, bill_start_date, bill_end_date, gift_given, gift_received, Data_allowed
Derived fields:
ROW_NO = row_number() over ()
new_tarif_allownace = Data_allowed - gift_given + gift_received
Note: when both gift_given and gift_received are 0 in a row, no new calculation is required for new_tarif_allownace; instead, the previously calculated value should be carried forward into that row.
Here is the logic I used to try to maintain the calculation, but it doesn't seem to work:
case when last_gift_given <> 0
     then min( TARIFF_ALLOWANCE_DATA_MB
               - IFNULL(gift_given, 0)
               + IFNULL(gift_received, 0) ) over (order by ROW_NO asc
                                                  rows unbounded preceding)
     else max( TARIFF_ALLOWANCE_DATA_MB
               - IFNULL(gift_given, 0)
               + IFNULL(gift_received, 0) ) over (order by ROW_NO asc
                                                  rows unbounded preceding)
end as new_tarif_allownace_data,
I need a BigQuery query that does the calculation while carrying the previous value forward if there is no change in the gift_given and gift_received columns. The sample data above is included for clarity.

The question is a little unclear to me, but I think you are probably looking for something like
data_allowed
- sum(gift_given) OVER (PARTITION BY subscriber_no ORDER BY row_no)
+ sum(gift_received) OVER (PARTITION BY subscriber_no ORDER BY row_no) new_tarif_allownace_data
You can see the basic idea in this Fiddle (not BigQuery, but should work the same).
I wouldn't worry about a special case for when one of the values is 0. The addition or subtraction of 0 won't hurt anything, so it seems safe to use. You can still wrap your values in ifnull if you want, but I don't think it's necessary when doing it this way.
Based on your current SQL, I think you might be trying to do this in a recursive CTE. It's certainly possible to do that if the above doesn't meet your needs, but it's not clear to me from your post whether/why that is necessary/preferred.
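For completeness, here is a minimal sketch of how that running-sum expression could sit in a full query. The table name usage_table is a placeholder, since the original table name is not given:

-- Minimal sketch of the running-sum approach; usage_table is a placeholder name.
SELECT
  subscriber_no,
  bill_start_date,
  bill_end_date,
  gift_given,
  gift_received,
  row_no,
  data_allowed,
  data_allowed
    - SUM(gift_given)    OVER (PARTITION BY subscriber_no ORDER BY row_no)
    + SUM(gift_received) OVER (PARTITION BY subscriber_no ORDER BY row_no) AS new_tarif_allownace_data
FROM usage_table
ORDER BY subscriber_no, row_no;

On the sample data this produces 1000, 900, 900, 700, 750, 950, 1650 for subscriber 111, matching the expected new_tarif_allownace column.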

Related

SQL Server: How to retrieve all records based on recent datetime

First off, apologies if this has been asked elsewhere, as I was unable to find any solution. The best I could find was retrieving the latest 1 record, or 2-3 records. I'm looking for all records (the number could be dynamic: 1, 2, or maybe 50+) that share the most recent Datetime value. So basically, here is the problem:
I have a table as follows,
APILoadDatetime          RowId  ProjectId  Value
2021-07-13 15:09:14.620  1      Proj-1     101
2021-07-13 15:09:14.620  2      Proj-2     81
2021-07-13 15:09:14.620  3      Proj-3     111
2021-07-13 15:09:14.620  4      Proj-4     125
2021-05-05 04:46:07.913  1      Proj-1     99
2021-05-05 04:46:07.913  2      Proj-2     69
2021-05-05 04:46:07.913  3      Proj-3     105
2021-05-05 04:46:07.913  4      Proj-4     115
...                      ...    ...        ...
What I am looking to do is write a query that will give me all the most recent data based on Datetime, so in this case I should get the following result:
APILoadDatetime          RowId  ProjectId  Value
2021-07-13 15:09:14.620  1      Proj-1     101
2021-07-13 15:09:14.620  2      Proj-2     81
2021-07-13 15:09:14.620  3      Proj-3     111
2021-07-13 15:09:14.620  4      Proj-4     125
RowId (as the name suggests) numbers the rows within a particular Datetime block. It will not always go up to 4; it is dynamic based on the data received, so it could be 1, 2, 4 or even 50+.
I hope I was able to convey the question properly. Thank you all for reading, and thanks in advance to those who provide a solution.
You can use the window function rank to find all the entries that share the latest APILoadDatetime:
select *
from (
    select *, rank() over (order by APILoadDatetime desc) rn
    from tablename
) t
where rn = 1
select top 1 with ties *
from tablename
order by row_number() over (
    partition by RowId
    order by APILoadDatetime desc
);
TOP 1 works together with WITH TIES here.
WITH TIES means that when the ORDER BY expression evaluates to 1, SELECT takes that record (because of TOP 1) and also every other record whose ORDER BY expression is 1 (because of WITH TIES).
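A tiny, hypothetical illustration of the WITH TIES behaviour (SQL Server syntax, toy data not taken from the question):

select top 1 with ties v
from (values (1), (1), (2)) as t(v)
order by v;
-- returns two rows, both with v = 1: the top row plus the row tied with it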
Update #1:
If you need the last record by APILoadDatetime, along with any other records that share that same APILoadDatetime, then the query is simpler:
select top 1 with ties *
from tablename
order by APILoadDatetime desc;

Computing rolling average and standard deviation by dates

I have the table below, where I need to compute a rolling average and standard deviation based on the dates. I have listed the current table and the expected results. I am trying to compute the rolling average for an id ordered by date; rollAvgA is computed from metricA. For the first occurrence of an id, the result should be zero, as it has no preceding values. How can this be accomplished?
Current Table :
Date id metricA
8/1/2019 100 2
8/2/2019 100 3
8/3/2019 100 2
8/1/2019 101 2
8/2/2019 101 3
8/3/2019 101 2
8/4/2019 101 2
Expected Table :
Date id metricA rollAvgA
8/1/2019 100 2 0
8/2/2019 100 3 2.5
8/3/2019 100 2 2.3
8/1/2019 101 2 0
8/2/2019 101 3 2.5
8/3/2019 101 2 2.3
8/4/2019 101 2 2.25
You seem to want a cumulative average. This is basically:
select t.*,
avg(metricA * 1.0) over (partition by id order by date) as rollingavg
from t;
The only caveat is that the first value is an average of one value. To handle this, use a case expression:
select t.*,
(case when row_number() over (partition by id order by date) > 1
then avg(metricA * 1.0) over (partition by id order by date)
else 0
end) as rollingavg
from t;
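The question also mentions a rolling standard deviation. Assuming your DBMS exposes a standard-deviation aggregate as a window function (STDEV in SQL Server, STDDEV or STDDEV_SAMP in many other engines), the same cumulative-window pattern could be applied; this is a sketch only:

select t.*,
       (case when row_number() over (partition by id order by date) > 1
             then stdev(metricA * 1.0) over (partition by id order by date)
             else 0
        end) as rollingstd
from t;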

Dividing a sum value into multiple rows due to field length constraint

I am migrating financial data from a very large table (100 million+ rows) by summarizing the amounts and inserting them into a summary table. I ran into a problem when a summed amount (3 billion) is larger than what the field in the summary table can hold (it can only hold up to 999 million). Changing the field size is not an option, as it requires a change process.
The only option I have is to divide the amounts that breach the size limit into smaller ones so they can be inserted into the table.
I came across a similar question, "SQL: I need to divide a total value into multiple rows in another table", except that the number of rows I need to insert is dynamic.
For simplicity, this is how the source table might look:
account_table
acct_num | amt
-------------------------------
101 125.00
101 550.00
101 650.00
101 375.00
101 475.00
102 15.00
103 325.00
103 875.00
104 200.00
104 275.00
The summary records are as follows
select acct_num, sum(amt)
from account_table
group by acct_num
Account Summary
acct_num | amt
-------------------------------
101 2175.00
102 15.00
103 1200.00
104 475.00
Assuming the maximum value in the destination table is 1000.00, the expected output will be
summary_table
acct_num | amt
-------------------------------
101 1000.00
101 1000.00
101 175.00
102 15.00
103 1000.00
103 200.00
104 475.00
How do I create a query to get the expected result? Thanks in advance.
You need a numbers table. If you only need a handful of values, you can define it manually. Otherwise, you might already have one on hand, or you can use logic like this:
with n as (
    select (rownum - 1) as n
    from account_table
    where rownum <= 10
),
a as (
    select acct_num, sum(amt) as amt
    from account_table
    group by acct_num
)
select acct_num,
       (case when (n.n + 1) * 1000 < amt then 1000
             else amt - n.n * 1000
        end) as amt
from a join
     n
     on n.n * 1000 < amt;
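If account_table might not reliably contain enough rows to drive the numbers CTE, a common alternative (assuming Oracle, given the rownum syntax above) is to generate the numbers from dual:

with n as (
    select level - 1 as n
    from dual
    connect by level <= 10
)
select n from n;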
A variation along these lines might give some ideas (using the 1,000 limit from your sample data):
WITH summary AS (
SELECT acct_num
,TRUNC(SUM(amt) / 1000) AS times
,MOD(SUM(amt), 1000) AS remainder
FROM account_table
GROUP BY acct_num
), x(acct_num, times, remainder) AS (
SELECT acct_num, times, remainder
FROM summary
UNION ALL
SELECT s.acct_num, x.times - 1, s.remainder
FROM summary s
,x
WHERE s.acct_num = x.acct_num
AND x.times > 0
)
SELECT acct_num
,CASE WHEN times = 0 THEN remainder ELSE 1000 END AS amt
FROM x
ORDER BY acct_num, amt DESC
The idea is to first build a summary table with div and modulo:
ACCT_NUM TIMES REMAINDER
101 2 175
102 0 15
103 1 200
104 0 475
Then perform a recursive query on the summary based on the number of "times" (i.e. full rows of 1000) you want, plus one extra row for the remainder.
ACCT_NUM AMT
101 1000
101 1000
101 175
102 15
103 1000
103 200
104 475

Window functions: PARTITION BY one column after ORDER BY another

Disclaimer: The problem shown here is much more general than I first expected. The example below is taken from a solution to another question, but I have since used this sample to solve many more problems, mostly related to time series (have a look at the "Linked" section in the right bar).
So I will try to explain the problem more generally first:
I am using PostgreSQL, but I am sure this problem exists in other DBMSs that support window functions (MS SQL Server, Oracle, ...) as well.
Window functions can be used to group values together by a common attribute or value. For example, you can group rows by a date and then calculate the maximum value within each date, or an average, a row count, and so on.
This is achieved by defining a partition. Grouping by dates works with PARTITION BY date_column. Now suppose you want to do an operation that needs a particular order within your groups (calculating row numbers, or summing up a column). This can be done with PARTITION BY date_column ORDER BY an_attribute_column.
Now think about a finer resolution of time series. What if you do not have dates but timestamps? Then you cannot group by the time column anymore. Nevertheless, it might be important to analyse the data in the order in which it was added (maybe the timestamp is the creation time of the data set). Then you realize that some consecutive rows have the same value and you want to group your data by this common value, but the catch is that the rows have different timestamps.
The problem here is that you cannot do a PARTITION BY value_column, because PARTITION BY forces a grouping by that column first. So your data would be organised by the value_column before the grouping and would no longer be ordered by the timestamp. This yields results you are not expecting.
More generally speaking: the problem is to ensure a particular ordering even if the ordering column is not part of the created partition.
Example:
db<>fiddle
I have the following table:
ts val
100000 50
130100 30050
160100 60050
190200 100
220200 30100
250200 30100
300000 300
500000 100
550000 1000
600000 1000
650000 2000
700000 2000
720000 2000
750000 300
I had to group all consecutive tied values of the column val, but I wanted to keep the order by ts. To achieve this, I wanted to add a column with a unique ID per val group.
Expected result:
ts val group
100000 50 1
130100 30050 2
160100 60050 3
190200 100 4
220200 30100 5 \ same group
250200 30100 5 /
300000 300 6
500000 100 7
550000 1000 8 \ same group
600000 1000 8 /
650000 2000 9 \
700000 2000 9 | same group
720000 2000 9 /
750000 300 10
My first try was the rank window function, which would normally do this job:
SELECT
*,
rank() OVER (PARTITION BY val ORDER BY ts)
FROM
test
But in this case this doesn't work, because the PARTITION BY clause effectively organises the table first by its partition columns (val in this case) and then by its ORDER BY columns. So the order is by val, ts instead of the expected order by ts, and the result was of course not the expected one.
ts val rank
100000 50 1
190200 100 1
500000 100 2
300000 300 1
750000 300 2
550000 1000 1
600000 1000 2
650000 2000 1
700000 2000 2
720000 2000 3
130100 30050 1
220200 30100 1
250200 30100 2
160100 60050 1
The question is: How to get the group ids with respect to the order by ts?
Edit: I added my own solution below, but I feel very uncomfortable with it. It seems way too complicated. I was wondering if there is a better way to achieve this result.
I came up with this solution myself (hoping someone else has a better one):
demo:db<>fiddle
Order by ts.
Fetch the previous val value with the lag window function (https://www.postgresql.org/docs/current/static/tutorial-window.html).
Check whether the previous and the current values are the same; this gives a 0 or a 1.
Sum up these values with an ordered SUM. This generates the groups I am looking for: they group the val column but keep the ordering by the ts column.
The query:
SELECT
*,
SUM(is_diff) OVER (ORDER BY ts)
FROM (
SELECT
*,
CASE WHEN val = lag(val) over (order by ts) THEN 0 ELSE 1 END as is_diff
FROM test
)s
The result:
ts val is_diff sum
100000 50 1 1
130100 30050 1 2
160100 60050 1 3
190200 100 1 4
220200 30100 1 5 \ group
250200 30100 0 5 /
300000 300 1 6
500000 100 1 7
550000 1000 1 8 \ group
600000 1000 0 8 /
650000 2000 1 9 \
700000 2000 0 9 | group
720000 2000 0 9 /
750000 300 1 10

SQL Summation with more than one Condition

I have a table like this
Link PeriodiD Debit Credit Project
1 49 - 200 1
1 49 200 - 2
1 49 100 0
1 50 50 - 1
2 49 - 600 0
I want a script to sum the debit and credit per link per period disregarding project.
so the answer should look like
Link PeriodiD TotalDebit TotalCredit
1 49 300 200
1 50 50 -
2 49 - 600
I have more than 60 PeriodiDs and more than 100 Links.
Please assist me in writing such a script.
Use GROUP BY with aggregate functions.
SELECT Link,
       PeriodiD,
       SUM(Debit)  AS TotalDebit,
       SUM(Credit) AS TotalCredit
FROM tablename
GROUP BY Link, PeriodiD;
This query might not always give the expected result if you can have NULL values, depending on the DBMS that you use. You can modify it like this to account for this situation:
SELECT Link,
       PeriodiD,
       SUM(COALESCE(Debit, 0))  AS TotalDebit,
       SUM(COALESCE(Credit, 0)) AS TotalCredit
FROM tablename
GROUP BY Link, PeriodiD;