BigQuery RATIO_TO_REPORT over all data, without a partition

I want to calculate the ratio of a specific field. I know that in legacy SQL I can use the RATIO_TO_REPORT function, e.g.:
SELECT
month,
RATIO_TO_REPORT(totalPoint) over (partition by month)
FROM (
SELECT
format_datetime('%Y-%m', ts) AS month,
SUM(point) AS totalPoint
FROM
`userPurchase`
GROUP BY
month
ORDER BY
month )
But I want a ratio that is calculated over all the data, without a partition, e.g. (this code does not work):
SELECT
month,
RATIO_TO_REPORT(totalPoint) over (partition by "all"),
# RATIO_TO_REPORT(totalPoint) over (partition by null)
FROM (
SELECT
format_datetime('%Y-%m', ts) AS month,
SUM(point) AS totalPoint
FROM
`userPurchase`
GROUP BY
month
ORDER BY
month )
It doesn't work. How can I do the same thing? Thanks!

Assuming the rest of the code is correct, just omit the PARTITION BY part:
RATIO_TO_REPORT(totalPoint) OVER ()
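To make the effect of the empty OVER () concrete, here is a minimal Python sketch of the same arithmetic: every row is divided by the sum over all rows. The helper name ratio_to_report and the monthly totals are hypothetical, chosen only for illustration; they are not taken from the question's data.

```python
def ratio_to_report(totals):
    """Divide each value by the sum of all values (one window spanning every row)."""
    grand_total = sum(totals.values())
    return {key: value / grand_total for key, value in totals.items()}


# Hypothetical monthly totals standing in for the subquery's output.
monthly_points = {"2019-01": 50, "2019-02": 30, "2019-03": 20}
ratios = ratio_to_report(monthly_points)

for month, ratio in ratios.items():
    print(month, ratio)
```

Because the window covers all rows, the ratios add up to 1 across the whole result rather than within each month.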

Related

How to return max date per month for user

I have the following table:
I would like to return the maximum threshold date per month for every user, so my final result should look like this:
I wanted to use the analytic function ROW_NUMBER and return the maximum row number, but how do I do it per month for each user? Is there a simpler way to do it in BigQuery?
You can partition the row_number by the user and the month, and then take the first one for each:
SELECT user_id, threshold_date, net_deposists_usd
FROM (SELECT user_id, threshold_date, net_deposists_usd,
ROW_NUMBER () OVER (PARTITION BY user_id, EXTRACT (MONTH FROM threshold_date)
ORDER BY net_deposists_usd DESC) AS rk
FROM mytable)
WHERE rk = 1
BigQuery now supports qualify, which does everything you want. For the month, just use date_trunc():
select t.*
from t
qualify row_number() over (partition by user_id, date_trunc(threshold_date, month)
order by threshold_date desc, net_deposits_usd desc
) = 1;
A simple alternative uses arrays and group by:
select array_agg(t order by threshold_date desc, net_deposits_usd desc limit 1)[ordinal(1)].*
from t
group by user_id, date_trunc(threshold_date, month) ;

ORACLE SQL: Find last minimum and maximum consecutive period

I have the sample data set below, which lists the water meters that were not working for a specific reason over a certain period (January 2016 to December 2018).
I would like a query that retrieves the start and end of the last consecutive period in which each meter was not working within that range.
Any help will be greatly appreciated.
You have two options:
select code, to_char(min_period, 'yyyymm') min_period, to_char(max_period, 'yyyymm') max_period
from (
select code, min(period) min_period, max(period) max_period,
max(min(period)) over (partition by code) max_min_period
from (
select code, period, sum(flag) over (partition by code order by period) grp
from (
select code, period,
case when add_months(period, -1)
= lag(period) over (partition by code order by period)
then 0 else 1 end flag
from (select mrdg_acc_code code, to_date(mrdg_per_period, 'yyyymm') period from t)))
group by code, grp)
where min_period = max_min_period
Explanation:
flag rows where the period is not equal to the previous period plus one month,
create the column grp, which is a running sum of the flags,
group the data by code and grp, additionally finding the latest start of a period,
show only rows where min_period = max_min_period
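The four steps above can be sketched in plain Python to show what each intermediate column holds. This is only an illustration under stated assumptions: the function names, the (code, yyyymm) tuple layout and the sample rows are hypothetical, and periods are assumed to be yyyymm integers.

```python
from itertools import groupby


def month_index(period):
    """Convert a yyyymm integer such as 201612 into a running month count."""
    return (period // 100) * 12 + period % 100


def last_consecutive_run(rows):
    """rows: iterable of (code, yyyymm). Returns {code: (min_period, max_period)}
    for the most recent run of consecutive months per code."""
    result = {}
    ordered = sorted(set(rows))
    for code, group in groupby(ordered, key=lambda r: r[0]):
        grp = 0          # running sum of flags
        islands = {}     # grp -> periods belonging to that run
        previous = None
        for _, period in group:
            # flag = 1 when this period does not directly follow the previous one
            if previous is None or month_index(period) != month_index(previous) + 1:
                grp += 1
            islands.setdefault(grp, []).append(period)
            previous = period
        latest = islands[max(islands)]  # the run with the latest start
        result[code] = (latest[0], latest[-1])
    return result


# Hypothetical sample rows, not the asker's data.
sample = [("A", 201611), ("A", 201612), ("A", 201701),
          ("A", 201801), ("A", 201802), ("B", 201805)]
runs = last_consecutive_run(sample)
print(runs)
```

Note that 201612 and 201701 count as consecutive here, which is why the period is converted to a month count before comparing.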
The second option is a recursive CTE, available in Oracle 11g and above:
with
data(period, code) as (
select to_date(mrdg_per_period, 'yyyymm'), mrdg_acc_code from t
where mrdg_per_period between 201601 and 201812),
cte (period, code) as (
select to_char(period, 'yyyymm'), code from data
where (period, code) in (select max(period), code from data group by code)
union all
select to_char(data.period, 'yyyymm'), cte.code
from cte
join data on data.code = cte.code
and data.period = add_months(to_date(cte.period, 'yyyymm'), -1))
select code, min(period) min_period, max(period) max_period
from cte group by code
Explanation:
The subquery data keeps only rows from 2016 to 2018 and additionally converts period to a date. We need this for the function add_months to work.
cte is recursive. The anchor finds the starting rows, those with the maximum period for each code. After union all comes the recursive member, which looks for the row one month older than the current one. If it finds one, it continues with that row; if not, it stops.
The final select groups the data. Notice that periods which were not consecutive were rejected by cte.
Though recursive queries are usually slower than traditional ones, there can be scenarios where the second solution is better.
Here is the dbfiddle demo for both queries. Good luck.
Use an aggregate function with GROUP BY:
select max(mrdg_per_period) mrdg_per_period, mrdg_acc_code, max(mrdg_date_read) mrdg_date_read, rea_Desc, min(mrdg_per_period) not_working_as_from
from tablename
group by mrdg_acc_code,rea_Desc
This is a bit tricky. This is a gap-and-islands problem. To get all continuous periods, it will help if you have an enumeration of months. So, convert the period to a number of months and then subtract a sequence generated using row_number(). The difference is constant for a group of adjacent months.
This looks like:
select acc_code, min(period), max(period)
from (select t.*,
row_number() over (partition by acc_code order by period_num) as seqnum
from (select t.*, floor(period / 100) * 12 + mod(period, 100) as period_num
from t
) t
where rea_desc = 'METER NOT WORKING'
) t
group by acc_code, (period_num - seqnum);
Then, if you want the last one for each account, you can use a subquery:
select t.*
from (select acc_code, min(period), max(period),
row_number() over (partition by acc_code order by max(period) desc) as seqnum
from (select t.*,
row_number() over (partition by acc_code order by period_num) as seqnum
from (select t.*, floor(period / 100) * 12 + mod(period, 100) as period_num
from t
) t
where rea_desc = 'METER NOT WORKING'
) t
group by acc_code, (period_num - seqnum)
) t
where seqnum = 1;

Running total in per year ordered by person based on latest date info

We are trying to calculate the running total for each year, ordered by person, based on each person's latest date info. Here is an example of how the data is ordered:
Expected result:
So for each downloaded date we want the running total of all persons, ordered by year (currently the only year is 2018).
What we have so far:
sum(Amount)
over(partition by [Year],[Person]
order by [Enddate)
where max(Downloaded)
Any idea how to fix this?
Just use a window function:
select *,
sum(Amount) over (partition by Year, Downloaded) RunningTotal
from table t
Try using a subquery with a moving downloaded date range.
SELECT
T.*,
RunningTotalByDate = (
SELECT
SUM(N.Amount)
FROM
YourTable AS N
WHERE
N.Downloaded <= T.Downloaded)
FROM
YourTable AS T
ORDER BY
T.Downloaded ASC,
T.Person ASC
Or with a windowed SUM(). Do not include a PARTITION BY, because it will reset the sum whenever the value of the partitioning column changes.
SELECT
T.*,
RunningTotalByDate = SUM(T.Amount) OVER (ORDER BY T.Downloaded ASC)
FROM
YourTable AS T
ORDER BY
T.Downloaded ASC,
T.Person ASC
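As a rough illustration of what the windowed SUM() produces, the sketch below accumulates amounts in Downloaded order using Python. The rows are hypothetical and use distinct dates; with tied dates, SQL's default window frame gives every tied row the same total, whereas this row-by-row accumulation would not.

```python
from itertools import accumulate

# Hypothetical rows: (Downloaded, Person, Amount). Not the asker's data.
rows = [
    ("2018-01-03", "Ann", 20),
    ("2018-01-01", "Bob", 10),
    ("2018-01-02", "Ann", 5),
]

# ORDER BY Downloaded ASC, Person ASC
ordered = sorted(rows, key=lambda r: (r[0], r[1]))

# Running sum with no partition, so the total never resets.
running = list(accumulate(amount for _, _, amount in ordered))
result = [row + (total,) for row, total in zip(ordered, running)]

for line in result:
    print(line)
```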

SQL: Dividing daily data by a monthly index

I have daily transaction data that is a product of this query:
SELECT transaction_date ,
Merchant,
Amount
into transaction.table
FROM source.table
WHERE (DESCRIPTION iLIKE '%Criteria%')
The field transaction_date is in the format of DATE (yyyy-MM-dd).
What I would like to do is take each row/transaction in transaction.table and divide Amount by a value tied to its RESPECTIVE month (this is key) contained in a separate table called Calendar.
The separate table called Calendar is queried from the same source.table as below:
select month,count(*) as distinct_month
into source.Calendar
from
(
select Population, to_char(optimized_transaction_date, 'YYYY-MM') as month
FROM source.table
group by Population, to_char(optimized_transaction_date, 'YYYY-MM')
)
group by month
My goal is to get a value for each day: Amount / distinct_month.
The key part is matching the daily data (transaction_date) in the first query with the monthly data in the second query (month).
Note that month from the second query is a varchar, whereas transaction_date in the first query is a DATE.
I think you want something like this:
SELECT transaction_date, Merchant, Amount, newval
FROM (SELECT transaction_date, Merchant, Amount, Description,
(Amount / count(distinct population) over (partition by to_char(transaction_date, 'YYYY-MM'))
) as newval
FROM source.table
) t
WHERE DESCRIPTION iLIKE '%Criteria%';
You only need the subquery because the total is calculated over all the data, without the filter condition.
EDIT:
Oops, I forgot that Postgres doesn't support COUNT(DISTINCT) as a window function. So do:
SELECT transaction_date, Merchant, Amount, newval
FROM (SELECT t.*,
(Amount / SUM( (seqnum = 1)::int) OVER (partition by to_char(transaction_date, 'YYYY-MM') )
) as newval
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY to_char(transaction_date, 'YYYY-MM'), population ORDER BY population) as seqnum
FROM source.table t
) t
) t
WHERE DESCRIPTION iLIKE '%Criteria%';

Tagging consecutive days

Suppose I have data like this:
ID,DATE
101,01jan2014
101,02jan2014
101,03jan2014
101,07jan2014
101,08jan2014
101,10jan2014
101,12jan2014
101,13jan2014
102,08jan2014
102,09jan2014
102,10jan2014
102,15jan2014
How could I efficiently code this in Greenplum SQL such that I can have a grouping of consecutive days similar to the one below:
ID,DATE,PERIOD
101,01jan2014,1
101,02jan2014,1
101,03jan2014,1
101,07jan2014,2
101,08jan2014,2
101,10jan2014,3
101,12jan2014,4
101,13jan2014,4
102,08jan2014,1
102,09jan2014,1
102,10jan2014,1
102,15jan2014,2
You can do this using row_number(). For a consecutive group, the difference between the date and the row_number() is a constant. Then, use dense_rank() to assign the period:
select id, date,
dense_rank() over (partition by id order by grp) as period
from (select t.*,
date - row_number() over (partition by id order by date) * interval '1 day' as grp
from table t
) t
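The same idea can be checked outside the database. The Python sketch below subtracts a per-id sequence number from each date and numbers the distinct results in order, which plays the role of dense_rank(). The function name tag_periods is hypothetical, and the sketch assumes each id has no duplicate dates; the sample rows are the ones from the question.

```python
from datetime import date, timedelta
from itertools import groupby


def tag_periods(rows):
    """rows: list of (id, date). Returns a list of (id, date, period)."""
    tagged = []
    for key, group in groupby(sorted(rows), key=lambda r: r[0]):
        anchors = {}  # grp value -> period number, in order of first appearance
        for seq, (_, day) in enumerate(group, start=1):
            # date minus row number is constant within a run of consecutive days
            grp = day - timedelta(days=seq)
            period = anchors.setdefault(grp, len(anchors) + 1)
            tagged.append((key, day, period))
    return tagged


days_101 = [1, 2, 3, 7, 8, 10, 12, 13]
days_102 = [8, 9, 10, 15]
rows = [(101, date(2014, 1, d)) for d in days_101] + \
       [(102, date(2014, 1, d)) for d in days_102]

tagged = tag_periods(rows)
for row in tagged:
    print(row)
```

Numbering the grp values by first appearance matches dense_rank() here because, with unique dates per id, grp never decreases as the date increases.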