Is it possible to do projection in Google Big Query? - google-bigquery

I have a query (due to restrictions, it uses Legacy SQL) that produces a column holding the rolling average of the last 3 days of sales (excluding today):
SELECT
id, date, sales, AVG(sales) OVER (PARTITION BY id ORDER BY date RANGE BETWEEN 4 PRECEDING AND 1 PRECEDING) AS projected_sale
FROM tableA
tableA
+-------+---------+---------+
| id | date | sales |
+-------+---------+---------+
| 1 | 01-01-17| 5 |
| 1 | 01-02-17| 6 |
| 1 | 01-03-17| 7 |
| 1 | 01-04-17| 10 |
+-------+---------+---------+
The query produces
+-------+---------+---------+--------------+
| id | date | sales |projected_sale|
+-------+---------+---------+--------------+
| 1 | 01-01-17| 5 | . |
| 1 | 01-02-17| 6 | . |
| 1 | 01-03-17| 7 | . |
| 1 | 01-04-17| 10 | 6 |
+-------+---------+---------+--------------+
Since the average excludes the current row, I can in theory project the sale for 01-05-17 using the sales from 01-02 to 01-04. However, since tableA doesn't actually have an entry with date 01-05-17, my query stops at 01-04-17 as the last row.
Is what I am trying to do possible in BigQuery?
Thank you

First, I think using RANGE is incorrect here; it should be ROWS instead.
Anyway, below is an example for BigQuery Legacy SQL that demonstrates how to achieve the result you need.
#legacySQL
SELECT
id, dt, sales,
AVG(sales) OVER (
PARTITION BY id ORDER BY dt
ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING
) AS projected_sale
FROM tableA, (SELECT 1 id, '01-05-17' dt, 0 sales)
As you can see, you are simply adding the missing day with a UNION ALL (expressed with a comma in Legacy SQL). Of course, you can adapt this so that it adds such a missing row for every id.
Nevertheless, I hope this is a good starting point for you.
You can test / play with it using dummy data as in your question:
#legacySQL
SELECT
id, dt, sales,
AVG(sales) OVER (
PARTITION BY id ORDER BY dt
ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING
) AS projected_sale
FROM (
SELECT * FROM
(SELECT 1 id, '01-01-17' dt, 5 sales),
(SELECT 1 id, '01-02-17' dt, 6 sales),
(SELECT 1 id, '01-03-17' dt, 7 sales),
(SELECT 1 id, '01-04-17' dt, 10 sales)
) tableA, (SELECT 1 id, '01-05-17' dt, 0 sales)
with the result:
Row id dt sales projected_sale
1 1 01-01-17 5 null
2 1 01-02-17 6 5.0
3 1 01-03-17 7 5.5
4 1 01-04-17 10 6.0
5 1 01-05-17 0 7.0
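
To illustrate the "add the missing row for all id's" idea, here is a minimal sketch that runs in SQLite through Python's sqlite3 module rather than in BigQuery Legacy SQL. It assumes ISO-formatted date strings and an SQLite build with window functions (3.25 or later). The rows for id 2 are made up for the example, and the placeholder row carries NULL sales instead of 0.

```python
import sqlite3

# Sample data: id 1 comes from the question, id 2 is hypothetical.
rows = [
    (1, "2017-01-01", 5), (1, "2017-01-02", 6),
    (1, "2017-01-03", 7), (1, "2017-01-04", 10),
    (2, "2017-01-01", 2), (2, "2017-01-02", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, dt TEXT, sales INTEGER)")
conn.executemany("INSERT INTO tableA VALUES (?, ?, ?)", rows)

query = """
WITH extended AS (
    SELECT id, dt, sales FROM tableA
    UNION ALL
    -- one placeholder row per id, dated the day after that id's last row
    SELECT id, date(MAX(dt), '+1 day'), NULL FROM tableA GROUP BY id
)
SELECT id, dt, sales,
       AVG(sales) OVER (
           PARTITION BY id ORDER BY dt
           ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING
       ) AS projected_sale
FROM extended
ORDER BY id, dt
"""
result = conn.execute(query).fetchall()

# Map (id, dt) to the projected value for easy lookup.
projected = {(r[0], r[1]): r[3] for r in result}
for r in result:
    print(r)
```

Because the frame ends at 1 PRECEDING, the placeholder row never contributes to its own average, so its sales value does not matter.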

Related

How to order a table before finding MAX() difference between two consecutive rows

I have a table like this:
Events
----+------+-----
id |start | end
----+------+-----
1 | 3 | 5
2 | 8 | 10
3 | 14 | 17
4 | 6 | 6
5 | 19 | 20
I would like to find the biggest number of empty days between two consecutive events.
Desired result:
3
This query returns the MAX() gap, but I can't seem to find a way to order the rows by the end column first:
SELECT MAX(empty)
FROM
( SELECT a.start-b.end-1 AS empty
FROM
Reservations AS a,
Reservations AS b
WHERE a.id=b.id+1
GROUP BY b.end
ORDER BY b.end
);
Use lag():
select max(start - prev_end) - 1 as diff
from (select t.*, lag(end) over (order by start) as prev_end
from t
) t
where prev_end is not null;
Note: This assumes that the periods are not overlapping, which is consistent with the data you have provided.
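
The lag() approach can be tried out on the sample data with a small sketch like the one below, run in SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later). The table name events and the column names start_day and end_day are assumptions, chosen here to avoid the keyword end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, start_day INTEGER, end_day INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, 3, 5), (2, 8, 10), (3, 14, 17), (4, 6, 6), (5, 19, 20)],
)

# Order events by start, pair each one with the previous event's end,
# then take the largest number of empty days in between.
max_gap = conn.execute("""
    SELECT MAX(start_day - prev_end - 1)
    FROM (
        SELECT start_day,
               LAG(end_day) OVER (ORDER BY start_day) AS prev_end
        FROM events
    )
    WHERE prev_end IS NOT NULL
""").fetchone()[0]

print(max_gap)  # 3, the gap between day 10 and day 14
```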

SQL Server : processing by group

I have a table with the following data:
Id Date Value
---------------------------
1 Dec-01-2019 10
1 Dec-03-2019 5
1 Dec-05-2019 8
1 Jan-03-2020 6
1 Jan-07-2020 3
1 Jan-08-2020 9
2 Dec-01-2019 4
2 Dec-03-2019 7
2 Dec-31-2019 9
2 Jan-04-2020 4
2 Jan-09-2020 6
I need to group it into the following format: one record per month per id. If the month is closed, the date should be the last day of that month; if not, the last day available. Max and average are calculated using all data until that date.
Id Date Max_Value Average_Value
-----------------------------------------------
1 Dec-31-2019 10 7,6
1 Jan-08-2020 10 6,8
2 Dec-31-2019 9 6,6
2 Jan-09-2020 9 6,0
Any easy SQL to obtain this analysis?
Regards,
Hmmm . . . You want to aggregate by month and then just take the maximum date in the month:
select id, max(date), max(value), avg(value * 1.0)
from t
group by id, eomonth(date)
order by id, max(date);
If by closed month you mean that it's not the last month of the id then:
select id,
case
when year(Date) = year(maxDate) and month(Date) = month(maxDate) then maxDate
else eomonth(Date)
end Date,
max(maxValue) Max_Value,
round(avg(1.0 * Value), 1) Average_Value
from (
select *,
max(Date) over (partition by Id) maxDate,
max(Value) over (partition by Id) maxValue
from tablename
) t
group by id,
case
when year(Date) = year(maxDate) and month(Date) = month(maxDate) then maxDate
else eomonth(Date)
end
order by id, Date
See the demo.
Results:
> id | Date | Max_Value | Average_Value
> -: | :--------- | --------: | :------------
> 1 | 2019-12-31 | 10 | 7.7
> 1 | 2020-01-08 | 10 | 6.0
> 2 | 2019-12-31 | 9 | 6.7
> 2 | 2020-01-09 | 9 | 5.0
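
If "using all data until that date" is read as cumulative (every row of the id up to the reported date), one possible sketch is to aggregate per month first and then run window functions over the monthly rows. The example below uses SQLite via Python's sqlite3 rather than SQL Server, so eomonth() is replaced by SQLite date modifiers. The table name readings and column names d and value are assumptions.

```python
import sqlite3

rows = [
    (1, "2019-12-01", 10), (1, "2019-12-03", 5), (1, "2019-12-05", 8),
    (1, "2020-01-03", 6), (1, "2020-01-07", 3), (1, "2020-01-08", 9),
    (2, "2019-12-01", 4), (2, "2019-12-03", 7), (2, "2019-12-31", 9),
    (2, "2020-01-04", 4), (2, "2020-01-09", 6),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, d TEXT, value INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

query = """
WITH monthly AS (
    SELECT id, strftime('%Y-%m', d) AS ym, MAX(d) AS last_date,
           MAX(value) AS month_max, SUM(value) AS month_sum,
           COUNT(*) AS month_n
    FROM readings
    GROUP BY id, strftime('%Y-%m', d)
)
SELECT id,
       -- last available date for the id's latest month, else month end
       CASE WHEN ym = MAX(ym) OVER (PARTITION BY id) THEN last_date
            ELSE date(last_date, 'start of month', '+1 month', '-1 day')
       END AS report_date,
       MAX(month_max) OVER w AS max_value,
       1.0 * SUM(month_sum) OVER w / SUM(month_n) OVER w AS average_value
FROM monthly
WINDOW w AS (PARTITION BY id ORDER BY ym
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ORDER BY id, ym
"""
result = conn.execute(query).fetchall()
for r in result:
    print(r)
```

Under that reading the averages come out near 7.67 and 6.83 for id 1 and near 6.67 and 6.0 for id 2, which matches the desired table in the question if its values were cut to one decimal.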

sql group by personalised condition

Hi, I have a table as below:
+--------+--------+
| day | amount|
+--------+---------
| 2 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 2 |
| 3 | 3 |
| 4 | 3 |
+--------+--------+
Now I want something like this: the sum of day 1 to day 2 as row one, the sum of day 1 to day 3 as row two, and so on.
+--------+--------+
| day | amount|
+--------+---------
| 1-2 | 11 |
| 1-3 | 14 |
| 1-4 | 17 |
+--------+--------+
Could anyone offer some help? Thanks!
with data as(
select 2 day, 2 amount from dual union all
select 1 day, 3 amount from dual union all
select 1 day, 4 amount from dual union all
select 2 day, 2 amount from dual union all
select 3 day, 3 amount from dual union all
select 4 day, 3 amount from dual)
select distinct day, sum(amount) over (order by day range unbounded preceding) cume_amount
from data
order by 1;
DAY CUME_AMOUNT
---------- -----------
1 7
2 11
3 14
4 17
If you are using Oracle, you can do something like the above.
Assuming the day range in the left column always starts from "1-", what you need is a query doing a cumulative sum on the grouped table (dayWiseSum below). Since it needs to be accessed twice, I'd put it into a temporary table.
CREATE TEMPORARY TABLE dayWiseSum AS
(SELECT day, SUM(amount) AS amount FROM table1 GROUP BY day ORDER BY day);
SELECT CONCAT("1-", t1.day) AS day, SUM(t2.amount) AS amount
FROM dayWiseSum t1 INNER JOIN dayWiseSum t2
ON t2.day <= t1.day -- each running total includes the current day
WHERE t1.day > 1 -- drop this filter if you want to include "1-1"
GROUP BY t1.day, t1.amount ORDER BY t1.day;
DROP TABLE dayWiseSum;
Here's a fiddle to test with:
http://sqlfiddle.com/#!9/c1656/1/0
Note: Since sqlfiddle doesn't allow CREATE statements, I've replaced dayWiseSum with its query there. Also, I've used the "Text to DDL" option to paste the exact text of the table from your question and generate the CREATE TABLE query :)
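
For databases with window functions, the same "1-N" output can be produced without a temporary table. The following is a minimal sketch in SQLite via Python's sqlite3 (3.25 or later), assuming the table is called table1 as in the answer above; the CTE names day_totals and running are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(2, 2), (1, 3), (1, 4), (2, 2), (3, 3), (4, 3)],
)

query = """
WITH day_totals AS (
    -- collapse duplicate days first
    SELECT day, SUM(amount) AS total FROM table1 GROUP BY day
),
running AS (
    -- running total over the per-day sums
    SELECT day,
           SUM(total) OVER (
               ORDER BY day
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS amount
    FROM day_totals
)
SELECT '1-' || day AS day_range, amount
FROM running
WHERE day > 1   -- remove to keep the "1-1" row
ORDER BY day
"""
result = conn.execute(query).fetchall()
print(result)  # [('1-2', 11), ('1-3', 14), ('1-4', 17)]
```

The day > 1 filter sits outside the window step on purpose: filtering earlier would drop day 1 from the running total.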

Unable to use lag function correctly in sql

I have created a table from multiple tables like this:
Week | Cid | CustId | L1
10 | 1 | 1 | 2
10 | 2 | 1 | 2
10 | 5 | 1 | 2
10 | 4 | 1 | 1
10 | 3 | 2 | 1
4 | 6 | 1 | 2
4 | 7 | 1 | 2
I want the output as:
Repeat
0
1
1
0
0
0
1
So, basically, what I want is: for each week, if a person (custid) comes in again with the same L1, the value in the column Repeat should become 1, otherwise 0 (so here, in rows 2 and 3, custid 1 came with L1=2 again, so they get 1 in column "Repeat"; however, in row 4, custid 1 came with L1=1, so it gets the value 0).
By the way, the table isn't ordered (as I've shown).
I'm trying to do it as follows:
select t.*,
lag(0, 1, 0) over (partition by week, custid, L1 order by cid) as repeat
from
table;
But this is not giving the expected output; it returns an empty result.
I think you need a case, but I would use row_number() for this:
select t.*,
(case when row_number() over (partition by week, custid, l1 order by cid) = 1
then 0 else 1
end) as repeat
from table t;
This can also be computed without Window functions but by a self-join in the following way:
SELECT a.week, a.cid, a.custid, a.l1,
CASE WHEN b IS NULL THEN 1 ELSE 0 END AS repeat
FROM mytable a NATURAL LEFT JOIN
(SELECT week, min(cid) AS cid, custid, l1 FROM mytable
GROUP BY week,custid,l1) b
ORDER BY week DESC, custid, l1 DESC, cid;
It can be done simply by using count(*) as an analytic function. No case expression or self-join is needed. The query is even portable across databases that support analytic functions:
SELECT cust.*, least(count(*)
OVER (PARTITION BY Week, CustId, L1 ORDER BY Cid
ROWS UNBOUNDED PRECEDING) - 1, 1) repeat
FROM cust ORDER BY Week DESC, custId, L1 DESC;
Executing the query on your data results in the following output (the last column is the repeat column):
Week | Cid | CustId | L1 | repeat
10 1 1 2 0
10 2 1 2 1
10 5 1 2 1
10 4 1 1 0
10 3 2 1 0
4 6 1 2 0
4 7 1 2 1
Tested on Oracle 11g and PostgreSQL 9.4. Note that the second ORDER BY is optional. See Oracle Language Reference, Analytic Functions for more details.
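
As a quick check of the row_number() variant against the sample data, here is a small sketch in SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later). The table name visits and the output alias is_repeat are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (week INTEGER, cid INTEGER, custid INTEGER, l1 INTEGER)"
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?, ?)",
    [(10, 1, 1, 2), (10, 2, 1, 2), (10, 5, 1, 2), (10, 4, 1, 1),
     (10, 3, 2, 1), (4, 6, 1, 2), (4, 7, 1, 2)],
)

# The first row of each (week, custid, l1) group gets 0, later rows get 1.
result = conn.execute("""
    SELECT cid,
           CASE WHEN ROW_NUMBER() OVER (
                    PARTITION BY week, custid, l1 ORDER BY cid
                ) = 1 THEN 0 ELSE 1 END AS is_repeat
    FROM visits
    ORDER BY cid
""").fetchall()

print(result)
```

Sorted by cid, the flags are 0, 1, 0, 0, 1, 0, 1, which agrees with the desired Repeat column once the rows are matched up by Cid.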

How to calculate the value of a previous row from the count of another column

I want to create an additional column which calculates the value of a row from count column with its predecessor row from the sum column. Below is the query. I tried using ROLLUP but it does not serve the purpose.
select to_char(register_date,'YYYY-MM') as "registered_in_month"
,count(*) as Total_count
from CMSS.USERS_PROFILE a
where a.pcms_db != '*'
group by (to_char(register_date,'YYYY-MM'))
order by to_char(register_date,'YYYY-MM')
This is what I get:
registered_in_month TOTAL_COUNT
-------------------------------------
2005-01 1
2005-02 3
2005-04 8
2005-06 4
But what I would like to display is below, including the months which have count as 0
registered_in_month TOTAL_COUNT SUM
------------------------------------------
2005-01 1 1
2005-02 3 4
2005-03 0 4
2005-04 8 12
2005-05 0 12
2005-06 4 16
To include missing months in your result, you first need a complete list of months. To build it, find the earliest and latest month and then use a hierarchical query to generate the complete list.
SQL Fiddle
with x(min_date, max_date) as (
select min(trunc(register_date,'month')),
max(trunc(register_date,'month'))
from users_profile
)
select add_months(min_date,level-1)
from x
connect by add_months(min_date,level-1) <= max_date;
Once you have all the months, you can outer join the list to your table. To get the cumulative sum, simply add up the counts using SUM as an analytic function.
with x(min_date, max_date) as (
select min(trunc(register_date,'month')),
max(trunc(register_date,'month'))
from users_profile
),
y(all_months) as (
select add_months(min_date,level-1)
from x
connect by add_months(min_date,level-1) <= max_date
)
select to_char(a.all_months,'yyyy-mm') registered_in_month,
count(b.register_date) total_count,
sum(count(b.register_date)) over (order by a.all_months) "sum"
from y a left outer join users_profile b
on a.all_months = trunc(b.register_date,'month')
group by a.all_months
order by a.all_months;
Output:
| REGISTERED_IN_MONTH | TOTAL_COUNT | SUM |
|---------------------|-------------|-----|
| 2005-01 | 1 | 1 |
| 2005-02 | 3 | 4 |
| 2005-03 | 0 | 4 |
| 2005-04 | 8 | 12 |
| 2005-05 | 0 | 12 |
| 2005-06 | 4 | 16 |
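
Outside Oracle, CONNECT BY is not available, but the same month-filling idea can be sketched with a recursive CTE. The example below uses SQLite via Python's sqlite3 (window functions need 3.25 or later) and assumes register_date is stored as an ISO date string. The individual dates are made up so that the monthly counts match the question.

```python
import sqlite3

# Hypothetical registrations producing the monthly counts from the question.
per_month = {"2005-01": 1, "2005-02": 3, "2005-04": 8, "2005-06": 4}
rows = [(f"{ym}-{day:02d}",)
        for ym, n in per_month.items() for day in range(1, n + 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_profile (register_date TEXT)")
conn.executemany("INSERT INTO users_profile VALUES (?)", rows)

query = """
WITH RECURSIVE bounds AS (
    SELECT MIN(date(register_date, 'start of month')) AS lo,
           MAX(date(register_date, 'start of month')) AS hi
    FROM users_profile
),
months(m) AS (
    -- generate every month start from the earliest to the latest
    SELECT lo FROM bounds
    UNION ALL
    SELECT date(m, '+1 month') FROM months, bounds WHERE m < hi
),
counts AS (
    SELECT strftime('%Y-%m', months.m) AS ym,
           COUNT(u.register_date) AS total_count
    FROM months
    LEFT JOIN users_profile u
           ON date(u.register_date, 'start of month') = months.m
    GROUP BY months.m
)
SELECT ym, total_count,
       SUM(total_count) OVER (
           ORDER BY ym ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_sum
FROM counts
ORDER BY ym
"""
result = conn.execute(query).fetchall()
for r in result:
    print(r)
```

Counting u.register_date instead of * is what turns the empty months into 0 rather than 1, same as in the Oracle query above.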