BigQuery ORDER BY one financial year (52 weeks) - google-bigquery

I have this dataset right now:
Date        Sales     Group
2022-11-02  xxxxxxxx  A
2022-11-03  xxxxxx    A
2022-11-03  xxxxxx    B
2021-11-03  xxxxxx    A
2021-11-04  xxxxxx    B
2021-11-04  xxxxxx    A
I want to order my data like this, so that each date is followed by the date one year earlier:
Date        Sales     Group
2022-11-02  xxxxxxxx  A
2021-11-03  xxxxxx    A
2022-11-03  xxxxxx    A
2021-11-04  xxxxxx    A
2022-11-03  xxxxxx    B
2021-11-04  xxxxxx    B
(because those dates are exactly 52 weeks apart)
Is there a way to do it? I want to avoid a join!
To make it clear: I need to make sure that the first row's date and the second row's date are exactly 52 weeks apart,
i.e. date_sub(first_row['date'], interval 52 week) == second_row['date']
Sorry for the confusion.

Consider the query below instead of my previous answer.
WITH sample_data AS (
SELECT DATE '2022-11-02' Date, 'xxxxxxxx' Sales, 'A' `Group` UNION ALL
SELECT '2022-11-03' Date, 'xxxxxx' Sales, 'A' `Group` UNION ALL
SELECT '2022-11-03' Date, 'xxxxxx' Sales, 'B' `Group` UNION ALL
SELECT '2021-11-03' Date, 'xxxxxx' Sales, 'A' `Group` UNION ALL
SELECT '2021-11-04' Date, 'xxxxxx' Sales, 'B' `Group` UNION ALL
SELECT '2021-11-04' Date, 'xxxxxx' Sales, 'A' `Group`
)
SELECT *
FROM sample_data
ORDER BY `Group`,
EXTRACT(WEEK FROM Date),       -- week of year as a number, so week 9 sorts before week 10
EXTRACT(DAYOFWEEK FROM Date),  -- dates exactly 52 weeks apart fall on the same weekday
EXTRACT(YEAR FROM Date) DESC;  -- within each pair, the newer year comes first
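As a sanity check, the ordering and the 52-week condition from the question can be mimicked in Python. This is a sketch, not BigQuery itself: it assumes `strftime("%U")` matches `EXTRACT(WEEK FROM Date)` (both Sunday-based, with days before the first Sunday in week 0) and maps `isoweekday()` onto BigQuery's `DAYOFWEEK` numbering (Sunday = 1 to Saturday = 7). `order_key` and `pairs_are_52_weeks_apart` are hypothetical helpers.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question as (Date, Group); the Sales column is omitted.
rows = [
    (date(2022, 11, 2), "A"),
    (date(2022, 11, 3), "A"),
    (date(2022, 11, 3), "B"),
    (date(2021, 11, 3), "A"),
    (date(2021, 11, 4), "B"),
    (date(2021, 11, 4), "A"),
]


def order_key(row):
    """Mirror ORDER BY `Group`, WEEK, DAYOFWEEK, YEAR DESC."""
    d, group = row
    week = int(d.strftime("%U"))          # Sunday-based week of year, 0-53
    day_of_week = d.isoweekday() % 7 + 1  # Sunday=1 ... Saturday=7
    return (group, week, day_of_week, -d.year)  # negated year gives DESC


def pairs_are_52_weeks_apart(ordered_rows):
    """Check first_date - 52 weeks == second_date for each consecutive pair within a group."""
    for _, group_rows in groupby(ordered_rows, key=lambda r: r[1]):
        dates = [d for d, _ in group_rows]
        for first, second in zip(dates[0::2], dates[1::2]):
            if first - timedelta(weeks=52) != second:
                return False
    return True


ordered = sorted(rows, key=order_key)
for d, group in ordered:
    print(d.isoformat(), group)
print(pairs_are_52_weeks_apart(ordered))
```

A week-of-year key only pairs dates 52 weeks apart when both years' week numbering agrees (it does for this sample), so a check like `pairs_are_52_weeks_apart` is worth running against real data.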

Related

Count of ongoing transactions in BigQuery

I have this table:
book_name  borrow_date  return_date
A          2022-08-01   2022-08-03
B          2022-08-03   2022-09-01
C          2022-08-15   2022-09-25
D          2022-09-15   2022-09-18
E          2022-09-17   2022-10-15
And a table with the first date of each month:
summary_month
2022-08-01
2022-09-01
2022-10-01
I would like to count how many books were on loan during each summary_month. The result I am looking for is:
summary_month  count_book  list_book
2022-08-01     3           A,B,C
2022-09-01     4           B,C,D,E
2022-10-01     1           E
So far I can only aggregate them by borrow date, with:
count(distinct case when summary_month = date_trunc(borrow_date,month) then book_name end) count_book
Is it possible to get the result I am hoping for? Any help or advice is appreciated. Thank you.
Consider the option below:
select summary_month,
count(distinct book_name) as count_book,
string_agg(book_name) as list_book
from your_table, unnest(generate_date_array(
date_trunc(borrow_date, month),
date_trunc(return_date, month),
interval 1 month)
) as summary_month
group by summary_month
Applied to the sample data in your question, this returns the counts you are looking for: 3, 4 and 1 for 2022-08-01, 2022-09-01 and 2022-10-01.
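The month expansion done by GENERATE_DATE_ARRAY(..., INTERVAL 1 MONTH) can be sketched in plain Python to see why each book lands in every month between its borrow month and its return month. This illustration assumes a book counts for a month if it was out on any day of that month; `month_starts` and `books_per_month` are hypothetical helpers, not BigQuery functions.

```python
from collections import defaultdict
from datetime import date

# Sample data from the question: (book_name, borrow_date, return_date).
books = [
    ("A", date(2022, 8, 1), date(2022, 8, 3)),
    ("B", date(2022, 8, 3), date(2022, 9, 1)),
    ("C", date(2022, 8, 15), date(2022, 9, 25)),
    ("D", date(2022, 9, 15), date(2022, 9, 18)),
    ("E", date(2022, 9, 17), date(2022, 10, 15)),
]


def month_starts(start, end):
    """Yield the first day of each month from start's month through end's month, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month == 13:
            year, month = year + 1, 1


def books_per_month(rows):
    """Return {summary_month: (count_book, list_book)} for every month a book was out."""
    borrowed = defaultdict(set)
    for name, borrow_date, return_date in rows:
        for summary_month in month_starts(borrow_date, return_date):
            borrowed[summary_month].add(name)
    return {m: (len(names), ",".join(sorted(names))) for m, names in sorted(borrowed.items())}


for summary_month, (count_book, list_book) in books_per_month(books).items():
    print(summary_month, count_book, list_book)
```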
Something like this can work:
with
input as (
select 'A' book_name, cast('2022-08-01' as date) borrow_date , cast('2022-08-03' as date) return_date union all
select 'B', '2022-08-03', '2022-09-01' union all
select 'C', '2022-08-15', '2022-09-25' union all
select 'D', '2022-09-15', '2022-09-18' union all
select 'E', '2022-09-17', '2022-10-15'
),
list_month as (
select distinct
* except(days_borrowed),
date_trunc(days_borrowed, month) as month
from input,
unnest(generate_date_array(borrow_date, return_date)) as days_borrowed
)
select
month,
count(distinct book_name) as count_distinct_book,
string_agg(distinct book_name) as book_name_list
from list_month
group by 1
order by 1

Grouping by Date inclusivity

Here is the data I'm working with:
Accountid  Month
123        08/01/2021
123        09/01/2021
123        03/01/2022
123        04/01/2022
123        05/01/2022
123        06/01/2022
I'm trying to insert into a new table where the data looks like this:
Accountid  Start Month  End Month
123        08/01/2021   09/01/2021
123        03/01/2022   06/01/2022
I'm not sure how to split the rows where there is a gap and group them by account id.
Thanks in advance
In Oracle 12c and later you can also use MATCH_RECOGNIZE for gaps-and-islands problems; it lets you define the grouping rules (islands) in a more readable and natural way.
select *
from input_
match_recognize(
partition by accountid
order by month asc
measures
first(month) as start_month,
last(month) as end_month
/* Any month followed by any number of subsequent months */
pattern(any_ next*)
define
/*Next is the month right after the previous one*/
next as months_between(month, prev(month)) = 1
)
ACCOUNTID  START_MONTH  END_MONTH
123        2021-08-01   2021-09-01
123        2022-03-01   2022-06-01
db<>fiddle here
That's a gaps-and-islands problem; one option to solve it is:
Sample data:
with test (accountid, month) as
  (select 123, date '2021-08-01' from dual union all
   select 123, date '2021-09-01' from dual union all
   select 123, date '2022-03-01' from dual union all
   select 123, date '2022-04-01' from dual union all
   select 123, date '2022-05-01' from dual union all
   select 123, date '2022-06-01' from dual
  ),
Query begins here:
temp as
  (select accountid, month,
          -- months since a fixed date minus the row number is constant within a run of consecutive months
          months_between(month, date '1900-01-01') - row_number() over
            (partition by accountid order by month) diff
   from test
  )
select accountid,
       min(month) as start_month,
       max(month) as end_month
from temp
group by accountid, diff
order by accountid, start_month;

ACCOUNTID START_MONT END_MONTH
---------- ---------- ----------
123 01/08/2021 01/09/2021
123 01/03/2022 01/06/2022
Although it is about MS SQL Server, have a look at Introduction to Gaps and Islands Analysis; it should be interesting reading for you.
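The ROW_NUMBER trick works because, within a run of consecutive months, a running month index and the row number both grow by one, so their difference stays constant. Here is a plain-Python sketch of the same idea on the question's data, with `year * 12 + month` playing the role of MONTHS_BETWEEN; `month_islands` is a hypothetical helper, and it assumes each (accountid, month) pair appears only once.

```python
from datetime import date
from itertools import groupby

# Sample data from the question: (accountid, first day of month).
rows = [
    (123, date(2021, 8, 1)),
    (123, date(2021, 9, 1)),
    (123, date(2022, 3, 1)),
    (123, date(2022, 4, 1)),
    (123, date(2022, 5, 1)),
    (123, date(2022, 6, 1)),
]


def month_islands(rows):
    """Return (accountid, start_month, end_month) for each run of consecutive months."""
    islands = []
    for account, account_rows in groupby(sorted(rows), key=lambda r: r[0]):
        months = [m for _, m in account_rows]
        # month index minus row number stays constant inside a run of consecutive months
        keyed = [(m.year * 12 + m.month - rn, m) for rn, m in enumerate(months, start=1)]
        for _, run in groupby(keyed, key=lambda k: k[0]):
            run_months = [m for _, m in run]
            islands.append((account, run_months[0], run_months[-1]))
    return islands


for account, start_month, end_month in month_islands(rows):
    print(account, start_month, end_month)
```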

Expand a query from a date to a range of dates

I have a query as below:
SELECT
"2022-05-10 00:00:00 UTC" AS date_,
COUNT(salesId) AS total_sales
FROM
`project1.sales.sales-growth`
WHERE
(promoDate BETWEEN "2022-05-10 00:00:00 UTC"
AND "2022-05-11 00:00:00 UTC")
OR
(purchaseDate BETWEEN "2022-05-10 00:00:00 UTC"
AND "2022-05-11 00:00:00 UTC")
This shows the total sales for a single date (2022-05-10):
date_ total_sales
2022-05-10 560
How can I change the query to show the sales per day for the whole of May? Desired output:
date_ total_sales
2022-05-01 567
2022-05-02 687
2022-05-03 878
... ...
2022-05-31 500
One option: generate a date array for the target time range, group by those dates, and compare them in the WHERE clause with your two date columns.
With an assumed version of your table:
WITH your_table AS
(
SELECT TIMESTAMP("2022-05-01 15:30:00+00") AS promoDate, NULL AS purchaseDate, 1 AS salesId
UNION ALL
SELECT NULL AS promoDate, TIMESTAMP("2022-05-01 18:30:00+00") AS purchaseDate, 1 AS salesId
UNION ALL
SELECT TIMESTAMP("2022-05-02 15:30:00+00") AS promoDate, NULL AS purchaseDate, 1 AS salesId
UNION ALL
SELECT TIMESTAMP("2022-05-03 15:30:00+00") AS promoDate, NULL AS purchaseDate, 1 AS salesId
UNION ALL
SELECT TIMESTAMP("2022-05-04 15:30:00+00") AS promoDate, NULL AS purchaseDate, 1 AS salesId
UNION ALL
SELECT NULL AS promoDate, TIMESTAMP("2022-05-04 18:30:00+00") AS purchaseDate, 1 AS salesId
)
SELECT
date_,
COUNT(salesId) AS total_sales
FROM
UNNEST(GENERATE_DATE_ARRAY("2022-05-01", "2022-05-31")) AS date_, your_table
WHERE
date_ = EXTRACT(DATE FROM promoDate)
OR
date_ = EXTRACT(DATE FROM purchaseDate)
GROUP BY
date_
Output:
Row  date_       total_sales
1    2022-05-01  2
2    2022-05-02  1
3    2022-05-03  1
4    2022-05-04  2
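To see what the cross join with the generated date array does, here is a rough Python equivalent on the same assumed rows. It is a sketch: timestamps are assumed to be UTC (matching EXTRACT(DATE FROM ...) without a time zone argument), and `daily_sales` is a hypothetical helper.

```python
from datetime import date, datetime, timedelta, timezone

UTC = timezone.utc

# The answer's assumed table: (promoDate, purchaseDate, salesId).
sales = [
    (datetime(2022, 5, 1, 15, 30, tzinfo=UTC), None, 1),
    (None, datetime(2022, 5, 1, 18, 30, tzinfo=UTC), 1),
    (datetime(2022, 5, 2, 15, 30, tzinfo=UTC), None, 1),
    (datetime(2022, 5, 3, 15, 30, tzinfo=UTC), None, 1),
    (datetime(2022, 5, 4, 15, 30, tzinfo=UTC), None, 1),
    (None, datetime(2022, 5, 4, 18, 30, tzinfo=UTC), 1),
]


def daily_sales(rows, first_day, last_day):
    """Count sales per day where either timestamp falls on that (UTC) date."""
    counts = {}
    day = first_day
    while day <= last_day:  # like GENERATE_DATE_ARRAY(first_day, last_day)
        total = sum(
            1
            for promo, purchase, sales_id in rows
            if sales_id is not None  # COUNT(salesId) skips NULLs
            and ((promo is not None and promo.date() == day)
                 or (purchase is not None and purchase.date() == day))
        )
        if total:  # days with no matching rows do not survive the WHERE clause
            counts[day] = total
        day += timedelta(days=1)
    return counts


for day, total in daily_sales(sales, date(2022, 5, 1), date(2022, 5, 31)).items():
    print(day, total)
```

As in the SQL, a row whose promoDate and purchaseDate fall on the same day is counted once for that day, because the two conditions are combined with OR.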

Getting the last 4 months of data from a date column when some months are missing

I have the data below:
Record_date  ID
28-feb-2022  xyz
31-Jan-2022  ABC
30-nov-2022  jkl
31-oct-2022  dcs
I want to get the last 3 months of data from the date column, ignoring months that have no data.
Output should be:
Record_date  ID
28-feb-2022  xyz
31-Jan-2022  ABC
30-nov-2022  jkl
December is missing from the last 3 months, but it should be ignored because there is no data for it. I've tried many things, but nothing works.
Any suggestions?
Assuming you are using Oracle, you can use Oracle's ADD_MONTHS function to filter the data:
--- untested
-- Assumption Record_date is a date column
SELECT * FROM table1
where Record_date > ADD_MONTHS(SYSDATE, -3)
To get the data for the three latest months present in the table, you can use:
SELECT record_date,
id
FROM (
SELECT t.*,
DENSE_RANK() OVER (ORDER BY TRUNC(Record_date, 'MM') DESC) AS rnk
FROM table_name t
)
WHERE rnk <= 3;
Which, for the sample data:
CREATE TABLE table_name (Record_date, ID) AS
SELECT DATE '2022-02-28', 'xyz' FROM DUAL UNION ALL
SELECT DATE '2022-01-31', 'ABC' FROM DUAL UNION ALL
SELECT DATE '2022-11-30', 'jkl' FROM DUAL UNION ALL
SELECT DATE '2022-10-31', 'dcs' FROM DUAL;
Outputs:
RECORD_DATE          ID
2022-11-30 00:00:00  jkl
2022-10-31 00:00:00  dcs
2022-02-28 00:00:00  xyz
db<>fiddle here
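DENSE_RANK skips the gap because it ranks only the months that are present in the table, so a missing December never receives a rank. A small Python sketch of that logic on the answer's sample data (`latest_months` is a hypothetical helper):

```python
from datetime import date

# The answer's sample data: (Record_date, ID).
rows = [
    (date(2022, 2, 28), "xyz"),
    (date(2022, 1, 31), "ABC"),
    (date(2022, 11, 30), "jkl"),
    (date(2022, 10, 31), "dcs"),
]


def latest_months(rows, n=3):
    """Keep rows from the n most recent months that actually appear in the data."""
    # Distinct (year, month) values, newest first: the DENSE_RANK ordering.
    months = sorted({(d.year, d.month) for d, _ in rows}, reverse=True)
    keep = set(months[:n])  # ranks 1..n; months missing from the data never get a rank
    return sorted(
        (row for row in rows if (row[0].year, row[0].month) in keep),
        key=lambda row: row[0],
        reverse=True,
    )


for record_date, id_ in latest_months(rows):
    print(record_date, id_)
```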

Month counts between dates

I have the table below. I need to count how many ids were active in each month. I think I'll need to create a row for each id for every month it was active, so that the id can be counted in each of those months. A row should also be generated for the month of term_dt.
active_dt  term_dt    id
1/1/2018              101
1/1/2018   5/15/2018  102
3/1/2018   6/1/2018   103
1/1/2018   4/25/2018  104
Apparently this is a "count the number of overlapping intervals" problem. The algorithm goes like this:
1. Create a sorted list of all start and end points.
2. Calculate a running sum over this list: add one when you encounter a start and subtract one when you encounter an end.
3. If two points are the same, perform the subtractions first.
4. You end up with a list of all points where the sum changed.
Here is a rough outline of the query. It is for SQL Server but could be ported to any RDBMS that supports window functions:
WITH cte1(date, val) AS (
SELECT active_dt, 1 FROM #t AS t
UNION ALL
SELECT COALESCE(term_dt, '2099-01-01'), -1 FROM #t AS t
-- if end date is null then assume the row is valid indefinitely
), cte2 AS (
SELECT date, SUM(val) OVER(ORDER BY date, val) AS rs
FROM cte1
)
SELECT YEAR(date) AS YY, MONTH(date) AS MM, MAX(rs) AS MaxActiveThisYearMonth
FROM cte2
GROUP BY YEAR(date), MONTH(date)
DB Fiddle
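The steps listed in this answer can be sketched in Python on the question's data. This is an illustration, not the SQL Server query: it uses the same 2099-01-01 stand-in for a missing term_dt, and `change_points` is a hypothetical helper.

```python
from datetime import date
from itertools import groupby

FAR_FUTURE = date(2099, 1, 1)  # stand-in for a NULL term_dt, as in the SQL outline

# The question's rows as (active_dt, term_dt); id 101 has no term_dt.
intervals = [
    (date(2018, 1, 1), None),
    (date(2018, 1, 1), date(2018, 5, 15)),
    (date(2018, 3, 1), date(2018, 6, 1)),
    (date(2018, 1, 1), date(2018, 4, 25)),
]


def change_points(rows):
    """Return [(date, active_count)] for every date where the running count changes."""
    events = []
    for start, end in rows:
        events.append((start, 1))                                        # +1 at each start
        events.append((end if end is not None else FAR_FUTURE, -1))    # -1 at each end
    events.sort()  # by date; on ties, -1 sorts before +1 (subtractions first)
    points = []
    running = 0
    for (day, delta), same in groupby(events):
        running += delta * len(list(same))  # apply identical events together
        if points and points[-1][0] == day:
            points[-1] = (day, running)     # keep one entry per date
        else:
            points.append((day, running))
    return points


for day, active in change_points(intervals):
    print(day, active)
```

The active count on any date is the value at the latest change point on or before it, so a per-month figure needs the value carried in from the previous month as well as the change points inside the month.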
I was toying with a simpler query for Oracle that seems to do the trick:
with candidates (month_start) as (
select to_date ('2018-' || column_value || '-01','YYYY-MM-DD')
from
table
(sys.odcivarchar2list('01','02','03','04','05',
'06','07','08','09','10','11','12'))
), sample_data (active_dt, term_dt, id) as (
select to_date('01/01/2018', 'MM/DD/YYYY'), null, 101 from dual
union select to_date('01/01/2018', 'MM/DD/YYYY'),
to_date('05/15/2018', 'MM/DD/YYYY'), 102 from dual
union select to_date('03/01/2018', 'MM/DD/YYYY'),
to_date('06/01/2018', 'MM/DD/YYYY'), 103 from dual
union select to_date('01/01/2018', 'MM/DD/YYYY'),
to_date('04/25/2018', 'MM/DD/YYYY'), 104 from dual
)
select c.month_start, count(1)
from candidates c
join sample_data d
on c.month_start between d.active_dt and nvl(d.term_dt,current_date)
group by c.month_start
order by c.month_start
An alternative solution would be to use a hierarchical query, e.g.:
WITH your_table AS (SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, NULL term_dt, 101 ID FROM dual UNION ALL
SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, to_date('15/05/2018', 'dd/mm/yyyy') term_dt, 102 ID FROM dual UNION ALL
SELECT to_date('01/03/2018', 'dd/mm/yyyy') active_dt, to_date('01/06/2018', 'dd/mm/yyyy') term_dt, 103 ID FROM dual UNION ALL
SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, to_date('25/04/2018', 'dd/mm/yyyy') term_dt, 104 ID FROM dual)
SELECT active_month,
COUNT(*) num_active_ids
FROM (SELECT add_months(TRUNC(active_dt, 'mm'), -1 + LEVEL) active_month,
ID
FROM your_table
CONNECT BY PRIOR ID = ID
AND PRIOR sys_guid() IS NOT NULL
AND LEVEL <= FLOOR(months_between(coalesce(term_dt, SYSDATE), active_dt)) + 1)
GROUP BY active_month
ORDER BY active_month;
ACTIVE_MONTH NUM_ACTIVE_IDS
------------ --------------
01/01/2018 3
01/02/2018 3
01/03/2018 4
01/04/2018 4
01/05/2018 3
01/06/2018 2
01/07/2018 1
01/08/2018 1
01/09/2018 1
01/10/2018 1
Whether this is more or less performant than the other answers is up to you to test.
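The question's idea (one row per id for every month it was active, including the month of term_dt) can be sketched in Python. `AS_OF` is a hypothetical fixed date standing in for SYSDATE, chosen so the result lines up with the output listed for the hierarchical query; `active_months` and `active_ids_per_month` are illustrative helpers. Note that this counts whole calendar months, whereas the hierarchical query's FLOOR(MONTHS_BETWEEN(...)) + 1 can drop the final month when term_dt falls on an earlier day of the month than active_dt.

```python
from collections import Counter
from datetime import date

AS_OF = date(2018, 10, 15)  # hypothetical stand-in for SYSDATE when term_dt is NULL

# The question's rows as (id, active_dt, term_dt).
rows = [
    (101, date(2018, 1, 1), None),
    (102, date(2018, 1, 1), date(2018, 5, 15)),
    (103, date(2018, 3, 1), date(2018, 6, 1)),
    (104, date(2018, 1, 1), date(2018, 4, 25)),
]


def active_months(active_dt, term_dt):
    """First day of each month from active_dt's month through term_dt's month, inclusive."""
    end = term_dt if term_dt is not None else AS_OF
    year, month = active_dt.year, active_dt.month
    months = []
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def active_ids_per_month(rows):
    """Count ids per month, using one row per id per active month."""
    counts = Counter()
    for _id, active_dt, term_dt in rows:
        counts.update(active_months(active_dt, term_dt))
    return dict(sorted(counts.items()))


for month_start, num_active_ids in active_ids_per_month(rows).items():
    print(month_start, num_active_ids)
```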