I am trying to do a calculation in BigQuery using SQL and I have no idea how to go about it.
I have two tables:
Tablename: Periods
id INTEGER
start_date DATE
end_date DATE
And another table
Tablename: Data
date DATE
id INTEGER
value FLOAT
What I want to do is create a query that sums the value for each id over the time range (start_date to end_date) given in the Periods table. The Data table can contain values for an id that fall outside that id's time range in the Periods table, so the query needs to limit the summation to dates from start_date to end_date.
Hope someone can point me in the right direction on this.
Consider using a correlated subquery:
SELECT
id,
(SELECT SUM(value)
FROM Data
WHERE Data.id = Periods.id
AND Data.date >= Periods.start_date
AND Data.date <= Periods.end_date
) AS sums
FROM Periods
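As a rough cross-check, the same totals can also be obtained with a LEFT JOIN plus GROUP BY. The sketch below is not BigQuery: it uses Python's built-in sqlite3 module with made-up sample rows (dates stored as ISO strings, which sort correctly as text) only to confirm that both query shapes agree. The sample values and the `JOIN_QUERY` / `SUBQUERY` names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Periods (id INTEGER, start_date TEXT, end_date TEXT);
CREATE TABLE Data (date TEXT, id INTEGER, value REAL);
INSERT INTO Periods VALUES
  (1, '2023-01-01', '2023-01-31'),
  (2, '2023-02-01', '2023-02-28');
-- Hypothetical rows; two of them fall outside their id's period.
INSERT INTO Data VALUES
  ('2023-01-10', 1, 5.0),
  ('2023-01-20', 1, 2.5),
  ('2023-03-01', 1, 100.0),
  ('2023-02-15', 2, 4.0),
  ('2023-01-15', 2, 50.0);
""")

# Correlated subquery, as in the answer above.
SUBQUERY = """
SELECT id,
       (SELECT SUM(value)
        FROM Data
        WHERE Data.id = Periods.id
          AND Data.date >= Periods.start_date
          AND Data.date <= Periods.end_date) AS sums
FROM Periods
ORDER BY id
"""

# Equivalent LEFT JOIN: periods without matching data still appear (sum is NULL).
JOIN_QUERY = """
SELECT p.id, SUM(d.value) AS sums
FROM Periods AS p
LEFT JOIN Data AS d
  ON d.id = p.id
 AND d.date BETWEEN p.start_date AND p.end_date
GROUP BY p.id, p.start_date, p.end_date
ORDER BY p.id
"""

subquery_rows = conn.execute(SUBQUERY).fetchall()
join_rows = conn.execute(JOIN_QUERY).fetchall()
```

Grouping by start_date and end_date as well as id keeps the rows separate if one id has several periods.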
Related
I have a table containing the daily transactions with date column.
The table is in BigQuery and is partitioned by the date column.
What is the most effective way to query all month-end data from the table?
I tried SQL like the query below, but it processed the whole table, which is about 100 GB:
SELECT * FROM table
WHERE date = LAST_DAY(date , month)
Shouldn't it process fewer bytes, since the table is partitioned by date? (It processes only about 300 MB if I specify a single month-end date in the WHERE clause.)
SELECT * FROM table
WHERE date = "2022-11-30"
Is there any way to get what I want while processing less data?
You can minimize the volume of data processed, and therefore the cost, by calculating the list of in-scope month-end dates and applying it as a filter condition on the partitioned table.
The following example illustrates this.
The original data looks like the sample below; the expected output is the record whose date is the last day of its month, obtained without scanning the complete table.
The code to achieve it is:
with data as
(select '2020-11-20' as add1, 'Robert' as name Union all
select '2021-10-10' as add1, 'Smith' as name Union all
select '2023-9-9' as add1, 'Mike' as name Union all
select '2024-8-2' as add1, 'Donal' as name Union all
select '2025-7-31' as add1, 'Kim' as name ),
-- Calculating the in-scope list of month-end dates
new_data as
(select add1, LAST_DAY(cast (add1 as date)) as last_dt
from data)
-- Applying the filter condition on the date fields
select * from data a, new_data b
where cast (a.add1 as date)=last_dt
The output is the last record, the only one whose date is the last day of its month.
You can use the following query to filter on the last day of the current month, so that only that day's partition is processed:
SELECT * FROM table
WHERE date = DATE_TRUNC(DATE_ADD(CURRENT_DATE('Europe/Paris'), INTERVAL 1 MONTH), MONTH) - 1;
The same query with a date column instead of the current date:
SELECT * FROM table
WHERE date = DATE_TRUNC(DATE_ADD(your_date_column, INTERVAL 1 MONTH), MONTH) - 1;
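One caveat worth checking against the BigQuery documentation: partition pruning generally relies on the filter being a constant expression, so a predicate that compares the column with a function of the same column may still scan every partition. A possible workaround, sketched here under that assumption, is to build the list of month-end dates on the client side and paste it into an IN (...) filter as literals. The helper names `month_end_dates` and `in_list_sql` are hypothetical.

```python
import calendar
from datetime import date


def month_end_dates(start: date, end: date) -> list[date]:
    """Return every month-end date d with start <= d <= end."""
    result = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        # monthrange returns (weekday of day 1, number of days in the month).
        last_day = calendar.monthrange(year, month)[1]
        candidate = date(year, month, last_day)
        if start <= candidate <= end:
            result.append(candidate)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return result


def in_list_sql(dates: list[date]) -> str:
    """Format dates as a comma-separated list of SQL DATE literals."""
    return ", ".join(f"DATE '{d.isoformat()}'" for d in dates)


ends = month_end_dates(date(2022, 11, 1), date(2023, 2, 28))
where_clause = f"WHERE date IN ({in_list_sql(ends)})"
```

The resulting `where_clause` contains only literals, which is the same shape as the single-date filter that processed about 300 MB in the question.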
I have a Google BigQuery table of orders with a DATE column and other columns related to the orders. The dataset starts on 2021-01-01 (yyyy-mm-dd).
My aim is to filter the DATE column to the previous ISO week, for both last year and this year. For this, I used ISOWEEK to create a new column:
WITH
last_week_last_year AS (
SELECT
DATE,
EXTRACT(ISOWEEK FROM DATE) AS isoweek,
FROM
`orders`
WHERE
EXTRACT(ISOWEEK FROM DATE) = EXTRACT(ISOWEEK FROM CURRENT_DATE())-1
GROUP BY 1, 2
ORDER BY DATE
)
SELECT * FROM last_week_last_year
This query returns the following table:
The issue is that when I filter on the original orders table by the DATE from the last_week_last_year table I get all the orders back instead of just the filtered version.
My method to filter is WHERE DATE IN (SELECT DATE FROM last_week_last_year) as seen below.
SELECT
*
FROM
`orders`
WHERE
DATE IN (SELECT DATE FROM last_week_last_year)
ORDER BY DATE DESC;
A snapshot of the resulting table: it contains all of the records from 2021-01-01 until the latest day.
How can I make sure that, in the latter query, the table is filtered based on the dates in the DATE column of the first query?
I have a table of accident dates. I want to calculate the maximum difference between date i and date i + 1, which are in the same column. In other words, given the declared accident dates, I want to find the record number of days without an accident.
You can use lag(). Assuming a table structure like mytable(dt), where dt is of a date-like datatype, you would do:
select max(diff)
from (select dt - lag(dt) over(order by dt) diff from mytable) t
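To make the lag() logic concrete, here is a small pure-Python sketch of the same computation: sort the dates, take the difference between each date and the previous one, and keep the largest. The function name `longest_gap_days` and the sample dates are made up for illustration.

```python
from datetime import date


def longest_gap_days(dates):
    """Largest number of days between consecutive dates, or None if fewer than two."""
    ordered = sorted(dates)
    # Pair each date with its predecessor, like dt - lag(dt) over (order by dt).
    gaps = [(current - previous).days for previous, current in zip(ordered, ordered[1:])]
    return max(gaps) if gaps else None


accidents = [date(2023, 3, 1), date(2023, 1, 1), date(2023, 1, 10)]
record = longest_gap_days(accidents)
```

As in the SQL version, the first date has no predecessor, so it contributes no gap.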
I have a table tk. It has a column effective_date with a data type of date. The table also has some other columns. What I want to do is query the table so that the output contains all dates that share the same year but differ in month.
I have tried with below query in SQL Server, but it's not returning the desired result:
select * tk_id
from tk
group by tk_id, YEAR(effective_date);
Are you looking for a where clause?
select *
from tk
where effective_date >= '2019-01-01' and effective_date < '2020-01-01'
I have a table in which each row contains a start and end date in timestamp format, and I need to filter the rows by the number of business days between the start and end date.
Based on some of the solutions posted here, I created a separate table with all days and marked them with a boolean field like this:
CREATE TABLE tbl_holiday (h_date TIMESTAMP, is_holiday BOOLEAN)
Is it possible to write a query that filters by the count of days between start_date and end_date that have is_holiday set to False?
My database is Impala.
You would typically join the original table with the holiday table with inequality conditions on the start and end date, aggregate, and finally filter in a having clause by the sum of business days against your target value:
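select t.id, t.start_date, t.end_date
from mytable t
-- one joined row per calendar day inside the range
inner join tbl_holiday h on h.h_date between t.start_date and t.end_date
group by t.id, t.start_date, t.end_date
-- count only the days that are not holidays, i.e. the business days
having sum(case when h.is_holiday then 0 else 1 end) = :no_of_business_days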
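The join-and-count idea can be tried out locally. The sketch below is not Impala: it uses Python's built-in sqlite3 module, stores dates as ISO strings and is_holiday as 0/1, and fills a one-week calendar in which the weekend is marked as holiday. The sample rows and the helper `ids_with_business_days` are assumptions for illustration only.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, start_date TEXT, end_date TEXT)")
conn.execute("CREATE TABLE tbl_holiday (h_date TEXT, is_holiday INTEGER)")

# Calendar for Mon 2023-01-02 through Sun 2023-01-08; Sat/Sun count as holidays.
first_day = date(2023, 1, 2)
for offset in range(7):
    day = first_day + timedelta(days=offset)
    conn.execute(
        "INSERT INTO tbl_holiday VALUES (?, ?)",
        (day.isoformat(), int(day.weekday() >= 5)),
    )

conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "2023-01-02", "2023-01-06"),  # Mon..Fri: 5 business days
        (2, "2023-01-05", "2023-01-08"),  # Thu..Sun: 2 business days
    ],
)

QUERY = """
SELECT t.id
FROM mytable AS t
JOIN tbl_holiday AS h
  ON h.h_date BETWEEN t.start_date AND t.end_date
GROUP BY t.id, t.start_date, t.end_date
HAVING SUM(CASE WHEN h.is_holiday = 0 THEN 1 ELSE 0 END) = ?
ORDER BY t.id
"""


def ids_with_business_days(n):
    """Ids of rows whose date range contains exactly n non-holiday days."""
    return [row[0] for row in conn.execute(QUERY, (n,))]
```

Note that the calendar table needs a row for every day in the ranges being tested; a range with no matching calendar rows drops out of the inner join entirely.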