With BigQuery variable @DS_END_DATE I am losing 24 hours

So, there is a piece of a query that feeds a Google Data Studio report from BigQuery:
WHERE creation_date IN (NULL, '1970-01-01T00:00:00')
   OR creation_date BETWEEN PARSE_DATE('%Y%m%d', @DS_START_DATE) AND PARSE_DATE('%Y%m%d', @DS_END_DATE)
@DS_END_DATE lets users select a date range in the GDS report. However, when filtering the data, I am missing the last day. For example, if @DS_END_DATE is '2022-10-20', I can't find rows with creation_date = '2022-10-20T14:03:08' in the report; the report only contains data up to (but not including) '2022-10-20'.
How can I query BigQuery to get everything from @DS_END_DATE, as if it were '2022-10-20T23:59:59'?
P.S. Yes, there were some troubles early on, so I am using creation_date IN (NULL, '1970-01-01T00:00:00') to identify deleted IDs, but that doesn't matter here.

When BigQuery does the comparison it normalizes the data types, casting your provided date to a TIMESTAMP. After that conversion the end date looks like 2022-10-20 00:00:00 UTC, and given that, you can see why it is dropping everything later on 2022-10-20.
To alleviate this you can do something like:
SELECT
  creation_date
FROM sample_data
WHERE CAST(creation_date AS DATE) BETWEEN '2022-10-19' AND '2022-10-20'
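Applied to the parameterized query above, a minimal sketch (keeping the question's NULL handling as-is) might be:
-- Hedged sketch: casting the TIMESTAMP column to DATE before comparing
-- makes the end date inclusive of the whole day.
WHERE creation_date IN (NULL, '1970-01-01T00:00:00')
   OR CAST(creation_date AS DATE)
        BETWEEN PARSE_DATE('%Y%m%d', @DS_START_DATE)
            AND PARSE_DATE('%Y%m%d', @DS_END_DATE)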

Related

BigQuery - COUNTIF Unixtimestamp meets event_date in YYYYMMDD format?

I'm working with Google Analytics App plus Web data in BigQuery.
I want to count a user as new_user when the user_first_touch_timestamp value matches the table's event date value. This would result in a count of new users who visited the site on a particular day.
Example value in user_first_touch_timestamp
1595912758378962
Example value in event_date
20200809
How can I do this?
Thanks.
Below is for BigQuery Standard SQL
You should parse both values to the same DATE type, as below:
PARSE_DATE('%Y%m%d', event_date) AS event_date_day
and
DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) AS user_first_touch_timestamp_day
After this is done, you can do whatever comparison you need.
For example, if you want to use it in a WHERE clause, it can look like this:
WHERE PARSE_DATE('%Y%m%d', event_date) = DATE(TIMESTAMP_MICROS(user_first_touch_timestamp))
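Putting it together, a minimal sketch of the daily count (the table name and the user_pseudo_id column are assumptions based on the standard App + Web export schema):
-- Hedged sketch: new users per day, assuming the standard GA App + Web
-- export layout `project.dataset.events_*`.
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_day,
  COUNT(DISTINCT user_pseudo_id) AS new_users
FROM `project.dataset.events_*`
WHERE PARSE_DATE('%Y%m%d', event_date) = DATE(TIMESTAMP_MICROS(user_first_touch_timestamp))
GROUP BY event_day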

SELECT if exists where DATE=TODAY, if not where DATE=YESTERDAY

I have a table with some columns and a date column (which I used as the partition column).
For example:
Amount | Date
4      | 2020-04-01
3      | 2020-04-02
5      | 2020-04-04
I want to get the latest Amount based on the Date.
I thought about doing a LIMIT 1 with ORDER BY, but is that optimized by BigQuery, or will it scan my entire table?
I want to avoid costs as much as possible. I thought about querying for today's date and, if nothing is found, searching for yesterday, but I don't know how to do that in only one query.
Below is for BigQuery Standard SQL
#standardSQL
SELECT ARRAY_AGG(amount ORDER BY `date` DESC LIMIT 1)[SAFE_OFFSET(0)]
FROM `project.dataset.table`
WHERE `date` >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
Note: above assumes your date field is of DATE data type.
If your date field is a partition, you can use it in WHERE clause to filter which partitions should be read in your query.
In your case, you could do something like:
SELECT Amount
FROM <your-table>
WHERE Date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
ORDER BY Date DESC
LIMIT 1
This query will basically:
Filter only today's and yesterday's partitions
Order the rows by your Date field, from the most recent to the oldest
Select the first element of the ordered list
If the table has a row with today's date, the query will return the data for today. If it doesn't, the query will return the data for yesterday.
Finally, I would like to point to the reference documentation regarding querying partitioned tables.
I hope it helps.
The LIMIT clause stops the query when it gets the number of results indicated.
I think the query should be something like this; I'm not sure whether "today()-1" returns the previous day, hence the makedate expression:
SELECT Amount
FROM <table> as t
WHERE date(t.Date) = current_date()
OR date(t.Date) = makedate(year(current_date()), dayofyear(current_date())-1);
Edited: Sorry, my answer is for MariaDB. I now see you asked about Google BigQuery, which I didn't even know, but it looks like SQL, so I hope it has some functions like the ones I posted.

Use DataStudio to specify the date range for a custom query in BigQuery, where the date range influences operators in the query

I currently have a DataStudio dashboard connected to a BigQuery custom query.
That BQ query has a hardcoded date range and the status of one of the columns (New_or_Relicensed) can change dynamically for a row, based on the dates specified in the range. I would like to be able to alter that range from DataStudio.
I have tried:
simply connecting the DS dashboard to the custom query in BQ and then introducing a date range filter, but as you can imagine, that does not work because it operates on an already hard-coded date range.
reviewing similar answers, but their problem doesn't appear to be quite the same, e.g. BigQuery Data Studio Custom Query
Here is the query I have in BQ:
SELECT t0.New_Or_Relicensed, t0.Title_Category FROM (
  WITH report_range AS (
    SELECT
      TIMESTAMP '2019-06-24 00:00:00' AS start_date,
      TIMESTAMP '2019-06-30 00:00:00' AS end_date
  )
  SELECT
    schedules.schedule_entry_id AS Schedule_Entry_ID,
    schedules.schedule_entry_starts_at AS Put_Up,
    schedules.schedule_entry_ends_at AS Take_Down,
    schedule_entries_metadata.contract AS Schedule_Entry_Contract,
    schedules.platform_id AS Platform_ID,
    platforms.platform_name AS Platform_Name,
    titles_metadata.title_id AS Title_ID,
    titles_metadata.name AS Title_Name,
    titles_metadata.category AS Title_Category,
    IF(other_schedules.schedule_entry_id IS NULL, "new", "relicensed") AS New_Or_Relicensed
  FROM report_range, client.schedule_entries AS schedules
  JOIN client.schedule_entries_metadata
    ON schedule_entries_metadata.schedule_entry_id = schedules.schedule_entry_id
  JOIN client.platforms
    ON schedules.platform_id = platforms.platform_id
  JOIN client.titles_metadata
    ON schedules.title_id = titles_metadata.title_id
  LEFT OUTER JOIN client.schedule_entries AS other_schedules
    ON schedules.platform_id = other_schedules.platform_id
    AND other_schedules.schedule_entry_ends_at < report_range.start_date
    AND schedules.title_id = other_schedules.title_id
  WHERE
    (schedules.schedule_entry_starts_at >= report_range.start_date AND
     schedules.schedule_entry_starts_at <= report_range.end_date) OR
    (schedules.schedule_entry_ends_at >= report_range.start_date AND
     schedules.schedule_entry_ends_at <= report_range.end_date)
) AS t0 LIMIT 100;
Essentially - I would like to be able to set the start_date and end_date from google data studio, and have those dates incorporated into the report_range that then influences the operations in the rest of the query (that assign a schedule entry as new or relicensed).
Have you looked at using the Custom Query interface of the BigQuery connector in Data Studio to define start_date and end_date as parameters as part of a filter?
Your query would need a little re-work...
The following example custom query uses the @DS_START_DATE and @DS_END_DATE parameters as part of a filter on the creation date column of a table. The records produced by the query will be limited to the date range selected by the report user, reducing the number of records returned and resulting in a faster query:
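A minimal sketch of such a query (the table and column names are placeholders):
-- Hedged sketch: the report's date-range control supplies the parameters.
SELECT *
FROM `project.dataset.sample_table`
WHERE creation_date BETWEEN PARSE_DATE('%Y%m%d', @DS_START_DATE)
                        AND PARSE_DATE('%Y%m%d', @DS_END_DATE)
For the question's query, the re-work would amount to replacing the hardcoded report_range CTE with something like TIMESTAMP(PARSE_DATE('%Y%m%d', @DS_START_DATE)) AS start_date, and the equivalent for end_date.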
Resources:
Introducing BigQuery parameters in Data Studio
https://www.blog.google/products/marketingplatform/analytics/introducing-bigquery-parameters-data-studio/
Running parameterized queries
https://cloud.google.com/bigquery/docs/parameterized-queries
I had a similar issue where I wanted to incorporate a 30-day look-back before the start date (@DS_START_DATE). In this case I was using Google Analytics UA session data and a table suffix in my WHERE clause. I was able to calculate a date RELATIVE to the built-in Data Studio "string" dates by using the following:
...
WHERE
  _table_suffix BETWEEN
    CAST(FORMAT_DATE('%Y%m%d', DATE_SUB(PARSE_DATE('%Y%m%d', @DS_START_DATE), INTERVAL 30 DAY)) AS STRING)
    AND
    CAST(FORMAT_DATE('%Y%m%d', DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 0 DAY)) AS STRING)

Postgresql query between date ranges

I am trying to query my PostgreSQL db to return results where a date is in a certain month and year. In other words, I would like all the values for a month-year.
The only way I've been able to do it so far is like this:
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN '2014-02-01' AND '2014-02-28'
The problem with this is that I have to calculate the first and last dates of the month before querying the table. Is there a simpler way to do this?
Thanks
With dates (and times) many things become simpler if you use >= start AND < end.
For example:
SELECT
user_id
FROM
user_logs
WHERE
login_date >= '2014-02-01'
AND login_date < '2014-03-01'
In this case you still need to calculate the start date of the month you need, but that should be straightforward in any number of ways.
The end date is also simplified; just add exactly one month. No messing about with 28th, 30th, 31st, etc.
This structure also has the advantage of being able to maintain use of indexes.
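For instance, a hedged sketch of deriving both boundaries with date_trunc (the mid-month literal is just an example input):
-- date_trunc gives the first instant of the month containing the given
-- date; adding one month yields the exclusive upper bound.
SELECT user_id
FROM user_logs
WHERE login_date >= date_trunc('month', DATE '2014-02-15')
  AND login_date <  date_trunc('month', DATE '2014-02-15') + INTERVAL '1 month'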
Many people may suggest a form such as the following, but it does not use indexes:
WHERE
  EXTRACT(YEAR FROM login_date) = 2014
  AND EXTRACT(MONTH FROM login_date) = 2
This involves calculating the conditions for every single row in the table (a scan) and not using an index to find the range of rows that will match (a range-seek).
Since PostgreSQL 9.2, Range Types are supported, so you can write this like:
SELECT user_id
FROM user_logs
WHERE '[2014-02-01, 2014-03-01)'::daterange @> login_date
This should be more efficient than the string comparison.
Just in case somebody lands here... since 8.1 you can simply use:
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN SYMMETRIC '2014-02-01' AND '2014-02-28'
From the docs:
BETWEEN SYMMETRIC is the same as BETWEEN except there is no
requirement that the argument to the left of AND be less than or equal
to the argument on the right. If it is not, those two arguments are
automatically swapped, so that a nonempty range is always implied.
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN '2014-02-01' AND '2014-03-01'
The BETWEEN keyword works well for dates; it assumes the time is 00:00:00 (i.e. midnight) for dates, so the query above also matches rows falling exactly on 2014-03-01.
Read the documentation:
http://www.postgresql.org/docs/9.1/static/functions-datetime.html
I used a query like that:
WHERE
(
  date_trunc('day', table1.date_eval) = '2015-02-09'
)
or, equivalently as a half-open range:
WHERE (date_trunc('day', table1.date_eval) >= '2015-02-09' AND date_trunc('day', table1.date_eval) < '2015-02-10')

View data by date after Format 'mmyy'

I'm trying to answer questions like, how many POs per month do we have? Or, how many lines are there in every PO by month, etc. The original PO dates are all formatted #1/1/2013#. So my first step was to Format each PO record date into 'mmyy' so I could group and COUNT them.
This worked well, but now I cannot view the data by date... For example, I cannot ask 'How many POs after December did we get?' I think this is because SQL does not recognize mm/yy as a comparable date.
Any ideas how I could restructure this?
I wrote two queries. This is the query that formats the dates; it is also the query I was trying to add the date filter to (e.g. >#3/14#):
SELECT qryALL_PO.POLN, Format([PO CREATE DATE],"mm/yy") AS [Date]
FROM qryALL_PO
GROUP BY qryALL_PO.POLN, Format([PO CREATE DATE],"mm/yy");
My group and counting query is:
SELECT qryALL_PO.POLN, Sum(qryALL_PO.[LINE QUANTITY]) AS SUM_QTY_PO
FROM qryALL_PO
GROUP BY qryALL_PO.POLN;
You can still count and group dates, as long as you have a way to determine the part of the date you are looking for.
In Access you can use year and month for example to get the year and month part of the date:
SELECT Year(mydate), Month(mydate), Count(*)
FROM tableX
GROUP BY Year(mydate), Month(mydate)
You can also format it as 'yyyy-mm' and then use '>' for the 'after' comparison, since zero-padded year-month strings sort chronologically.
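For example, a minimal sketch reusing the question's own names (qryALL_PO and [PO CREATE DATE]; the YearMonth alias and the "2013-12" cutoff are illustrative):
SELECT qryALL_PO.POLN, Format([PO CREATE DATE], "yyyy-mm") AS YearMonth, Count(*) AS NumLines
FROM qryALL_PO
WHERE Format([PO CREATE DATE], "yyyy-mm") > "2013-12"
GROUP BY qryALL_PO.POLN, Format([PO CREATE DATE], "yyyy-mm");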