Partitioning public dataset does not return any values before a certain date - google-bigquery

I looked at different articles, GCP documentation and tutorials online to see where the error may be on my side, to no avail.
This script in the BigQuery editor UI does not partition dates before May/June 2022.
CREATE OR REPLACE TABLE <TABLE>
PARTITION BY
DATE(trip_start_timestamp)
AS (
SELECT
trip_start_timestamp, trip_seconds
FROM
`bigquery-public-data.chicago_taxi_trips.taxi_trips`
);
After the creation of the table, I checked the results and there are no partitions before May 2022.
I have tested this with a client library as well (Python), with different CAST by date or timestamp before the load, with different qualifying filters (WHERE clause).
No dates before May are picked up after the creation of the partition table.
Is there something obvious I am missing that would explain why the expected results are not returned?
The expected output is a partitioned table with all the dates returned by the DDL statement, not just dates or timestamps between May and June 2022.
Edited September 5th:
For the interested reader, I've filed a bug on the issue tracker to investigate this further.

Related

SSRS data driven query?

I've got a question and it may sound dumb but am figuring it out as I go...
In SSRS there is an option to have a data driven query, and in that you can edit the dataset to read the parameters of the report, who to send it to, etc.
Is there a way to have the query read the output of a subquery so that the report is sent only when the output matches an expected value, and nothing is sent otherwise?
In this particular example, the report needs to be triggered to send on the 3rd business day of the month. I have a query that reads the third business day written up but I am not sure how to get it into the query and read as if the date = 2023/01/04 then trigger report and send it off, otherwise do nothing, checking daily if it is that date.
In my business day query it has the columns, Date - which is the date, DayOfWeek - which is the numeral day of the week 2-6(for weekdays), Year, Month, Day, and Working day of the month(which is all 3s being the third business day.)
Should I have the query check whether workingdayofmonth = 3 and trigger the report then? Would that be the easiest? I am not entirely sure how to code that into the SSRS data driven query.
Thank you for your time and help!
If you are using Enterprise edition, you can setup a data driven subscription.
I don't use Enterprise so I can't give a working example but essentially, you create a dataset for the subscription that will only return data if your conditions are met.
As your previous question (linked here for other users' reference) got you a calendar view that gives you the days the report needs to run, you can use that view, something like
SELECT * FROM myCalendarView WHERE TheDate = CAST(GetDate() AS Date)
The subscription will attempt to run every day (or whatever the schedule is) but it will not produce anything unless the query above returns a result set.
Take a look at this post which is similar to what you are attempting.
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/88b6c7ec-3cba-4b5f-b09d-c098dc933063/how-to-modify-an-ssrs-subscription-to-only-run-first-monday-of-every-fiscal-month?forum=sqlreportingservices
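The "only send on the third business day" check can also be sketched outside SQL. The following Python is only an illustration of the logic, not SSRS code: the function names are hypothetical, and it assumes a business day is simply Monday to Friday with public holidays ignored (the calendar view would normally handle those).

```python
from datetime import date, timedelta


def nth_business_day(year: int, month: int, n: int) -> date:
    """Return the n-th Monday-to-Friday day of the month (holidays are ignored)."""
    day = date(year, month, 1)
    seen = 0
    while day.month == month:
        if day.weekday() < 5:  # 0-4 are Monday to Friday
            seen += 1
            if seen == n:
                return day
        day += timedelta(days=1)
    raise ValueError(f"{year}-{month:02d} has fewer than {n} business days")


def should_send(today: date, n: int = 3) -> bool:
    """True only on the n-th business day of today's month."""
    return today == nth_business_day(today.year, today.month, n)


if __name__ == "__main__":
    print(nth_business_day(2023, 1, 3))
    print(should_send(date(2023, 1, 5)))
```

A daily schedule would call the equivalent of `should_send` and do nothing when it is false, which mirrors the subscription query returning no rows.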

Bigquery and Tableau

I connected Tableau to BigQuery and was working on the dashboards. The issue here is that BigQuery charges for the data a query scans every time it runs.
My table holds 200GB of data. When someone queries the dashboard in Tableau, the query runs against the whole table. Applying any filter on the dashboard runs it against the whole table again.
On 200GB of data, if someone applies 5 filters for different analyses, BigQuery bills for 200*5 = 1 TB (nearly). In one day of testing the analysis we were charged for 30TB of scanned data, but the table behind it is only 200GB. Is there any way I can stop Tableau from running against the whole table in BigQuery every time something changes?
The extract in Tableau is indeed one valid strategy, but only when you are using a custom query. If you access the table directly it won't work, as that will download 200GB to your machine.
Other options to limit the amount of data are:
Not calling any columns that you don't need. Do this by hiding unused fields in Tableau. It will not include those fields in the query it sends to BigQuery. Otherwise it's a SELECT * and then you pay for the full 200GB even if you don't use those fields.
Another option that we use a lot is partitioning our tables. For instance, a partition per day of data if you have a date field. Using the TABLE_DATE_RANGE and TABLE_QUERY functions you can then smartly limit the number of partitions, and hence rows, that Tableau will query. I usually hide the complexity of these table wildcard functions away in a view, and then I use the view in Tableau. Another option is to use a parameter in Tableau to control the TABLE_DATE_RANGE.
1) Right now I am learning BQ + Tableau too, and I found that using "Extract" is a must for BQ in Tableau. With this option you can also save time when building dashboards. So my current pipeline is "Build query > Add it to Tableau > Make dashboard > Upload dashboard to Tableau Online > Schedule update for Extract".
2) You can send Custom Quota Request to Google and set up limits per project/per user.
3) If each of your queries touches 200GB every time, consider optimizing these queries (don't use SELECT *, use only the dates you need, etc.)
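The billing arithmetic from the question can be written out as a quick sanity check. This is only a sketch: the function names are hypothetical, it uses decimal units (1 TB = 1000 GB) as the question does, and it assumes every dashboard interaction scans the whole table.

```python
def total_scanned_gb(table_gb: float, full_scans: int) -> float:
    """Data billed when every query scans the whole table."""
    return table_gb * full_scans


def implied_full_scans(billed_gb: float, table_gb: float) -> float:
    """How many whole-table scans a given bill corresponds to."""
    return billed_gb / table_gb


if __name__ == "__main__":
    # Five filter changes on a 200 GB table.
    print(total_scanned_gb(200, 5), "GB")
    # A 30 TB day on the same table.
    print(implied_full_scans(30_000, 200), "full scans")
```

Seen this way, the 30TB day corresponds to roughly 150 whole-table scans, which is why reducing the bytes touched per query (fewer columns, fewer partitions, extracts) matters more than reducing the number of dashboard interactions.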
The best approach I found was to partition the table in BQ based on a date (day) field which has no timestamp. BQ allows you to partition a table by a day level field. The important thing here is that even though the field is day/date with no timestamp it should be a TIMESTAMP datatype in the BQ table. i.e. you will end up with a column in BQ with data looking like this:
2018-01-01 00:00:00.000 UTC
The field needs to be a TIMESTAMP datatype (even though there is no time in the data) because when you create a viz in Tableau it generates SQL to run against BQ, and the Tableau-generated SQL only makes use of the partitioned field if it is a TIMESTAMP datatype.
In Tableau, you should always filter on your partitioned field and BQ will only scan the rows within the ranges of the filter.
I tried partitioning on a DATE datatype and looked up the logs in GCP and saw that the entire table was being scanned. Changing to TIMESTAMP fixed this.
The thing about Tableau and BigQuery is that Tableau calculates the filter values using your query (live query). What I have seen in my project's logs is that it builds the filters by wrapping your own query:
select 'Custom SQL Query'.filtered_column from ( your_actual_datasource_query ) as 'Custom SQL Query' group by 'Custom SQL Query'.filtered_column
Instead, try to create the Tableau data source with incremental extracts, and also try to have your query use date partitioning (BigQuery only supports date partitioning) so that you can limit the data scanned.

SSAS 2014, relation to time dimension using YYYYMM field. Best practice?

In the process of creating a new cube for my client, I encountered a problem I'm not sure how to deal with.
I have a table that doesn't have a DateTime field; instead it has a varchar field which contains the year and month in YYYYMM format. I need to create a relation to my Time dimension using that field, and then proceed with creating a Year-Quarter-Month hierarchy.
First thing I did was creating a new named calculation in Time table from the date field to match the YYYYMM format. Now, I understand that the relation can't be created because it would break the referential integrity.
My idea is to create a new Time table/dimension and delete all records except the first day of each month, create a YYYYMM named calculation, and then I would be able to create a relation between my table and that new Time table. But is this the right approach, and what downsides can I expect?
Thank you!
You don't need a separate time dimension...when connecting the new fact table to the existing time dimension (in the dimension usage tab) you can set the YYYYMM attribute as the "granularity attribute"...SSAS will handle the rest :-)
Also, if the format is YYYYMM (ie. 201504 for april 2015) then you might consider making it an integer (instead of a varchar) to save some space in your fact table.
You can create a new calculated column in that table to add the first day of the month (with the varchar field, append '01' and convert the resulting YYYYMMDD string):
convert(date, YYYYMMfield + '01', 112)
and then link it with your key in your time dimension.
This way, your YYYY-QQ-MM hierarchy levels work as intended. If the user drills deeper and wants to see monthly values on a daily basis, he will find all values for that month added to the first day.
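As a language-neutral illustration of the same mapping, here is a minimal Python sketch that turns a YYYYMM string into the first day of its month and derives the quarter. The function names are hypothetical and this is not SSAS or T-SQL code; it only shows what the calculated column is expected to produce.

```python
from datetime import date, datetime


def first_of_month(yyyymm: str) -> date:
    """Map a 'YYYYMM' string to the first day of that month."""
    return datetime.strptime(yyyymm + "01", "%Y%m%d").date()


def quarter(d: date) -> int:
    """Calendar quarter (1-4) of a date."""
    return (d.month - 1) // 3 + 1


if __name__ == "__main__":
    d = first_of_month("201504")
    print(d, "Q%d" % quarter(d))
```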

BigQuery: Why does Table Range Decorators return wrong result sometimes?

I've been using the Table Range Decorators feature daily since May in order to only query the data from the last 7 days in some of my tables.
For the past 2 weeks, I've noticed that sometimes some data is missing when I use that feature. For example, if I run a query to get the results for the last 7 days (by adding "#-604800000--1" to the table), some data will be missing compared to querying the whole table (without a table decorator).
I wonder what could explain this and if there is a fix coming soon to address this?
If this can help the BigQuery team, I've noticed that when using Table Decorators some data was missing for us for October 16th between around 16:00 and 20:00 UTC time.
For the BigQuery team here are 2 jobs ids where some data is missing: job_-xtL4PlIYhNjQ5weMnssvqDmd6U , job_9ASNxqq_swjCd1eMmiQ6SmPpxlQ
and 1 job id where data is correct(without decorators): job_QbcRwYGbQv0BZdHreQEvRlYh-mM
This is a known issue with table decorators containing a time range. Due to a bug in BigQuery, it is possible for certain time ranges to omit data that should be included within the time range.
We're working on a fix and plan to have it released next week. After this fix is deployed time range decorators should again work as expected.
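For readers wondering where the number in the question comes from: the range decorator takes relative offsets in milliseconds, and 604800000 ms is exactly 7 days. A small sketch of that arithmetic follows. The helper name is hypothetical and it only rebuilds the suffix exactly as written in the question; the full decorator syntax should be checked against the BigQuery documentation.

```python
from datetime import timedelta


def relative_range_suffix(days: int) -> str:
    """Build the '-<ms>--1' relative range used in the question for the last N days."""
    ms = timedelta(days=days) // timedelta(milliseconds=1)  # whole milliseconds
    return f"-{ms}--1"


if __name__ == "__main__":
    print(relative_range_suffix(7))
```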

Calculating holiday dates

I have to create a "holiday" table and then a PHP script so I can show it on my site.
Holidays can be specific, like 15.05.2012 (the 15th of May).
Or non-specific: the first (or second, third) Sunday of July.
Is there any way to create a calculated column so that the phrase "first (or second, third) Sunday of July" turns into x.07.2012?
Use a calendar table. There is no magic code built into SQL Server that knows when Easter is. This article shows the basic premise - you fill up a table with all the dates from year x to year y, then you update a column called IsHoliday for the dates that are holidays based on specific logic (easiest to do this once, in a loop, then all your code later can refer to the calculated bit):
ASP Faq reference. The current link no longer works, this is the archive.org cached version of the page
The link in the answer now takes you to a bogus page that wants to load a virus. Just heads up.
http://codeinet.blogspot.com/2006/08/auxiliary-calendar-table-for-sql.html
This seems to be a working version.
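The "first (or second, third) Sunday of July" rule from the question can be computed directly when filling such a calendar table. Below is a minimal Python sketch of that calculation; the function name is hypothetical, and the same arithmetic would need to be ported to SQL or PHP for the setup described in the question.

```python
from datetime import date, timedelta


def nth_weekday_of_month(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th given weekday of a month (Monday=0 ... Sunday=6)."""
    first = date(year, month, 1)
    # Days from the 1st to the first occurrence of the requested weekday.
    offset = (weekday - first.weekday()) % 7
    result = first + timedelta(days=offset + 7 * (n - 1))
    if result.month != month:
        raise ValueError("the month has no such occurrence")
    return result


if __name__ == "__main__":
    SUNDAY = 6
    for n in (1, 2, 3):
        print(nth_weekday_of_month(2012, 7, SUNDAY, n).strftime("%d.%m.%Y"))
```

Holidays such as Easter do not follow this pattern and still need their own logic, which is one reason the precomputed calendar table remains the more general approach.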