Does BigQuery support partitioning by list or by range? The online documentation seems to say that it only supports partitioning by date. Can someone confirm?
Currently, BigQuery supports partitioning only by date!
https://cloud.google.com/bigquery/docs/partitioned-tables
See also the feature request for supporting non-date partitioning.
Vote on it if you need this feature:
https://issuetracker.google.com/issues/35905817
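For reference, a day-partitioned table of the kind the answer describes can be created with BigQuery standard SQL DDL. This is a minimal sketch: `mydataset.events`, `event_ts` and `payload` are hypothetical names, and it uses ingestion-time partitioning via `_PARTITIONDATE`.

```sql
-- Hypothetical dataset, table and column names.
-- Ingestion-time partitioning: each row lands in the DAY partition
-- corresponding to when it was loaded.
CREATE TABLE mydataset.events (
  event_ts TIMESTAMP,
  payload  STRING
)
PARTITION BY _PARTITIONDATE;
```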
As far as I understand, Hive keeps track of the schema of every partition, which is how it handles schema evolution.
Is there any way to get the schema of a particular partition? For example, I want to compare the schema of an old partition with that of the latest one.
The SHOW TABLE EXTENDED command returns a good deal of information about the partition columns and their types; you could probably use that.
SHOW TABLE EXTENDED [IN|FROM database_name] LIKE 'identifier_with_wildcards' [PARTITION(partition_spec)];
Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowTable/PartitionExtended
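As a sketch, assuming a hypothetical table `sales` in database `mydb` partitioned by a string column `dt`, the command can be pointed at one partition at a time. `DESCRIBE FORMATTED` with a partition spec is another way to list that partition's columns, which can then be compared with the table-level output.

```sql
-- Hypothetical names: database mydb, table sales, partition column dt.
-- Metadata (including the column list) for one old partition:
SHOW TABLE EXTENDED IN mydb LIKE 'sales' PARTITION (dt='2016-01-01');

-- Column list for the same partition, in tabular form:
DESCRIBE FORMATTED mydb.sales PARTITION (dt='2016-01-01');

-- Current table-level schema, for comparison:
DESCRIBE FORMATTED mydb.sales;
```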
(Judging from "How do I query the streaming buffer in BigQuery if the _PARTITIONTIME field isn't available with Standard SQL", I am guessing my question has no simple solution, so I will "enhance" it.)
I stream my data into a BigQuery table that is partitioned on a timestamp field (not by ingestion time) and clustered.
I want a view that always returns the last hour of data: both what is already in the table and what is still in the streaming buffer.
Since this table is not partitioned by ingestion time, there is no _PARTITIONTIME/_PARTITIONDATE pseudo-column, so I can't use it to get the buffered data.
The only way I've found is to use legacy SQL: SELECT * FROM [dataset.streaming_data$__UNPARTITIONED__]
This is not good enough for me, since even if I save this as a view, I can't refer to a legacy SQL view from a standard SQL query.
Any idea how I can achieve this?
Another idea I am considering: BigQuery can query an external data source (using EXTERNAL_QUERY) with standard SQL.
A solution might be a "temporary" table in a separate database (such as Cloud SQL for PostgreSQL) that holds only the last hour of data and is not subject to BigQuery's streaming buffer.
I think this is a bad solution, but I guess it might work...
What do you think?
Thanks to @Felipe Hoffa, I just found out I don't need to do anything :-)
Buffered data is already included in any SQL query whose WHERE clause matches it...
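A minimal sketch of the view the question asks for, assuming the table is `mydataset.streaming_data` and its partitioning TIMESTAMP column is named `event_ts` (both hypothetical). Per the answer above, rows still in the streaming buffer are returned as long as the filter matches them.

```sql
-- Hypothetical names: mydataset.streaming_data, partitioning column event_ts.
-- Standard SQL view over the last hour; relies on buffered rows being
-- included whenever the WHERE clause matches them.
CREATE OR REPLACE VIEW mydataset.streaming_last_hour AS
SELECT *
FROM mydataset.streaming_data
WHERE event_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
```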
I've looked at previous questions, but the links they give to the GCP docs are outdated, so I would like to learn the best way to do the conversion while writing each record into the correct partition (that is, according to the "date" column, not the day I inserted the records).
Could someone point me in the right direction, specifically for Legacy SQL?
From the docs: "Currently, legacy SQL is not supported for querying partitioned tables or for writing query results to partitioned tables".
So in this case, because legacy SQL can't write to partitioned tables, which is a major blocker with no workaround, you would have to use standard SQL or Dataflow, as detailed in the answers to the question Graham linked.
I'm trying to partition my tables in BigQuery. I've read the documentation, and it always points to timePartitioning. I understand that this may be the default partitioning, but is it possible to use one or more of a table's columns as the partition key?
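With standard SQL, one route is a `CREATE TABLE ... PARTITION BY ... AS SELECT` statement, which places each row according to a column value rather than its load time. A sketch assuming hypothetical tables `mydataset.events` (source) and `mydataset.events_by_date` (target), with the question's "date" column being of type DATE; for a TIMESTAMP column the clause would be `PARTITION BY DATE(...)` instead.

```sql
-- Hypothetical table names. Assumes `date` is a DATE column in the source.
-- Each row is written to the partition matching its `date` value,
-- not the day the statement runs.
CREATE TABLE mydataset.events_by_date
PARTITION BY `date`
AS
SELECT *
FROM mydataset.events;
```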
Any inputs would help. Thanks!
Not as of today: the only available partitioning type is "DAY".
When I try to use dynamic table partitions in a query in the BigQuery web UI (as documented, e.g., here), i.e.
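With "DAY" as the only partitioning type, a partition is addressed through the `_PARTITIONTIME` pseudo-column rather than through a column you choose. A sketch in standard SQL, assuming a hypothetical ingestion-time partitioned table `mydataset.mytable`:

```sql
-- Hypothetical table name; assumes ingestion-time DAY partitioning.
-- Filtering on the _PARTITIONTIME pseudo-column limits the query
-- to a single day's partition.
SELECT *
FROM mydataset.mytable
WHERE _PARTITIONTIME = TIMESTAMP('2017-01-01');
```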
SELECT * FROM [dataset.table$0-of-3]
I get the following error:
Error: Cannot read partition information from a table that is not partitioned: project:dataset.table$0-of-3
When I try a table that was partitioned with the new date partitioning (bq mk --time_partitioning_type=DAY ...), I do not get an error but instead:
Query returned zero records.
Also, I can't find the documentation on this feature anymore. Has it been deprecated?
I don't have enough reputation to comment on Mikhail's answer, so I'm adding an answer here.
At least for now, the dynamic table partitions described in the book were deprecated in favor of table partitioning as described in the latest BigQuery documentation.
We hope to provide richer flavors of partitioning in the future, but they may not necessarily be available as table decorators.
This ($0-of-3) feature was never implemented; hopefully it will be at some point.
The only partitioning decorator implemented so far is the one for date-partitioned tables. See Partitioned Tables and timePartitioning.type for more.
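For comparison with the `$0-of-3` form, the date-partition decorator looks like this in legacy SQL. `mydataset.mytable` is a hypothetical day-partitioned table, and `$20160501` selects the partition for 2016-05-01.

```sql
-- Legacy SQL. Hypothetical table name.
-- The $YYYYMMDD decorator reads only the partition for that day.
SELECT *
FROM [mydataset.mytable$20160501]
```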