BQ Partitioning by column instead of date

I'm trying to partition my tables in BQ. I've read the documentation, and it always points to timePartition. I understand that this may be the default partition type, but is it possible to use one or more of the table's own columns as the partition?
Any input would help. Thanks!

Not as of today. The only available partition type is "DAY".

Related

Get Hive partition schemas

As far as I understand, Hive keeps track of the schema for every partition, which is how it handles schema evolution.
Is there any way to get the schema for a particular partition? For example, I want to compare the schema of an old partition with that of the latest one.

The SHOW TABLE EXTENDED command gives you a good deal of information about the partition columns and their types; you could probably use that.
SHOW TABLE EXTENDED [IN|FROM database_name] LIKE 'identifier_with_wildcards' [PARTITION(partition_spec)];
Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowTable/PartitionExtended

Redshift Spectrum partitioning a table using two date fields

I was searching for best practices for creating partitions by date with amazon-redshift-spectrum, but the examples only show tables partitioned by a single date. What should I do if I have more than one date field?
E.g., mobile events with user_install_date and event_date.
How well does it perform to partition S3 like this:
installdate=2015-01-01/eventdate=2017-01-01
installdate=2015-01-01/eventdate=2017-01-02
installdate=2015-01-01/eventdate=2017-01-03
Will it kill my SELECT performance? What is the best strategy in this case?

If your data were partitioned in the above manner, then a query that had only eventdate in the WHERE clause (without installdate) would be less efficient.
It would still need to look through every installdate directory, but it could skip the eventdate directories that do not match the predicate.
Put the less-used parameter second.
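To make the directory-skipping argument concrete, here is a toy model in Python. It is only a sketch: the partition values and the `prune` and `top_level_dirs` helpers are made up for illustration, and it says nothing about how Redshift Spectrum implements pruning internally. It just counts how many top-level installdate directories a predicate has to touch.

```python
from itertools import product

# Hypothetical partition values, mirroring the layout in the question.
install_dates = ["2015-01-01", "2015-01-02"]
event_dates = ["2017-01-01", "2017-01-02", "2017-01-03"]

# Hive-style prefixes of the form installdate=<d>/eventdate=<d>.
partitions = [
    f"installdate={i}/eventdate={e}"
    for i, e in product(install_dates, event_dates)
]


def prune(paths, **predicates):
    """Return the paths whose key=value segments match every predicate."""
    selected = []
    for path in paths:
        keys = dict(segment.split("=", 1) for segment in path.split("/"))
        if all(keys.get(k) == v for k, v in predicates.items()):
            selected.append(path)
    return selected


def top_level_dirs(paths):
    """Return the set of first-level directories the paths live under."""
    return {path.split("/")[0] for path in paths}


# Filtering on the second-level key still touches every installdate directory.
by_event = prune(partitions, eventdate="2017-01-02")

# Filtering on the first-level key stays inside a single directory.
by_install = prune(partitions, installdate="2015-01-01")

print(len(top_level_dirs(by_event)), len(top_level_dirs(by_install)))
```

In this toy layout, the eventdate-only filter selects one leaf under each installdate directory, while the installdate filter never leaves its single top-level directory.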

BigQuery Partition By List

Does BigQuery support partitioning by list or by range? The online documentation seems to say that it only supports partitioning by date. Can someone confirm?

Currently, BigQuery supports partitioning only by date.
https://cloud.google.com/bigquery/docs/partitioned-tables
See also the feature request for non-date partitioning, and vote on it if you need this feature:
https://issuetracker.google.com/issues/35905817

Google Big Query - Date-Partitioned Tables with Eventual Data

Our use case for BigQuery is a little unique. I want to start using Date-Partitioned Tables but our data is very much eventual. It doesn't get inserted when it occurs, but eventually when it's provided to the server. At times this can be days or even months before any data is inserted. Thus, the _PARTITION_LOAD_TIME attribute is useless to us.
My question is: is there a way to specify a column that would act like the _PARTITION_LOAD_TIME attribute and still get the benefits of a Date-Partitioned table? If I could emulate this manually and have BigQuery update accordingly, then I could start using Date-Partitioned tables.
Does anyone have a good solution here?

You don't need to create your own column.
The _PARTITIONTIME pseudo column will still work for you!
The only thing you need to do is insert/load each batch of data into its respective partition by referencing not just the table name but the table with a partition decorator, like yourtable$20160718.
This way you can load data into the partition that it belongs to.
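As a small illustration, the decorator string can be built from the date the data actually belongs to. This is a minimal sketch: `partition_decorator` is a hypothetical helper name, and the code only formats the target string. It does not perform the load itself.

```python
from datetime import date


def partition_decorator(table: str, day: date) -> str:
    """Build a 'table$YYYYMMDD' target for the partition holding `day`."""
    return f"{table}${day:%Y%m%d}"


# The event happened on 2016-07-18, even if it is only loaded much later.
print(partition_decorator("yourtable", date(2016, 7, 18)))  # yourtable$20160718
```

Each late-arriving batch would then be loaded into the target named after its event date rather than its arrival date.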

Partitioning Datetime and Date data type

I have one small question about table partitioning.
I have created a partition function on the Date data type. Most of the tables are covered by it, but some big tables use the DateTime data type.
So the question is: how can I use both data types in a single partition function and scheme? Or how else can I handle this situation?
I appreciate your help and support.
