I have a partitioned view with 20 tables. Each table has a partition key (usp_id) ranging from 1 to 20. If I query the partitioned view using the partition key then only the table with the correct usp_id is queried which is fine.
Now i have a second table which has two fields. Usp_id and insert_date. The insert_date in this table is updated daily. It is a one to one mapping in this table.
I would like to be able to query my partition view based on insert_date which then would use the usp_id to query the partitioned view.
Is this possible?
Many thanks in advance!
Related
I've been trying to add multiple partition columns, to a BigQuery table, but it seems to only take one field, even if I add multiple partition fields in the query parameters.
I'm partitioning by date time and integer range.
It only takes the later of the pair to create partitions and ignores the first partition field.
Any ideas, would be appreciated?
BigQuery only supports partitioning on one column. If you want to partition on multiple columns, you can consider partitioning+clustering. The table can be clustered on up to 4 columns.
I use coalesce to combine the columns and partition the new field created from coalesce, worked for my purpose.
I want to create such table:
CREATE TABLE sometable
(SELECT columns, columns, date_col)
PARTITIONED BY date_col
And I want it to be date partitioned with the date in table suffix: sometable$date_partition
I read the docs, but can't complete this neither with web UI nor with SQL.
The web UI shows such error "Missing argument for parameter DATE."
My table name is "daily_export_${DATE}"
My partitioning column isn't blank, it's date_col.
Can I have a simple example, please?
PARTITION BY goes earlier
The query needs to parse the table suffix into a DATE type.
For example:
CREATE OR REPLACE TABLE temp.so
PARTITION BY date_from_table_name
AS
SELECT PARSE_DATE('%Y%m%d', _table_suffix) date_from_table_name, event_timestamp, event_name, items
FROM `bingo-blast-174dd.analytics_151321511.events_*`
WHERE _table_suffix BETWEEN '20200530' AND '20200531'
LIMIT 10
As you can see in this documentation, BigQuery implements two different concepts: sharded tables and partitioned tables
The first one (sharded tables) is a way of dividing a whole table into many tables with a date suffix. You can query those tables individually or using wildcards. For example, instead of creating a single table named events, you can create many tables named events_20200101, events_20200102, [...]
When you do that, you are able to query any of those tables individually or you can query all of them by running some query like select * from events_*
The second concept (partitioned tables) is an approach to fragment your table in smaller pieces in order to improve the performance and reduce costs when querying data. Partitioned tables can be based on some column of your table or even on the ingestion time. When you table is partitioned by ingestion time you can access a pseudo column named _PARTITIONTIME
When comparing both approaches, the documentation says:
Date/timestamp partitioned tables perform better than tables sharded
by date. When you create date-named tables, BigQuery must maintain a
copy of the schema and metadata for each date-named table. Also, when
date-named tables are used, BigQuery might be required to verify
permissions for each queried table. This practice also adds to query
overhead and impacts query performance. The recommended best practice
is to use date/timestamp partitioned tables instead of date-sharded
tables.
In your case, you basically need to create a partitioned table without a date in its name.
I have an existing date partitioned table and I want to create a new date partitioned table with only one column from the original table while keeping the original partitioning.
I have tried:
Creating an empty partitioned table and copying the results in from a query but then the partitioning is missing.
The following Create statement should work if I also include the partitioned date as a column in my new table which I don't want. Is there a way to use the part_date column as a partition decorator when loading in the data from a query result ?
CREATE TABLE
cat_dataset.cats_names(cat_name string)
PARTITION BY
part_date AS
SELECT
cat_name,
_PARTITIONDATE AS part_date
FROM
`myproject.cat_dataset.cats`
I want to avoid looping over all the dates and writing the data off that date to the new table. Is there a way to use the part_date column as a partition decorator when loading in the data from a query result ?
INSERT INTO allows you to specify _PARTITIONTIME as a column, see link. Code below should work:
CREATE TABLE cat_dataset.cats_names(cat_name string)
PARTITION BY DATE(_PARTITIONTIME);
INSERT INTO cat_dataset.cats_names (_PARTITIONTIME, cat_name)
SELECT _PARTITIONTIME, cat_name
FROM `myproject.cat_dataset.cats`
Hi my table has 150 columns and i dont want to mention the column name while doing partition. Is there any way using temp table to create partitioned table from non partitioned one.
Temporary table in Hive doesn't support Partitioning.
You cannot create any permanent partitioned table without mentioning Partition
Column.
How to discard/hide partition column from hive view while selecting , at the same time filter can apply using where clause from the view created on base table partition column, base table is a partitioned table?
For Ex: my table ddl is create table test(id int) partitioned by (year);
view DDL: create view myview select id,year from test;
Now I don't want to see the value of year while selecting the data from view at the same time I should be able to query on specific partition of the base table using myview.
There is a concept of creating a partitioned view available now in HIVE. You should try exploring.
For example,
CREATE VIEW myview PARTITIONED ON (year)
AS SELECT id, year FROM test;
Refer to the below link to understand the rules to be adhered for partition columns when written from the base tables. It seems still limited therefore to be used only if it fits your needs.
https://cwiki.apache.org/confluence/display/Hive/PartitionedViews