error on querying partition table using wildcard operator in bigquery - google-bigquery

I need to create a separate table/view from a partition table with daily data.
I used this query:
SELECT * FROM `awantec-gws-bigquery2.gmail_log_dataset_copy.daily_*`
but I get this error:
Views cannot be queried through prefix. First view awantec-gws-bigquery2:gmail_log_dataset_copy.daily_gmail_log.

Related

Create partitioned table with date suffix in bigquery using SQL or web UI

I want to create such table:
CREATE TABLE sometable
(SELECT columns, columns, date_col)
PARTITIONED BY date_col
And I want it to be date partitioned with the date in table suffix: sometable$date_partition
I read the docs, but can't complete this neither with web UI nor with SQL.
The web UI shows such error "Missing argument for parameter DATE."
My table name is "daily_export_${DATE}"
My partitioning column isn't blank, it's date_col.
Can I have a simple example, please?
PARTITION BY goes earlier
The query needs to parse the table suffix into a DATE type.
For example:
CREATE OR REPLACE TABLE temp.so
PARTITION BY date_from_table_name
AS
SELECT PARSE_DATE('%Y%m%d', _table_suffix) date_from_table_name, event_timestamp, event_name, items
FROM `bingo-blast-174dd.analytics_151321511.events_*`
WHERE _table_suffix BETWEEN '20200530' AND '20200531'
LIMIT 10
As you can see in this documentation, BigQuery implements two different concepts: sharded tables and partitioned tables
The first one (sharded tables) is a way of dividing a whole table into many tables with a date suffix. You can query those tables individually or using wildcards. For example, instead of creating a single table named events, you can create many tables named events_20200101, events_20200102, [...]
When you do that, you are able to query any of those tables individually or you can query all of them by running some query like select * from events_*
The second concept (partitioned tables) is an approach to fragment your table in smaller pieces in order to improve the performance and reduce costs when querying data. Partitioned tables can be based on some column of your table or even on the ingestion time. When you table is partitioned by ingestion time you can access a pseudo column named _PARTITIONTIME
When comparing both approaches, the documentation says:
Date/timestamp partitioned tables perform better than tables sharded
by date. When you create date-named tables, BigQuery must maintain a
copy of the schema and metadata for each date-named table. Also, when
date-named tables are used, BigQuery might be required to verify
permissions for each queried table. This practice also adds to query
overhead and impacts query performance. The recommended best practice
is to use date/timestamp partitioned tables instead of date-sharded
tables.
In your case, you basically need to create a partitioned table without a date in its name.

Using ingestion-time based pseudo-field (_PARTITIONTIME) as partition while clustering

I'd like to cluster our ingestion-time partitioned tables without having to change the ETL scripts we use to update them. All of our tables are partitioned on the pseudo-field _PARTITIONTIME, now when I try cluster a table with DML I get the following error:
Invalid field name "_PARTITIONTIME". Field names are not allowed to start with the (case-insensitive) prefixes _PARTITION, TABLE, FILE and _ROW_TIMESTAMP
Here's what the DML-script looks like:
CREATE TABLE `table_target`
PARTITION BY DATE(_PARTITIONTIME)
CLUSTER BY a, b, c
AS
SELECT
*, _PARTITIONTIME
FROM
`table_source`
How should I go about this? Is there a way to keep the same pseudo-field as the partition field, should I re-work the partition field, or am I missing something here?
It is Known limitation that:
It is not possible to create an ingestion-time partitioned table from the result of a query. Instead, use a CREATE TABLE DDL statement to create the table, and then use an INSERT DML statement to insert data into it.
In your case, you need to use CREATE TABLE to create target_table with CLUSTER BY first, then migrate data over.

how to convert a non-partitioned table into a partitioned one

How to rename a TABLE in Big query using StandardSQL or LegacySQL.
I'm trying with StandardSQL but it is giving following error,
RENAME TABLE dataset.old_table_name TO dataset.new_table_name;
Statement not supported: RenameStatement at [1:1]
Does it mean there is no any method(SQL QUERY) Which can rename a table?
I just want to change from non-partition table to partition-table
You can achieve this in two steps process
Step 1 - Export your table to Google Cloud Storage
Step 2 - Load file from GCS back to GBQ into new table with partitioned column
Both are free of charge
Still, have in mind some limitatins of partitioned tables - like number of partitions for example - it is 4000 per table as of today - https://cloud.google.com/bigquery/quotas#partitioned_tables
Currently it is not possible to rename table in Bigquery as explained in this document. You will have to create another table by following the steps given by Mikhail. Notice there is still some charge from table storage, but it is minimal. See this doc for detail information.
You can use the below query, it will create a new table with distinct records from old table with partition on given column.
create or replace table `dataset.new_table` PARTITION BY DATE(date_time_column) as select distinct * from `dataset.old_table`

How to discard partition column from hive view while selecting?

How to discard/hide partition column from hive view while selecting , at the same time filter can apply using where clause from the view created on base table partition column, base table is a partitioned table?
For Ex: my table ddl is create table test(id int) partitioned by (year);
view DDL: create view myview select id,year from test;
Now I don't want to see the value of year while selecting the data from view at the same time I should be able to query on specific partition of the base table using myview.
There is a concept of creating a partitioned view available now in HIVE. You should try exploring.
For example,
CREATE VIEW myview PARTITIONED ON (year)
AS SELECT id, year FROM test;
Refer to the below link to understand the rules to be adhered for partition columns when written from the base tables. It seems still limited therefore to be used only if it fits your needs.
https://cwiki.apache.org/confluence/display/Hive/PartitionedViews

BigQuery - select from nested table to another table

I loaded nested json file to a table and got nested table.
Now i want to filter the table data by where clause and insert it to another table.
so i will do destination table - new table
and source table = select * from old_nested_table where status=1
The problem is that the select * is flatten the old nested table while i want to keep it nested.
How can i query nested table in order to get nested results?
You should
check on "Allow Large Results"
and
check off "Flatten Results"
as below
These settings allow you to save result preserving schema of original table