Create Partition table in Big Query - sql

Can anyone please suggest how to create partition table in Big Query ?.
Example: Suppose I have one log data in google storage for the year of 2016. I stored all data in one bucket partitioned by year , month and date wise. Here I want create table with partitioned by date.
Thanks in Advance

Documentation for partitioned tables is here:
https://cloud.google.com/bigquery/docs/creating-partitioned-tables
In this case, you'd create a partitioned table and populate the partitions with the data. You can run a query job that reads from GCS (and filters data for the specific date) and writes to the corresponding partition of a table. For example, to load data for May 1st, 2016 -- you'd specify the destination_table as table$20160501.
Currently, you'll have to run several query jobs to achieve this process. Please note that you'll be charged for each query job based on bytes processed.
Please see this post for some more details:
Migrating from non-partitioned to Partitioned tables

There are two options:
Option 1
You can load each daily file into separate respective table with name as YourLogs_YYYYMMDD
See details on how to Load Data from Cloud Storage
After tables created, you can access them either using Table wildcard functions (Legacy SQL) or using Wildcard Table (Standar SQL). See also Querying Multiple Tables Using a Wildcard Table for more examples
Option 2
You can create Date-Partitioned Table (just one table - YourLogs) - but you still will need to load each daily file into respective partition - see Creating and Updating Date-Partitioned Tables
After table is loaded you can easily Query Date-Partitioned Tables

Having partitions for an External Table is not allowed as for now. There is a Feature Request for it:
https://issuetracker.google.com/issues/62993684
(please vote for it if you're interested in it!)
Google says that they are considering it.

Related

Bigquery - Updating GCS path of Bigquery external tables

I have a requirement where I might have to update the Bigquery External tables on a periodic basis.
The GCS location has timestamp for every incremental run, I would like to update to the latest timestamp folder as the path of External table.
One way i see is only dropping the table and creating again by pointing it to latest folder. But, is there any other way to update it without dropping the table
As suggested by #Samuel , you can use the SQL statement CREATE or REPLACE EXTERNAL TABLES for your requirement. Scheduled queries support DML and DDL statements which can be used to create the new tables. You can use the below mentioned query parameter to create the table according to your schedule :
My_database_name.my_table_name.my_results_{run_date}
For more information you can refer to this documentation.

Add new partition-scheme to existing table in athena with SQL code

Is it even possible to add a partition to an existing table in Athena that currently is without partitions? If so, please also write syntax for doing so in the answer.
For example:
ALTER TABLE table1 ADD PARTITION (ourDateStringCol = '2021-01-01')
The above command will give the following error:
FAILED: SemanticException table is not partitioned but partition spec exists
Note: I have done a web-search, and variants exist for SQL server, or adding a partition to an already partitioned table. However, I personally could not find a case where one could successfully add a partition to an existing non-partitioned table.
This is extremely similar to:
SemanticException adding partiton Hive table
However, the answer given there requires re-creating the table.
I want to do so without re-creating the table.
Partitions in Athena are based on folder structure in S3. Unlike standard RDBMS that are loading the data into their disks or memory, Athena is based on scanning data in S3. This is how you enjoy the scale and low cost of the service.
What it means is that you have to have your data in different folders in a meaningful structure such as year=2019, year=2020, and make sure that the data for each year is all and only in that folder.
The simple solution is to run a CREATE TABLE AS SELECT (CTAS) query that will copy the data and create a new table that can be optimized for your analytical queries. You can choose the table format (Parquet, for example), the compression (SNAPPY, for example), and also the partition schema (per year, for example).

Google BigQuery Partitione Tables - How to create tables automatically daily?

The question is how to let Google BigQuery automatically create partitioned tables on the daily base (one day -> one table, etc.)?
I've used the following command in the command line to create the table:
bq mk --time_partitioning_type=DAY testtable1
The table1 appeared in the dataset, but how to create tables for every day automatically?
From the partitioned table documentation, you need to run the command to create the table only once. After that, you specify the partition to which you want to write as the destination table of the query, such as testtable1$20170919.

How to create dynamic tables in google bigquery dataset and access in tableau?

I have a BigQuery database, i want to create dynamic tables.
Ex: table_20170609 - if date is 9th june 2017
table_20170610 - if date is 10th june 2017
Daily i will get some excel data and i have to upload to the above dynamically created table. Data in excel is not day wise, it will be from start date to today's date.
I know connecting bigquery to tableau and running queries. Is there any automated method where tableau will read dynamic table from bigquery and generate the report.
current working - i have created one table(reports) and everyday i will rename the table reports to reports_bkp_date and will create new table reports.
I'm new to bigquery and tableau, i would like to know -
How to create dynamic tables in bigquery?
How to connect dynamic table to tableau (daily i should not change table name manually)?
You have two immediate options - firstly, create a view in BigQuery (instead of a table) which will collate together all relevant tables, then connect to this in Tableau.
The better approach, given that you are having to manually upload a new table every day, is to use a wildcard table connection in Tableau and use a similar naming convention for your data tables, for example you might use DailyData_2017_* to capture all tables in he following format:
DailyData_2017_06_01
DailyData_2017_06_02
DailyData_2017_06_03
Finally, note that you can append to a table in BigQuery, rather than replacing it's contents. If your data is time stamped then this might work for you too.
Ben

How to transfer data between databases with SELECT statement in Teradata

So I am stuck on this Teradata problem and I am looking to the community for advice as I am new to the TD platform. I am currently working with a Teradata Data Warehouse and have an interesting task to solve. Currently we store our information in a live production database but want to stage tables in another database before using FastExport to export the files. Basically we want to move our tables into a database to take a quick snapshot.
I have been exploring different solutions and am unsure how to proceed. I need to be able to automate a create table process from one DB in Teradata to another. The tricky part is I would like to create many tables off of the source table using a WHERE clause. For example, I have a transaction table and want to take a snapshot of the transaction table for a certain date range month by month. Meaning that the original table Transaction would be split into many tables such as Transaction_May2001, Transaction_June2001, Transaction_July2001 and so on and so forth.
Thanks
This is assuming by two databases you are referring to the same physical installation of Teradata.
You can use the CREATE TABLE AS construct to accomplish this:
CREATE TABLE {MyDB}.Transaction_May2001
AS (
SELECT *
FROM Transaction
WHERE Transaction_Date BETWEEN DATE '2001-05-01' AND '2001-05-31'
)
{UNIQUE} PRIMARY INDEX ({Same PI definition as Transaction Table})
WITH DATA AND STATS;
If you neglect to specify the explicit PI in the CREATE TABLE AS then Teradata will take the first column of the SELECT clause and use it as the PI of the new table.
Otherwise, you would be looking to use a Teradata utility as suggested by ryanbwork in the comment to your question.