BigQuery materialized view on partitioned table - google-bigquery

I'm not able to create a materialized view over a partitioned table, even though I have added a partition filter in the query itself.
The source table has a TIMESTAMP column ts and is partitioned by DATE(ts).
Query for materialized view:
create materialized view `materialized_view_1`
partition by Date(ts)
as
select Date(ts) date, count(*) count from `source_table`
where date(ts) > "2021-01-01"
group by 1
Error -
Table source_table requires a partition filter. Materialized views over a table that requires a partition filter must have a filter over the table's partitioning column or output the column.
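The error message asks for a filter over the table's partitioning column itself, while the query above filters on DATE(ts), a function of it. One thing worth trying (an untested sketch, with all names taken from the question) is to keep the query exactly as it was but filter on the raw column ts:

```sql
create materialized view `materialized_view_1`
partition by date(ts)
as
select date(ts) date, count(*) count
from `source_table`
where ts > timestamp "2021-01-01"  -- filter on the raw partitioning column, not date(ts)
group by 1
```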

Related

Creating dynamic partition in Range partitioning

I have the below scenario.
Suppose I have a table with 3 partitions: one for 20190201, one for 20190202, and one for 20190210.
The requirement is: whichever date we pass, a partition should be created for it automatically.
Using dynamic SQL I can create a partition after the max partition (e.g. 20190211), but if I try to create a partition for 20190205 it gives an error.
Is there any way to create the partition at run time, without data loss, even when a later partition already exists?
We have been told not to use interval partitioning.
This is very simple:
while creating the table itself, use interval partitioning on the date column.
You can choose the partition interval as hour/day/month, whichever you like.
Then any time you insert new data, based on the date value it will either go to the correct partition or create a new one.
Use the syntax below while creating your table:
partition by range ( date_col )
interval ( NUMTODSINTERVAL(1,'day') )
( partition p1 values less than ( date '2016-01-01' ))
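Put together, a complete table definition might look like the sketch below (table and column names are hypothetical). Oracle then auto-creates a daily partition for any inserted date, including dates that fall between existing partitions:

```sql
-- Hypothetical table; interval partitioning creates daily
-- partitions on demand as rows are inserted.
CREATE TABLE sales_data (
  sale_id  NUMBER,
  date_col DATE
)
PARTITION BY RANGE (date_col)
INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
(
  PARTITION p1 VALUES LESS THAN (DATE '2016-01-01')
);
```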

Error when I try to create ordered SQL table

I'm trying to create a volatile table in SQL with an ORDER BY and I get an error.
CREATE VOLATILE TABLE orderd_dates AS
(SELECT * FROM date_table
ORDER BY id_date)
with data primary index (id_date) on commit preserve rows;
The error is: ORDER BY is not allowed in subqueries.
If I can't use order by, how can I create a volatile table that's ordered?
SQL tables are inherently unordered. You need to explicitly use an order by clause when querying the table, not when creating it.
You could add TOP 100 PERCENT to allow the ORDER BY, but the table would still be unordered, because a table is internally ordered by the hash of the Primary Index. And if you used a NO PRIMARY INDEX table, which actually would be stored in the specified order, the optimizer wouldn't know about it.
The closest thing you can get is to PARTITION BY RANGE_N(id_date BETWEEN DATE '2000-01-01' AND DATE '2050-12-31' EACH INTERVAL '1' DAY):
CREATE VOLATILE TABLE orderd_dates AS
(SELECT * FROM date_table
)
WITH DATA
PRIMARY INDEX (id_date)
PARTITION BY Range_N(id_date BETWEEN DATE '2000-01-01'
AND DATE '2050-12-31' EACH INTERVAL '1' DAY)
ON COMMIT PRESERVE ROWS;
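Since the stored order still can't be relied on, anything reading the table should order at query time, e.g.:

```sql
-- Ordering belongs in the SELECT, not in the table definition.
SELECT *
FROM orderd_dates
ORDER BY id_date;
```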

Performance of finding max partition of hive table

Now, I have a Hive table partitioned by dt, where dt is a date string. The table also has a field col whose value is equal to dt. Is there any difference in performance between these two SQLs?
SQL1:
select max(dt) from test_table
SQL2:
select max(col) from test_table
There won't be any difference if the datatype and the stored values are the same in both cases.
But if the query contains a WHERE clause on the partition column, it will be faster, because a partitioned table scans only the matching partitions instead of the entire table.
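If the real goal is just to find the latest partition, note that the partition list lives in the Hive metastore and can be read without scanning any data. This lists all partitions; the max has to be picked from the output, which is straightforward when dt sorts lexically:

```sql
-- Partition names come from the Hive metastore; no table scan.
SHOW PARTITIONS test_table;
```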

Bigquery - How to keep partition in target table

I need to select rows from a partitioned table and save the result into another table. How can I keep the records' __PARTITIONTIME the same as in the source table? I mean not only keeping the value of __PARTITIONTIME, but the whole partitioning feature, so that I can run further queries on the target table using partition decorators and the like.
(I'm using Datalab notebooks)
%%sql -d standard --module TripData
SELECT
HardwareId,
TripId,
StartTime,
StopTime
FROM
`myproject.mydataset.TripData`
WHERE
_PARTITIONTIME BETWEEN TIMESTAMP_TRUNC(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 * 24 HOUR),DAY)
AND TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(),DAY)
You cannot do this for multiple partitions at once!
You have to do it one partition at a time, specifying the target partition: targetTable$yyyymmdd.
Note: first you need to create the target table as a partitioned table with the respective schema.
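One way to script this per partition is the bq CLI with a $YYYYMMDD decorator on the destination table. A sketch under the question's names, with TripDataCopy as a hypothetical pre-created partitioned target and an illustrative date:

```shell
# Write the query result into one specific day partition of the
# (pre-created, partitioned) target table via the $YYYYMMDD decorator.
bq query --use_legacy_sql=false \
  --destination_table='mydataset.TripDataCopy$20210101' \
  'SELECT HardwareId, TripId, StartTime, StopTime
   FROM `myproject.mydataset.TripData`
   WHERE _PARTITIONTIME = TIMESTAMP("2021-01-01")'
```

Repeating this for each day in the window preserves the day-to-partition mapping of the source table.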

How to get COUNT(*) from one partition of a table in SQL Server 2012?

My table has 7 million records and I split it into 14 partitions by ID; each partition includes 5 million records and is 40 GB in size. I want to run a query to get the count in one partition, but it scans all partitions and the query takes a very long time.
SELECT COUNT(*)
FROM Item
WHERE IsComplated = 0
AND ID Between 1 AND 5000000
How can I run my query against one partition only, without scanning the other partitions?
Refer to http://msdn.microsoft.com/en-us/library/ms188071.aspx
B. Getting the number of rows in each nonempty partition of a partitioned table or index
The following example returns the number of rows in each partition of table TransactionHistory that contains data. The TransactionHistory table uses partition function TransactionRangePF1 and is partitioned on the TransactionDate column.
To execute this example, you must first run the PartitionAW.sql script against the AdventureWorks2012 sample database. For more information, see PartitioningScript.
USE AdventureWorks2012;
GO
SELECT $PARTITION.TransactionRangePF1(TransactionDate) AS Partition,
COUNT(*) AS [COUNT] FROM Production.TransactionHistory
GROUP BY $PARTITION.TransactionRangePF1(TransactionDate)
ORDER BY Partition ;
GO
C. Returning all rows from one partition of a partitioned table or index
The following example returns all rows that are in partition 5 of the table TransactionHistory.
Note: To execute this example, you must first run the PartitionAW.sql script against the AdventureWorks2012 sample database. For more information, see PartitioningScript.
SELECT * FROM Production.TransactionHistory
WHERE $PARTITION.TransactionRangePF1(TransactionDate) = 5 ;
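Applied to the question's Item table, the same idea would look like the sketch below, where pf_ItemRange stands in for whatever partition function the table actually uses on the ID column:

```sql
-- pf_ItemRange is a placeholder for the real partition function;
-- $PARTITION maps an ID value to its partition number, so the
-- predicate restricts the count to a single partition.
SELECT COUNT(*)
FROM Item
WHERE IsComplated = 0
  AND $PARTITION.pf_ItemRange(ID) = 1;
```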