Create a BigQuery view to get the latest rows from a partitioned (and clustered) table - google-bigquery

The issue
I'm trying to create a view to get the latest rows from a partitioned table, filtered on the date partition _LOCALDATETIME and zero or more cluster fields. I can create a view which uses a partition and I can create a view which handles some filters, but I can't work out the syntax to achieve both.
An example query requirement
SELECT fieldA, fieldB, fieldC FROM theView
WHERE date between '2021-01-01' and '2021-12-31' AND
_CLUSTERFIELD1 = 'foo'
GROUP BY _CLUSTERFIELD2
ORDER BY _CLUSTERFIELD3
Table schema
_LOCALDATETIME
_id
_CLUSTERFIELD1
_CLUSTERFIELD2
_CLUSTERFIELD3
_CLUSTERFIELD4
...other fields

Based on my understanding of your case, I came up with this approach.
I created a table partitioned on _LOCALDATETIME with clustered fields, and then a view that returns the data from a fixed date range, keeping the last row of each partition based on _id. That gives me a view containing the latest items of a partitioned table within a fixed date range.
view
CREATE VIEW `<my-project-id>.<dataset>.<view>` AS
WITH range_id AS (
  -- highest _id per partition date within the fixed date range
  SELECT MAX(_id) AS last_id_partition, _localdatetime AS partition_
  FROM `<my-project-id>.<dataset>.<table>`
  WHERE _localdatetime BETWEEN "2020-01-01" AND "2022-01-01"
  GROUP BY _localdatetime
)
SELECT s.*
FROM `<my-project-id>.<dataset>.<table>` s
INNER JOIN range_id r
  ON s._id = r.last_id_partition AND s._localdatetime = r.partition_
WHERE s._localdatetime BETWEEN "2020-01-01" AND "2022-01-01"
-- assumes these are all the columns of the example table, so this only removes duplicate rows
GROUP BY _id, _localdatetime, _name, _location
The view returns the rows with the last _id of each partition of the partitioned, clustered table, including the clustered fields, for the date range fixed in the view (2020 and 2021).
query
SELECT * FROM `<my-project-id>.<dataset>.<view>`
WHERE _localdatetime BETWEEN '2021-12-21' AND '2021-12-22'
AND <clusteredfield> = 'Venezuela'
It returns the records that match that filter, because the underlying data is already defined in the view.
What you can't do is create the view without the partition field: it must be exposed so that queries against the partitioned table can filter on it. You can also use these queries inside a function to further customize your output.
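To make the logic easier to check, here is a minimal sketch of the same "latest row per date" view using Python's built-in sqlite3 module. SQLite only stands in for BigQuery here (it has no partitions or clustering), and the table name events, the view name latest_events and the sample rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for BigQuery; all table, view and column values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (_id INTEGER, _localdatetime TEXT, _clusterfield1 TEXT);
INSERT INTO events VALUES (1, '2021-12-21', 'Venezuela');
INSERT INTO events VALUES (2, '2021-12-21', 'Venezuela');
INSERT INTO events VALUES (3, '2021-12-22', 'Chile');
INSERT INTO events VALUES (4, '2021-12-22', 'Chile');

-- Keep only the row with the highest _id for each date.
CREATE VIEW latest_events AS
SELECT s.*
FROM events AS s
JOIN (
  SELECT _localdatetime AS partition_, MAX(_id) AS last_id
  FROM events
  GROUP BY _localdatetime
) AS r
  ON s._id = r.last_id AND s._localdatetime = r.partition_;
""")

# The date column is exposed by the view, so callers can still filter on it.
rows = conn.execute(
    "SELECT _id, _localdatetime, _clusterfield1 FROM latest_events "
    "WHERE _localdatetime BETWEEN ? AND ? AND _clusterfield1 = ?",
    ("2021-12-21", "2021-12-22", "Venezuela"),
).fetchall()
print(rows)
```

Note that "latest" is decided per date before the cluster filter is applied. If you want the latest row per date and per cluster value instead, one option is to add the cluster field to the inner GROUP BY and to the join condition.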

Related

Data Loaded wrongly into Hive Partitioned table after adding a new column using ALTER

I already have a Hive partitioned table. I needed to add a new column to the table, so I used ALTER to add the column as shown below.
ALTER TABLE TABLE1 ADD COLUMNS(COLUMN6 STRING);
I have my final table load query like this:
INSERT OVERWRITE table Final table PARTITION(COLUMN4, COLUMN5)
select
stg.Column1,
stg.Column2,
stg.Column3,
stg.Column4, -- partition column; field name: Code; sample value: YAHOO.COM
stg.Column5, -- partition column; field name: Date; sample value: 2021-06-25
stg.Column6 -- new column; field name: reason; sample value: Adjustment
from (
select fee.* from (
select
fees.* ,
ROW_NUMBER() OVER (PARTITION BY fees.Column1 ORDER BY fees.Column3 DESC) as RNK
from Stage table fees
) fee
where RNK = 1
) stg
left join (
select Column1 from Final table
where Column5(date) in (select distinct column5(date) from Stage table)
) TGT
on tgt.Column1(id) = stg.Column1(id) where tgt.column1 is null
UNION
select
tgt.column1(id),
tgt.column2,
tgt.column3,
tgt.column4(partiton column),
tgt.column5(partiton column-date),
tgt.column6(New column)
from
Final Table TGT
WHERE TGT.Column5(date) in (select distinct column5(date) from Stage table);
When my job ran today and I tried to query the final table, I got the error below:
Invalid partition value 'Adjustment' for DATE partition key: Code=2021-06-25/date=Adjustment
I can tell that something went wrong around the partition columns, but I can't figure out what. Can someone help?
Partition columns should be the last ones in the select. When you add a new column, it is added as the last non-partition column, and the partition columns remain last. Partition columns are not stored in the data files; only the metadata contains information about partitions. The order of all the other columns matters too: it should match the table DDL, which you can check using DESCRIBE FORMATTED table_name.
INSERT OVERWRITE table Final table PARTITION(COLUMN4, COLUMN5)
select
stg.Column1,
stg.Column2,
stg.Column3,
stg.Column6, -- new column (last non-partition column)
stg.Column4, -- partition columns go last
stg.Column5
...
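The underlying behaviour is that INSERT ... SELECT matches values to target columns by position, not by name. Here is a small sketch of that using Python's sqlite3 module. SQLite has no partitions, so this only illustrates the positional mapping, and the stage and final tables are hypothetical stand-ins for the ones in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables; the two trailing columns of "final" merely play
-- the role of Hive's partition columns (code, dt).
CREATE TABLE stage (id INTEGER, code TEXT, dt TEXT, reason TEXT);
INSERT INTO stage VALUES (1, 'YAHOO.COM', '2021-06-25', 'Adjustment');
CREATE TABLE final (id INTEGER, reason TEXT, code TEXT, dt TEXT);
""")

# Wrong order: the new column is selected last, after the "partition" columns.
conn.execute("INSERT INTO final SELECT id, code, dt, reason FROM stage")
wrong = conn.execute("SELECT code, dt FROM final").fetchone()

conn.execute("DELETE FROM final")

# Right order: the new column comes before the trailing "partition" columns.
conn.execute("INSERT INTO final SELECT id, reason, code, dt FROM stage")
right = conn.execute("SELECT code, dt FROM final").fetchone()

print(wrong, right)
```

With the wrong order, the date ends up in code and the reason ends up in dt, which is the same shift that the error message Code=2021-06-25/date=Adjustment shows.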

DB2/400 - Create new summary table based on joining header and detail tables

I want to create a new summary table by joining the order header and detail tables.
I am working with this code so far (not yet working). I suspect I have to define data type for MIN fields.
CREATE TABLE SUMMARY AS (
SELECT ORDHED.ORDERNO, ORDHED.CUSTNO,
COUNT(ORDHED.LINENO) AS LINECNT,
MIN((CCSTCN*1000000)+(CCSTYR*10000)+(CCSTMO*100)+CCSTDA) AS ORD_DATE,
MIN(ORDDET.CURRRENCY) AS CURRENCY
FROM ORDHED JOIN ORDDET
ON (ORDHED.ORDERNO= ORDDET.ORDERNO )
GROUP BY ORDHED.ORDERNO, ORDHED.CUSTNO
ORDER BY ORDHED.ORDERNO, ORDHED.CUSTNO
)
WITH DATA;
Simplified version, as original is 3 pages long.

SQL Server - multiple tables columns in 1 view and under 1 column header

Is it possible to do the following:
I have 2 tables called Holidays and Allocations, both of which contain a startDate and an endDate field. I want to create a view that displays the startDate and endDate fields from both tables under the same column headers, if possible. Can this be done, or do I need to create a single table to handle this?
My reasoning for using a view is that it avoids one large table with many more columns, many of which would contain NULLs where certain fields are not required.
Yes, you can do it in a view by using UNION:
CREATE VIEW [dbo].[ViewHolidayAllocation]
AS
SELECT
ROW_NUMBER() OVER(ORDER BY Id) AS RowNum,
*
FROM
(
SELECT Id, startDate, endDate FROM Holidays
UNION
SELECT Id, startDate, endDate FROM Allocations
) AS result
You can't have duplicate column names in a view. Either normalize the database, if that makes sense, or give the second field an alias.
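As a quick way to see the shared column headers, here is a sketch of the same idea run against SQLite through Python's sqlite3 module. The sample rows are invented, and the extra source column is a hypothetical addition (not part of the answer above) that records which table each row came from. It uses UNION ALL so that identical rows from the two tables are both kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Holidays (Id INTEGER, startDate TEXT, endDate TEXT);
CREATE TABLE Allocations (Id INTEGER, startDate TEXT, endDate TEXT);
INSERT INTO Holidays VALUES (1, '2021-07-01', '2021-07-14');
INSERT INTO Allocations VALUES (1, '2021-08-01', '2021-08-31');

-- Column names of the view come from the first SELECT of the union.
CREATE VIEW ViewHolidayAllocation AS
SELECT 'Holiday' AS source, Id, startDate, endDate FROM Holidays
UNION ALL
SELECT 'Allocation' AS source, Id, startDate, endDate FROM Allocations;
""")

cur = conn.execute("SELECT * FROM ViewHolidayAllocation ORDER BY startDate")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)
print(rows)
```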

oracle sql statement automatic transform / rewrite

We have a reference data DB that is like an ODS/MDM but it's read only. The data is updated from the authoritative systems on various schedules. Every table maintains historic data - updates do an update existing & insert new, deletes do an update existing.
All tables are of the following form:
table <name>
surrogate key,
business key(s),
attribute(s),
effective_start_date,
effective_end_date
I want to expose 2 sets of views to users/systems for querying.
First view set is views that return only the current records from the respective table. That's easy.
Second view set should provide a way to query (by joining) multiple tables (all with history) and get the effective history of the result set.
For example, if a user issues something like the following query:
select
A.business_key,
B.business_key,
effective_start_date(),
effective_end_date()
from
A inner join B on (A.b_fk_col = B.business_key)
then I need to transform this statement into:
select
A.business_key,
B.business_key,
greatest( A.effective_start_date, B.effective_start_date ) effective_start_date,
least( A.effective_end_date, B.effective_end_date ) effective_end_date
from
A inner join B on (A.b_fk_col = B.business_key)
where
(A.effective_start_date between B.effective_start_date and B.effective_end_date
or
A.effective_end_date between B.effective_start_date and B.effective_end_date)
Really, what I need to be able to do is to add a step to the query plan right after the join(s):
e.g.
Instead of the original:
SELECT STATEMENT
MERGE JOIN CARTESIAN
BUFFER SORT
TABLE ACCESS BY INDEX ID A
INDEX FULL SCAN A_B_FK_IDX
BUFFER SORT
INDEX FULL SCAN B_PK_IDX
I could get something like:
SELECT STATEMENT
**** ADDED ****
EFFECTIVE RANGES // create/modify where & select clauses
TABLE ACCESS BY INDEX ID A // get the eff dates from A
TABLE ACCESS BY INDEX ID B // get the eff dates from B
****************
MERGE JOIN CARTESIAN
BUFFER SORT
TABLE ACCESS BY INDEX ID A
INDEX FULL SCAN A_B_FK_IDX
BUFFER SORT
INDEX FULL SCAN B_PK_IDX
Any thoughts on how I could do this? Thanks.
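For reference, the target query can be tried out on a small data set. The sketch below uses Python's sqlite3 module as a stand-in for Oracle, with invented rows and dates stored as ISO strings. It uses CASE expressions for the later start and the earlier end, and it swaps in the usual interval-overlap test (A starts before B ends and B starts before A ends), which is an assumption on my part: unlike the two BETWEEN checks, it also catches the case where B's range lies entirely inside A's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (business_key TEXT, b_fk_col TEXT,
                effective_start_date TEXT, effective_end_date TEXT);
CREATE TABLE B (business_key TEXT,
                effective_start_date TEXT, effective_end_date TEXT);
INSERT INTO A VALUES ('a1', 'b1', '2020-01-01', '2020-12-31');
-- This B version lies entirely inside A's range.
INSERT INTO B VALUES ('b1', '2020-03-01', '2020-06-30');
-- This B version does not overlap A at all.
INSERT INTO B VALUES ('b1', '2021-01-01', '2021-12-31');
""")

rows = conn.execute("""
SELECT A.business_key,
       B.business_key,
       CASE WHEN A.effective_start_date > B.effective_start_date
            THEN A.effective_start_date ELSE B.effective_start_date
       END AS effective_start_date,
       CASE WHEN A.effective_end_date < B.effective_end_date
            THEN A.effective_end_date ELSE B.effective_end_date
       END AS effective_end_date
FROM A
JOIN B ON A.b_fk_col = B.business_key
WHERE A.effective_start_date <= B.effective_end_date
  AND B.effective_start_date <= A.effective_end_date
""").fetchall()
print(rows)
```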

Fetching max timestamp from multiple tables into view

I am constructing a view by joining 2 tables, A and B.
Both tables have a TIME_OF_CHANGE column, which holds the timestamp of the last insert or update.
The view should contain the max TIME_OF_CHANGE, that is, the timestamp of the most recent update on either table A or table B.
Please suggest a way to do it.
CREATE VIEW vwReq
AS SELECT Field1, MAX(TIME_OF_CHANGE) AS TIME_OF_CHANGE
FROM YourTable
GROUP BY Field1;
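Since the question is about two joined tables, here is a sketch of one way to pick the later of the two timestamps per joined row, run against SQLite through Python's sqlite3 module. The join key Field1, the output column LAST_CHANGE and the sample rows are assumptions for illustration. A CASE expression is used because it works on most engines; some databases also offer a GREATEST function for this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schemas: both tables share the key Field1.
CREATE TABLE A (Field1 INTEGER, TIME_OF_CHANGE TEXT);
CREATE TABLE B (Field1 INTEGER, TIME_OF_CHANGE TEXT);
INSERT INTO A VALUES (1, '2021-01-01 10:00:00');
INSERT INTO A VALUES (2, '2021-03-01 09:00:00');
INSERT INTO B VALUES (1, '2021-02-01 08:00:00');
INSERT INTO B VALUES (2, '2021-01-15 12:00:00');

-- For each joined row, keep whichever table changed most recently.
CREATE VIEW vwReq AS
SELECT A.Field1,
       CASE WHEN A.TIME_OF_CHANGE >= B.TIME_OF_CHANGE
            THEN A.TIME_OF_CHANGE ELSE B.TIME_OF_CHANGE
       END AS LAST_CHANGE
FROM A
JOIN B ON A.Field1 = B.Field1;
""")

rows = conn.execute(
    "SELECT Field1, LAST_CHANGE FROM vwReq ORDER BY Field1"
).fetchall()
print(rows)
```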