Google BigQuery - Date-Partitioned Tables with Eventual Data

Our use case for BigQuery is a little unique. I want to start using Date-Partitioned Tables, but our data is very much eventual: it doesn't get inserted when it occurs, but only later, when it's provided to the server. At times this can be days or even months after the fact. Thus, the _PARTITION_LOAD_TIME attribute is useless to us.
My question: is there a way I can specify a column that would act like the _PARTITION_LOAD_TIME attribute and still have the benefits of a Date-Partitioned table? If I could set this manually and have BigQuery treat it accordingly, then I could start using Date-Partitioned tables.
Anyone have a good solution here?

You don't need to create your own column.
The _PARTITIONTIME pseudo column will still work for you!
The only thing you need to do is insert/load each data batch into its respective partition by referencing not just the table name but the table with a partition decorator - like yourtable$20160718.
This way you can load data into the partition it belongs to.
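As a supplement to the answer above (not part of it): the decorator form is what you would use as the destination of a load or copy job, and, as far as I know, newer BigQuery standard SQL DML also accepts the _PARTITIONTIME pseudo column in the INSERT column list for ingestion-time partitioned tables. A minimal sketch, where mydataset.yourtable and its columns are placeholders:
-- Route late-arriving rows to the 2016-07-18 partition by setting _PARTITIONTIME explicitly.
INSERT INTO mydataset.yourtable (_PARTITIONTIME, event_id, payload)
VALUES (TIMESTAMP '2016-07-18', 'evt-1', 'data that arrived months late');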

Related

How to get table/column usage statistics in Redshift

I want to find which tables/columns in Redshift remain unused in the database in order to do a clean-up.
I have been trying to parse the queries from the stl_query table, but it turns out this is quite a complex task for which I haven't found any library I can use.
Does anyone know if this is somehow possible?
Thank you!
The column question is a tricky one. For table use information I'd look at stl_scan, which records info about every table scan step performed by the system. Each of these is date-stamped, so you will know when the table was "used". Just remember that system logging tables are pruned periodically and the data only goes back a few days, so you may need a process that records table use daily to build an extended history.
I pondered the column question some more. One thought is that query ids are also provided in stl_scan, and these could help in identifying the columns used in the query text. For every query id that scans table_A, search the query text for each column name of the table. It wouldn't be perfect, but it's a start.
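A minimal sketch of the table-usage side, using the stl_scan system table; the perm_table_name filter values are assumptions and may need adjusting:
-- Last scan time and number of distinct scanning queries per permanent table.
SELECT TRIM(perm_table_name)  AS table_name,
       MAX(endtime)           AS last_scanned,
       COUNT(DISTINCT query)  AS scan_queries
FROM stl_scan
WHERE perm_table_name NOT IN ('Internal Worktable', 'S3')
GROUP BY 1
ORDER BY last_scanned DESC;
For the column side, the query ids surfaced by stl_scan can be joined to stl_querytext and the reassembled query text searched for each column name, as described above.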

How to insert/update a partitioned table in Big Query

Problem statement:
I need to insert/update a few columns in a BigQuery table that is partitioned by date. So basically I need to make the necessary changes for each partition date (partitioned by day).
(It's the sessions table that is created automatically by linking the GA view to BQ, so I haven't set up the partitioning manually; it's automatically taken care of by Google.)
query reference from google_docs
my query:
I also tried the below :
Can anyone help me here? Sorry, I am a bit naive with BQ.
You are trying to insert into a wildcard table, a meta-table that is actually composed of multiple tables. A wildcard table is read-only and cannot be inserted into.
As Hua said, ga_sessions_* is not a partitioned table, but represents many tables, each with a different suffix.
You probably want to do this then:
INSERT INTO `p.d.ga_sessions_20191125` (visitNumber, visitId)
SELECT 1, 1574
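If the goal is to update existing rows rather than insert new ones, the same per-suffix addressing applies. A hedged sketch (not from the original answer), using columns from the standard GA export schema and placeholder values:
-- Update one day's suffix table directly; channelGrouping and visitId are GA export columns.
UPDATE `p.d.ga_sessions_20191125`
SET channelGrouping = 'Direct'
WHERE visitId = 1574;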

Updating partitioned and clustered table in BigQuery

I've created a partitioned and clustered BigQuery table for the time period of the year 2019, up to today. I can't seem to find if it is possible to update such a table (since I would need to add data for each new day). Is it possible to do it and if so, then how?
I've tried searching Stack Overflow and the BigQuery documentation for the answer, with no results on my part.
You could use the UPDATE statement to update this data. Your partitioned table will maintain its properties across all operations that modify it, such as DML and DDL statements, load jobs, and copy jobs. For more information, you could check this document.
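For adding each new day's data, a plain INSERT (or a load job) is enough; BigQuery routes rows to the right partition and applies clustering automatically. A minimal sketch with assumed names (my_dataset.events partitioned on event_date, clustered on user_id, fed from an assumed staging table):
-- Append one new day of rows; partitioning and clustering are handled automatically.
INSERT INTO `my_project.my_dataset.events` (event_date, user_id, amount)
SELECT DATE '2019-12-02', user_id, amount
FROM `my_project.my_dataset.staging_events`;
-- Modify rows already in the table; filtering on the partition column limits the bytes scanned.
UPDATE `my_project.my_dataset.events`
SET amount = amount * 1.1
WHERE event_date = DATE '2019-12-01';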
Hope it helps.

Oracle join depth while updating a table

I have a question regarding Oracle.
I know that Oracle only supports the use of outer-query aliases down to the first subquery level. This poses a problem when I want to group more than once while updating a table.
Example: I have some server groups and a database containing information about them. There is one table with information about the groups and one table where I store, with a timestamp (to be exact, I actually used DATE), the workload of specific servers within the groups.
Now, for performance reasons, I have a denormalized field in the server group table containing the highest workload the group had within one day.
What I would like to do is something like
update server_group
set last_day_workload = avg(workload1)
from (select max(workload) workload1
      from server_performance
      where server_performance.server_group_ID_fk = server_group.ID
        and time > sysdate - 1
      group by server_performance.server_group_ID_fk)
Here ID is the primary key of server_group, and server_group_ID_fk is a foreign key reference from the server_performance table. The solution I am using so far is to write the first join into a temporary table and update from that temporary table in the next statement. Is there a better way to do this?
For this problem it isn't such an issue yet, but as the amount of data increases, using a temporary table costs not only some time but also a notable amount of RAM.
Thank you for your answers!
If I were you, I would work out the results that I wanted in a select statement, and then use a MERGE statement to do the necessary update.
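A hedged sketch of that MERGE, reusing the asker's table and column names; it takes the per-group MAX directly, matching the stated goal of storing the highest workload of the last day:
MERGE INTO server_group sg
USING (SELECT server_group_ID_fk AS group_id,
              MAX(workload)      AS max_workload
       FROM   server_performance
       WHERE  time > SYSDATE - 1
       GROUP BY server_group_ID_fk) sp
ON (sg.ID = sp.group_id)
WHEN MATCHED THEN
  UPDATE SET sg.last_day_workload = sp.max_workload;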

Create table from query while keeping the original schema

I'm using the following workflow to append data to an existing BigQuery table from an external source:
1. Query the table for the most recently modified record: (select max(lastModifiedData) from test.table). Save this value as 'lastMigrationTime'.
2. Query the external source for the ids of records that changed after 'lastMigrationTime'.
3. Query the BigQuery table for all records except the updated ones; save the result to test.tempTable.
4. Move tempTable to table (delete table, copy tempTable to table, delete tempTable).
5. Query the external source for the updated records and load them into test.table.
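For concreteness, steps 1 and 3 might look roughly like this; test.table and lastModifiedData are the names from the workflow above, while the id column and the list of changed ids are placeholders:
-- Step 1: find the newest record already migrated into BigQuery.
SELECT MAX(lastModifiedData) AS lastMigrationTime FROM test.table;
-- Step 3: keep everything except the records that changed at the source
-- (the id list would come from step 2).
SELECT * FROM test.table
WHERE id NOT IN ('changed-id-1', 'changed-id-2');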
The problem I'm facing is that the original schema of the table contains nested elements. Any query I run will flatten the schema, forcing me to flatten the original schema as well. Another side effect I saw is that column names are turned to lower case.
Is there any way to keep the original schema (mainly the nesting, but also maintaining the case would be nice)?
The column name casing issue is a known bug and should be fixed in our next release (hopefully in the next few days).
Preserving column nesting is a high-priority feature request. We're very interested in supporting this, but I don't have any time frame for when it will get done, unfortunately.