We're using dbt to manage our data pipeline, with Postgres as our database. I'm creating some materialized views through a query (outside of dbt), and it looks like whenever we run dbt run --full-refresh it drops those materialized views. Any idea why, and how to keep the materialized views from being dropped?
This answer came from Claire at DBT.
"If the materialized views depend on the upstream table, they will get dropped by the drop table my_table cascade statement"
This came from Jake at DBT.
"postgres views/materialized views are binding. There’s no opt-out and recreating them even in the same dbt run will result in periods where it’s not available."
https://www.postgresql.org/docs/9.3/rules-materializedviews.html
https://docs.getdbt.com/
As the previous answer stated, materialised views are dropped when a table is dropped because of the cascade.
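For instance, a minimal reproduction in psql (hypothetical names) shows the cascade taking the view with it:

create table my_table as select 1 as id;
create materialized view my_mv as select id from my_table;
drop table my_table cascade;
-- NOTICE:  drop cascades to materialized view my_mv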
A bridge towards higher uptime is to keep tables that act as copies of the dbt tables being rebuilt; the copies are dropped and refreshed only once each rebuild completes.
The downtime whilst the copies are swapped is probably worth the deterministic behaviour of knowing exactly when the tables are rebuilt, rather than having the materialised views disappear during a long rebuild.
This is the macro I wrote to solve this problem. It rebuilds the live table from a copy whose name carries a four-character suffix (e.g. _tmp), all within a single transaction, allowing for 100% uptime.
{% macro create_table(table_name) %}
{# Assumes table_name carries a four-character suffix (e.g. "_tmp"): stripping the last four characters gives the live table's name. #}
{% set sql %}
BEGIN;
DROP TABLE IF EXISTS {{ table_name[:-4] }};
CREATE TABLE {{ table_name[:-4] }} AS SELECT * FROM {{ table_name }};
COMMIT;
{% endset %}
{% do run_query(sql) %}
{% do log(table_name + " table created", info=True) %}
{% endmacro %}
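One way to invoke it (a sketch; staging_orders_tmp stands in for whatever suffixed staging table dbt builds) is as an operation after the run:

dbt run-operation create_table --args '{table_name: staging_orders_tmp}'

Because the drop and create happen in one transaction, queries against the live table briefly block during the swap rather than failing.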
I have a dbt_project.yml like:
name: rdb
profile: rdb
source-paths: ['models']
version: "0.1"

models:
  rdb:
    schema: cin
    materialized: table
    post-hook: 'grant select on {{ this }} to rer'

on-run-end:
  # TODO: fix
  - 'grant usage on schema "{{ target.schema }}" to rer'
DBT has worked really well. But with the on-run-end entry, it fails with Compilation Error 'target' is undefined. With that line commented out, it works fine.
Am I making a basic mistake? Thanks!
Your on-run-end hook should actually look like this:
on-run-end:
- "{% for schema in schemas %}grant usage on schema {{ schema }} to rer;{% endfor %}"
The dbt docs for the on-run-end context explain this in detail, but long story short: because a dbt run may touch tables in different schemas on the target database, there is no single target.schema value to which you can apply the grant statement. Instead, the context provides you with a list of schema names called schemas that you need to loop through. That list has one or more elements.
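For instance, if a run had built models in two schemas, say cin and cin_staging (hypothetical names), the hook above would render to:

grant usage on schema cin to rer;
grant usage on schema cin_staging to rer;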
In dbt, target holds the configuration data for the adapter, such as account, user, port, or schema. this refers to the database object currently being written, and it too includes a schema field. Finally, the on-run-end context provides the list of schemas so that you are not forced to issue redundant grant statements for each table or view, but can make just a single grant per schema.
My hunch is that you do not need the double quotes around the rendered schema name inside the grant. Try:
on-run-end:
- 'grant usage on schema {{ target.schema }} to rer'
See this for reference.
In the project I have been working on recently, many (PostgreSQL) database tables are used as big lookup arrays. We have several background worker services which periodically pull the latest data from a server, then replace the entire contents of a table with the latest data. The replacement has to be atomic because we don't want readers to see a partially loaded table.
I thought the simplest way to do the replacing is something like this:
BEGIN;
DELETE FROM some_table;
COPY some_table FROM 'source file';
COMMIT;
But I found a lot of production code uses this method instead:
BEGIN;
CREATE TABLE some_table_tmp (LIKE some_table);
COPY some_table_tmp FROM 'source file';
DROP TABLE some_table;
ALTER TABLE some_table_tmp RENAME TO some_table;
COMMIT;
(I omit some logic, such as changing the owner of a sequence, etc.)
I just can't see any advantage to this method, especially after some discoveries and experiments: SQL statements like ALTER TABLE and DROP TABLE acquire an ACCESS EXCLUSIVE lock, which blocks even a SELECT.
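For example, the blocking is easy to reproduce in two psql sessions (a sketch using the table from above):

-- session 1
BEGIN;
DROP TABLE some_table;  -- takes ACCESS EXCLUSIVE and holds it until COMMIT

-- session 2, while session 1 is still open
SELECT count(*) FROM some_table;  -- blocks until session 1 commits or rolls back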
Can anyone explain what problem the latter SQL pattern is trying to solve? Or is it wrong, and should we avoid using it?
I have three sites: one site contains the employees table, while the other sites have materialized views of the employees table.
This is how I created the materialized views on the other sites:
CREATE MATERIALIZED VIEW employeesMV
REFRESH FAST
FOR UPDATE
AS
SELECT * FROM manager.employees#managerlink;
So I just want to know how to update the master table employees after I make changes (such as insert or update) on the materialized view.
Thank you in advance.
By default, a materialized view can't be updated. However, if you use the FOR UPDATE clause, you can update it, but those changes aren't reflected in the MV's source table. Moreover, as soon as you refresh the MV, the changes you've made are lost.
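A quick sketch of that behaviour, using the employeesMV from the question (the column and predicate are hypothetical):

UPDATE employeesMV SET salary = salary * 1.1 WHERE employee_id = 100;
COMMIT;
-- manager.employees on the master site is unchanged

EXEC DBMS_MVIEW.REFRESH('EMPLOYEESMV');
-- and now the local update is gone too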
Advanced replication covers it, but Oracle says that it is deprecated in 12cR1.
There's a walkthrough on Vinayaga Consultancy's blog, Updatable Materialized View, based on Oracle 11.2 (source) and 10.2 (target) databases, so have a look. It isn't trivial at all.
I have a table T_SG_LTA_TRANSACTION_TYPE in a source database.
I want to move it into a target database.
I have created a materialized view log in the source database:
CREATE MATERIALIZED VIEW LOG ON T_SG_LTA_TRANSACTION_TYPE WITH PRIMARY KEY, ROWID;
Then I created a materialized view in the target database with the following query:
CREATE MATERIALIZED VIEW T_SG_LTA_TRANSACTION_TYPE
ON PREBUILT TABLE
REFRESH FAST ON DEMAND
FOR UPDATE
AS
SELECT TRANSACTION_ID,
TRANSACTION_DESCRIPTION,
FILE_TYPE_ID
FROM T_SG_LTA_TRANSACTION_TYPE#EBAODWH_SRC_1_GS_AIG;
But when I refresh the materialized view, I am unable to load the data that was already present in T_SG_LTA_TRANSACTION_TYPE (source DB):
BEGIN
DBMS_MVIEW.refresh('T_SG_LTA_TRANSACTION_TYPE');
END;
Only the data that changed in the source table after the materialized view was created is loading into the materialized view. But I want to get the whole data set from the source table (modified and unmodified) into the materialized view, and I need the unmodified data only once, when the mview is created. Please suggest a solution. Thanks in advance.
You seem to be using the ON PREBUILT TABLE clause incorrectly:
The ON PREBUILT TABLE clause lets you register an existing table as a preinitialized materialized view.
And
Caution:
This clause assumes that the table object reflects the materialization of a subquery. Oracle strongly recommends that you ensure that this assumption is true in order to ensure that the materialized view correctly reflects the data in its master tables.
You're essentially saying that T_SG_LTA_TRANSACTION_TYPE already exists on the target database - not the source - and contains the current state of the source table, which, since you're missing data, is not true.
When you refresh, Oracle only looks for changes since the view was created, as it's supposed to; it relies on the MV log to identify what has changed.
Drop the MV in the target database, without the preserve table clause, and then recreate it without the on prebuilt table clause.
Make sure you're dropping the right thing in the right database/schema, of course...
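A sketch of that fix with the names from the question, run against the target database. Note that when an MV was created ON PREBUILT TABLE, dropping the MV leaves the table behind, so it has to be dropped separately before the recreate:

DROP MATERIALIZED VIEW T_SG_LTA_TRANSACTION_TYPE;
DROP TABLE T_SG_LTA_TRANSACTION_TYPE;  -- the former prebuilt table

CREATE MATERIALIZED VIEW T_SG_LTA_TRANSACTION_TYPE
REFRESH FAST ON DEMAND
FOR UPDATE
AS
SELECT TRANSACTION_ID,
       TRANSACTION_DESCRIPTION,
       FILE_TYPE_ID
FROM T_SG_LTA_TRANSACTION_TYPE#EBAODWH_SRC_1_GS_AIG;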
It isn't clear whether the (empty) table already existed in your target DB before you started; or whether you've run this a couple of times and dropped the MV with preserve table, and either the source was empty last time or you truncated the target table afterwards; or perhaps you tried to export/import the initial state but only got the metadata and not the data. If the table you said existed on the target DB did not, in fact, exist, then Oracle would have thrown an ORA-12059 exception when you tried to create the MV. I suspect you'd created an empty table and then tried to convert it to an MV; but if you do that, it won't get any data from the source DB, as you've seen.
I have a staging table (stage_enrolments) and a production table (enrolments). The staging table isn't partitioned; the production table is. I'm trying to use the ALTER TABLE ... SWITCH statement to transfer the records from the staging table to production.
ALTER TABLE dbo.stage_enrolments
SWITCH TO dbo.enrolments PARTITION #partition_num;
However, when I execute this statement I get the following error:
ALTER TABLE SWITCH statement failed. Target table 'Academic.dbo.enrolments' is referenced by 1 indexed view(s), but source table 'Academic.dbo.stage_enrolments' is only referenced by 0 matching indexed view(s)
I have the same indexed view defined on dbo.stage_enrolments as I do on dbo.enrolments, although the view on enrolments is partitioned. I've tried recreating the views and their indexes, checking that all options are the same, but I get the same result. If I remove the index from the dbo.enrolments view then it works fine.
I have it working on another set of tables that have indexed views so I'm not sure why it isn't working for these. Does anyone have an idea as to why this may be occurring? What else should I check?
The problem has now been sorted. I've recreated the indexed view once again and it is now working. I haven't actually changed anything, though, other than the name of the index, so I'm not sure what the problem was.
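For anyone hitting the same error: the requirement it enforces is that the source table carries a matching indexed view for every indexed view on the target. A minimal sketch of the pattern (the column names are hypothetical):

CREATE VIEW dbo.v_stage_enrolments
WITH SCHEMABINDING
AS
SELECT student_id, course_id, COUNT_BIG(*) AS row_count
FROM dbo.stage_enrolments
GROUP BY student_id, course_id;
GO

CREATE UNIQUE CLUSTERED INDEX ix_v_stage_enrolments
ON dbo.v_stage_enrolments (student_id, course_id);
GO

The view definition and its unique clustered index have to mirror the ones on dbo.enrolments for the SWITCH to be accepted.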