dbt docs generate failing on CircleCI

I see this error:
dbt docs generate --profiles-dir ***** --project-dir *****
Running with dbt=0.20.2
[WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 4 unused configuration paths:
- models.data_vault.raw_vault.link
- models.data_vault.raw_vault.sat
- models.data_vault.raw_vault.t_link
- models.data_vault.business_vault
Found 30 models, 7 tests, 0 snapshots, 0 analyses, 461 macros, 0 operations, 0 seed files, 28 sources, 0 exposures
ERROR: Database Error
timeout expired
make: *** [Makefile:36: docs-circle] Error 1
What is the database error? Why is it timing out? Why is a database needed here for dbt docs generate?

What is the database error?
It is a timeout error
Why is it timing out?
There is no way we can know that without details about your database, configuration, and network.
Why is a database needed here for dbt docs generate?
The dbt docs generate command runs a bunch of SQL against your database to get metadata about your sources. For example, if your target/profile is Redshift, it will query row counts, table sizes, sort keys, dist style, stats, etc. from the system tables.
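If the connection itself is the slow part, a quick first check in CI is dbt's built-in connection test. A minimal sketch, run from the project directory and reusing the masked profiles directory from the failing command:
# dbt debug validates dbt_project.yml and profiles.yml and, crucially, opens a
# connection to the target warehouse; if this step also hangs, the timeout is at
# the network/warehouse level (firewall, allow-list, paused warehouse) rather
# than in docs generation itself.
dbt debug --profiles-dir *****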

Related

Issues while using Ora2pg for Oracle to Postgresql - report generation

I am trying to generate the report from Oracle DB 19c with ora2pg v23.1.
Command used: ora2pg -t show_report --dump_as_html -l db_report_filename.html -c E:\ora2pg\ora2pg.conf
Error generated in html report:
FATAL: ORA-00604: error occurred at recursive SQL level 1 ORA-08177: can't serialize access for this transaction (DBD ERROR: OCIStmtExecute)
Looking for ideas to resolve this issue.
This issue was fixed by changing a setting in the ora2pg configuration file.
Data is exported in serialized transaction mode to get a consistent snapshot of the data; see the Oracle documentation for which parameter to increase to avoid this issue. Alternatively, if you are sure that no modifications are being made in the Oracle database, you can force Ora2Pg to use a read-only transaction instead; see the TRANSACTION directive in ora2pg.conf.
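For reference, the change is a single directive in ora2pg.conf. A minimal sketch, assuming your Ora2Pg version accepts the readonly value (check its documentation) and that nothing writes to the Oracle source during the export:
# ora2pg.conf
# Use a read-only transaction instead of the default serializable snapshot.
TRANSACTION readonly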

'DBT docs generate' does not populate model column-level comments in the catalog

I use dbt-snowflake 1.1.0 with the corresponding dbt-core 1.1.0.
I added documentation for my models in yaml files, i.e.:
> models/stage/docs.yml
version: 2
models:
  - name: raw_weblogs
    description: Web logs of customer interaction, broken down by attribute (bronze). The breakdown is performed using regular expressions.
    columns:
      - name: ip
        description: IP address from which the request reached the server (might be direct customer IP or the address of a VPN/proxy).
...
Although these details show up correctly in the dbt UI when I run dbt docs generate and then dbt docs serve, they are not listed in target/catalog.json:
cat target/catalog.json | grep identity
(no results)
According to the DBT documentation, I understand that column comments should be part of catalog.json.
LATER EDIT: I tried running dbt --debug docs generate and it seems that all data is retrieved directly from the target environment (in my case, Snowflake). Looking at the columns of my model in Snowflake, they indeed do NOT have any comments posted on them in Snowflake.
It thus seems to me that the underlying error might be with the fact that dbt run does not correctly persist the column metadata to Snowflake.
After further investigation, I found that the comments were missing because dbt docs generate writes them to catalog.json based on what it receives from the database, while dbt docs serve populates the UI by combining the information in catalog.json with metadata (in my case, the documented column comments) from the local dbt models.
The solution to persist such metadata in the database with dbt run was to add the following DBT configuration:
> dbt_project.yml
models:
  <project>:
    ...
    +persist_docs:
      relation: true
      columns: true
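With persist_docs enabled, a quick way to confirm the round trip is to rebuild, regenerate the catalog, and grep for the documented column. A sketch using the ip column from the docs.yml example above:
# Re-run the models so the column comments are persisted to Snowflake,
# then regenerate the catalog from what the warehouse now reports.
dbt run
dbt docs generate
# The documented column (and its comment) should now appear in catalog.json.
grep -i '"ip"' target/catalog.json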

Gcloud SQL Postgres import error: CREATE TABLE ERROR: syntax error at or near "AS" LINE 2: AS integer ^ Import error: exit status 3

Problem:
Getting the below-mentioned error while importing a schema from AWS Postgres to Gcloud Postgres.
Error:
Import failed:
SET
SET
SET
SET
SET set_config
------------
(1 row)
SET
SET
SET
CREATE SCHEMA
SET
SET
CREATE TABLE
ERROR: syntax error at or near "AS" LINE 2: AS integer ^
Import error: exit status 3
I used --no-acl --no-owner --format=plain while exporting data from AWS postgres
pg_dump -Fc -n <schema_name> -h hostname -U user -d database --no-acl --no-owner --format=plain -f data.dump
I am able to import certain schemas into Gcloud SQL that were exported using the same method, but I get this error for some other, similar schemas. The tables have geospatial info, and PostGIS is already installed in the destination database.
Looking for some quick help here.
My solution:
Basically, I had a data dump file from Postgres 10.0 with tables using a sequence for the PK. Apparently, the way sequences (along with other table data) got dumped to the file could not be read properly by Gcloud Postgres 9.6; that is where the "AS integer" error came from. Finally, I did find this expression in the dump file, which I couldn't find earlier, and I needed to filter out this bit:
CREATE SEQUENCE sample.geofences_id_seq
    AS integer    <===== had to filter out this bit to get it working
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;
Not sure if anyone else has faced this issue, but I did, and this solution worked for me without losing any functionality.
Happy to get other better solutions here.
The original answer is correct, and similar answers are given for the general case. Options include:
Upgrading the target database to 10: this depends on what you are using in GCP. For a managed service like Cloud SQL, upgrading is not an option (though support for 10 is in the works, so waiting may be an option in some cases). It is an option if you are running the database inside a Compute Engine instance, or as a container in, e.g., App Engine (a ready instance is available from the Marketplace).
Downgrading the source before exporting. Only possible if you control the source installation.
Removing all instances of this one line from the file before uploading it. Adapting other responses to modify an already-created dump file, the following worked for me:
cat dump10.sql | sed -e '/AS integer/d' > dump96.sql
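Before applying the filter, it is worth checking exactly which lines it would remove, so that nothing unrelated that happens to contain the same text gets dropped. A small sketch using the same file name as above:
# List every line sed would delete, with line numbers, for a manual sanity check.
grep -n 'AS integer' dump10.sql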

Error starting cluster: Catalog was not initialized in expected time period

I am currently encountering the following error while starting the Impala cluster.
Command:
$ ./start-impala-cluster.py --verbose
Output:
...
Waiting for Catalog... Status: 1 DBs / 0 tables (ready=False)
Waiting for Catalog... Status: 1 DBs / 0 tables (ready=False)
Error starting cluster: Catalog was not initialized in expected time period.
When I opened start-impala-cluster.py, the metric value for 'catalog.num-tables' was always zero. How can I look into this more deeply and fix the issue?
I referred to the "Building Impala" document: https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala
I am using CentOS 7 now.
Thanks,
Jinchul
I found a solution by myself :)
The catalog information should be aligned with the Hive Metastore, which means Impala may not be able to connect to the Hive Metastore. I could find a clue in the log files under ${IMPALA_HOME}/logs/cluster.
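For example, something along these lines surfaces metastore connection failures quickly (a sketch; the catalogd.INFO file name follows the usual glog convention and may differ on your build):
# Search the catalogd log in the directory mentioned above for Hive Metastore
# or Thrift connection errors.
grep -iE 'metastore|thrift' ${IMPALA_HOME}/logs/cluster/catalogd.INFO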
As for configuration files:
Check /etc/impala/conf if you installed Impala via CDH.
Check ${IMPALA_HOME}/fe/src/test/resources if you built and installed Impala from source code.
For your information, the Cloudera Impala user guide definitely gave me good advice on understanding how it works. Please refer to the link below or google the keywords {cloudera + impala + pdf}.
https://www.cloudera.com/documentation/enterprise/5-5-x/PDF/cloudera-impala.pdf
Thanks,
Jinchul

Pentaho not retaining the log and temp files

I am running a Pentaho ETL Kettle transformation (.ktr) to load data from a source DB2 database into a destination Netezza database.
When I run the transformation, I specify the directory to store the log files and temp .txt files. But after the transformation finishes, these files are no longer there, so I guess Pentaho is cleaning them up. Is there a way to retain these files?
The other problem is that I am getting a SQL exception while the transformation step is inserting into Netezza, like this:
error
2013/10/30 14:13:17 - Load XXX_TABLE_NAME - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.netezza.internal.QueryExecutor.getNextResult(QueryExecutor.java:279)
No further details are there. How can I troubleshoot this?
That seems like an issue with Pentaho. Is there no way to generate a trace of what it's doing in the transformation? Are you sure it's reading data? What happens if the target is not Netezza?
If you've got access to the Netezza appliance, there are a few options, all in the documentation. Off the top of my head:
Look in the current queries view while it's running.
Enable query history logging (requires admin access + restarting the instance).
Check the pg.log file in /nz/kit/log/postgres/ (logs all queries by default); see the sketch below.
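For the last option, a minimal sketch (path as given above; adjust if your kit location differs):
# Watch the Netezza host-side postgres log while re-running the transformation
# to capture the full statement and error behind the truncated Kettle stack trace.
tail -f /nz/kit/log/postgres/pg.log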