How to use env_var inside quotes in a dbt model?

I have a dbt script that uses some environment variables and logs them to a table.
with source_data as (
select
"{{ env_var('run_id') }}"::bigint as run_id,
"{{ env_var('SNOWFLAKE_WAREHOUSE') }}"::varchar as warehouse,
"{{ env_var('SNOWFLAKE_DATABASE') }}"::varchar as db
)
select *
from source_data
but the value returned by the Jinja expression is treated as a column name, and the model fails with an error saying the column is not present in the table.
satish#000000 dbt run -m test.sql
08:28:02 Running with dbt=1.3.0
08:28:02 Found 4 models, 4 tests, 0 snapshots, 0 analyses, 327 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
08:28:02
08:28:06 Concurrency: 5 threads (target='dev')
08:28:06
08:28:06 1 of 1 START sql view model dbt_development.test ............................... [RUN]
08:28:07 1 of 1 ERROR creating sql view model dbt_development.test ...................... [ERROR in 1.15s]
08:28:08
08:28:08 Finished running 1 view model in 0 hours 0 minutes and 5.98 seconds (5.98s).
08:28:08
08:28:08 Completed with 1 error and 0 warnings:
08:28:08
08:28:08 Database Error in model test (models/test.sql)
08:28:08 column "20221212000000" does not exist
08:28:08 compiled Code at target/run/dbt_practice/models/test.sql
08:28:08
08:28:08 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
How do I correctly use env_var in select statements in dbt models?
Update:
with source_data as (
select '1' as id,
'{{ env_var("run_id") }}'::bigint as run_id,
'{{ env_var("SNOWFLAKE_WAREHOUSE") }}'::varchar as warehouse,
'{{ env_var("SNOWFLAKE_DATABASE") }}'::varchar as db
)
select *
from source_data
Changing the quotes worked.
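That fix makes sense: in Snowflake, as in standard SQL, double quotes delimit identifiers while single quotes delimit string literals, so the double-quoted version compiles to a column reference. Roughly, given run_id=20221212000000, the two versions compile to:
-- double quotes: parsed as a column name, hence the error: column "20221212000000" does not exist
select "20221212000000"::bigint as run_id
-- single quotes: a string literal, which casts cleanly to bigint
select '20221212000000'::bigint as run_id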


Could you try this?
with source_data as (
select
"{{ env_var('run_id') | as_number }}" as run_id,
"{{ env_var('SNOWFLAKE_WAREHOUSE') }}" as warehouse, -- env_var already returns a string; there is no varchar Jinja filter
"{{ env_var('SNOWFLAKE_DATABASE') }}" as db
)
select *
from source_data
And,
"{{ env_var('run_id') | int }}" as run_id, is another option for specifying number types.

Related

JINJA (dbt) Is it possible to make a new table with several rows based on items within a single column separated by a symbol?

I have a table with the following data structure:
ID | Tag
1  | blue,red,green
2  | white,blue
I would like to convert this to a new table with the following structure:
ID | Tag
1  | blue
1  | red
1  | green
2  | white
2  | blue
Is this possible to do within dbt, using JINJA (or some other method)? My DWH is fully hosted within Google BigQuery and connected to dbt.
Please see Mikhail's comment for the simple, direct BigQuery way of doing this operation, if that's all you need.
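For reference, that direct approach is probably something like the following (a sketch; SPLIT and UNNEST are standard BigQuery, and my_dataset.idtag stands in for the source table):
select id, single_tag as tag
from my_dataset.idtag,
unnest(split(tag, ',')) as single_tag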
However, a more generalized dbt (Jinja-SQL) version, useful when you know in advance that you will eventually have to migrate a project from one SQL syntax to another (I had to go from PL/pgSQL to BigQuery's Standard SQL), could use database-agnostic functions from the dbt-utils package such as:
split_part
pivot / unpivot
get_column_values
A code sample which uses these functions could look something like:
split.sql
with source as (
select ID,Tag from {{ ref('idtag') }}
), split as (
select
s."ID",
{{ dbt_utils.split_part( s."Tag" , "','", '1') }} as Tag_1,
{{ dbt_utils.split_part( s."Tag" , "','", '2') }} as Tag_2,
{{ dbt_utils.split_part( s."Tag" , "','", '3') }} as Tag_3
from source s
)
select ID, Tag_1, Tag_2, Tag_3 from split
unpivot.sql
with unpivot as (
{{ dbt_utils.unpivot(ref('split'), cast_to='varchar', exclude=['ID']) }}
)
select "ID", "value" as split_tags
from unpivot
where "value" is not null
group by 1,2
For some reason the split_part isn't working correctly on my local dbt installation right now, but I hope you find some value in the intent.
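As an aside, dbt_utils.split_part can also be called with keyword arguments, and the part number is a plain integer; if the positional call above misbehaves, a variant worth trying (a sketch, using the same aliases as above) is:
{{ dbt_utils.split_part(string_text='s."Tag"', delimiter_text="','", part_number=1) }} as Tag_1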

BigQuery select multiple tables with different column names

Consider the following BigQuery tables schemas in my dataset my_dataset:
Table_0001: NAME (string); NUMBER (string)
Table_0002: NAME(string); NUMBER (string)
Table_0003: NAME(string); NUMBER (string)
...
Table_0865: NAME (string); CODE (string)
Table_0866: NAME(string); CODE (string)
...
I now want to union all tables using:
select * from `my_dataset.*`
However this will not yield the CODE column of the second set of tables. From my understanding, the schema of the first table in the dataset will be adopted instead.
So the result will be something like:
| NAME | NUMBER |
__________________
| John | 123456 |
| Mary | 123478 |
| ... | ...... |
| Abdul | null |
| Ariel | null |
I tried to tap into the INFORMATION_SCHEMA so as to select the two sets of tables separately and then union them:
with t_code as (
select
table_name
from my_dataset.INFORMATION_SCHEMA.COLUMNS
where column_name = 'CODE'
)
select t.NAME, t.CODE as NUMBER from `my_dataset.*` as t
where _TABLE_SUFFIX in (select * from t_code)
However, the script will still take its schema from the first table of my_dataset and return: Error Running Query: Name CODE not found inside t.
So now I'm at a loss: how can I union all my tables without having to union them one by one? I.e., how do I select CODE as NUMBER in the second set of tables?
Note: Although it seems the question was asked over here, the accepted answer did not seem to actually respond to the question (as far as I'm concerned).
The trick I see is to first gather all the codes by running:
create table `my_another_dataset.codes` as
select * from `my_dataset.*` where not code is null
Then do a simple fake (no-op) update of any one table that has the NUMBER column; this will make the schema with the NUMBER column the default. Now you can gather all the numbers:
create table `my_another_dataset.numbers` as
select * from `my_dataset.*` where not number is null
Finally, you can do a simple union:
select * from `my_another_dataset.numbers` union all
select * from `my_another_dataset.codes`
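If the two staging tables keep their original column names, the final union can do the renaming the question asked for (a sketch based on the staging tables above):
select NAME, NUMBER from `my_another_dataset.numbers`
union all
select NAME, CODE as NUMBER from `my_another_dataset.codes`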
Note: see also my comment below your question

Teradata - Running query passed as parameter

I am trying to learn Teradata and trying to simplify the way we copy data from the production DB to the testing DB.
For this process we need to fill out an Excel sheet with the details below and send it to our TDBA:
Prod Table DbName
Prod Table Name
Prod Table Perm_Size
Prod Table GB Size
Prod Table Record Count
Filter SQL Query to fetch data From Prod DB
Filtered Record Count
I was trying to create a simple SQL utility that accepts parameters (the DbName, Table Name, and Filter SQL Query items in the list above) and outputs the remaining fields.
I kind of started, but got stuck on creating and running queries passed as parameters. I also tried using parameters like '?dbName' etc. to accept values at run time, but wasn't able to get that working either. Any guidance would be great.
WITH ParamInp(dbName, tblName, fltrQry) AS
(SELECT 'PRDDB', 'EMPL', 'SELECT * FROM PRDVIEWS.EMPL WHERE ID IN (1,2,3)') -- We have select access only on PRDVIEWS schema
SELECT
Upper(Trim(ParamInp.dbName)) AS DATABASENAME,
Substr(Upper(Trim(ParamInp.dbName)), 1, Length(Trim(ParamInp.dbName))-2) || 'VIEWS' AS VIEWNAME, -- Creating View DB schema name
Upper(Trim(ParamInp.tblName)) AS TABLENAME,
fltrQry AS FILTER_QUERY, -- do not want to execute fltrQry here. It is only to include in the excel
Sum(currentperm) AS PERM_SIZE,
Sum(currentperm)/1024**3 AS TOTAL_SIZE, -- GigaByte
(SELECT Cast(Count(*) AS BIGINT) FROM (Substr(Upper(Trim(dbName)), 1, Length(Trim(dbName))-2) || 'VIEWS').Upper(Trim(tblName)) )
AS TOTAL_COUNT, -- Unable to get this working
(SELECT Cast(Count(*) AS BIGINT) FROM (ParamInp.fltrQry))
AS FILTERED_COUNT -- This is where fltrQry should run
FROM dbc.allspace, ParamInp
WHERE TABLENAME = ParamInp.tblName
AND databasename = ParamInp.dbName
GROUP BY 1,2,3
ORDER BY 1,2;
I think I will not be able to do it in one query. In that case, how should I approach this? I run my queries in Teradata SQL Assistant and sometimes get lists of tables to be loaded from production.
I expect the output as
DATABASENAME | VIEWNAME | TABLENAME | FILTER_QUERY | PERM_SIZE | TOTAL_SIZE | TOTAL_COUNT | FILTERED_COUNT |
--------------------------------------------------------------------------------------------------------------------------
PRDDB | PRDVIEWS | EMPL | SELECT * FROM PRDVIEWS.EMPL WHERE ID IN (1,2,3) | 1111111 | 2.2 | 333333 | 444 |
Other than a stored procedure, you may do it with a two-step approach in BTEQ:
1. With a SELECT, concatenate the needed SQL command(s), inserting the parameters where needed, and export the result into a file.
2. Run these created SQL commands from the file.
I did not test the following; it is intended to show the general idea. I am sure some additional tweaking is necessary to get the syntax of the generated command(s) right.
.logon tdpid/user,pass
.set format off
.set titledashes off
.export file /tmp/myQuery.bteq
select 'SELECT
Upper(Trim('||dbName||')) AS DATABASENAME,
Substr(Upper(Trim('||dbName||')), 1, Length(Trim('||dbName||'))-2) || 'VIEWS AS VIEWNAME,
Upper(Trim('||tblName||')) AS TABLENAME,
'||fltrQry||' AS FILTER_QUERY,
Sum(currentperm) AS PERM_SIZE,
Sum(currentperm)/1024**3 AS TOTAL_SIZE,
(SELECT Cast(Count(*) AS BIGINT) FROM (Substr(Upper(Trim('||dbName||')), 1, Length(Trim('||dbName||'))-2) || 'VIEWS).Upper(Trim('||tblName||')) )
AS TOTAL_COUNT,
(SELECT Cast(Count(*) AS BIGINT) FROM ('||fltrQry||'))
AS FILTERED_COUNT
FROM dbc.allspace
WHERE TABLENAME = '||tblName||'
AND databasename = '||dbName||'
GROUP BY 1,2,3
ORDER BY 1,2;' (TITLE '')
from (
SELECT 'PRDDB' as dbName, 'EMPL' as tblName, 'SELECT * FROM PRDVIEWS.EMPL WHERE ID IN (1,2,3)' as fltrQry
) as commands;
.export reset
.run file = /tmp/myQuery.bteq

Does anybody know how to get a table's HDFS directory with a select statement in a Hive environment?

I used to use a select statement like 'select T** from xxx' to get a table's HDFS directory location in Hive, but now I have forgotten the statement. Does anyone know it? Thanks!
I think you need DESCRIBE FORMATTED.
The desired location:
Location: file:/tmp/warehouse/part_table/d=abc
DEMO
hive> DESCRIBE formatted part_table partition (d='abc');
OK
# col_name data_type comment
i int
# Partition Information
# col_name data_type comment
d string
# Detailed Partition Information
Partition Value: [abc]
Database: default
Table: part_table
CreateTime: Wed Mar 30 16:57:14 PDT 2016
LastAccessTime: UNKNOWN
Protect Mode: None
####### HERE IS THE LOCATION YOU WANT ########
Location: file:/tmp/warehouse/part_table/d=abc
Partition Parameters:
COLUMN_STATS_ACCURATE true
numFiles 1
numRows 1
rawDataSize 1
totalSize 2
transient_lastDdlTime 1459382234
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.334 seconds, Fetched: 35 row(s)
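If it was specifically a SELECT statement you remember, it may have been Hive's virtual column INPUT__FILE__NAME, which returns the file each row was read from (a sketch; xxx is a placeholder table name):
select INPUT__FILE__NAME from xxx limit 1;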

Postgres DB Size Command

What is the command to find the size of all the databases?
I am able to find the size of a specific database by using following command:
select pg_database_size('databaseName');
You can enter the following psql meta-command to get some details about a specified database, including its size:
\l+ <database_name>
And to get sizes of all databases (that you can connect to):
\l+
You can get the names of all the databases that you can connect to from the pg_database system table. Just apply the function to the names, as below.
select t1.datname AS db_name,
pg_size_pretty(pg_database_size(t1.datname)) as db_size
from pg_database t1
order by pg_database_size(t1.datname) desc;
If you intend the output to be consumed by a machine instead of a human, you can cut the pg_size_pretty() function.
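For example, a machine-friendly variant that returns raw byte counts could look like:
select t1.datname as db_name,
pg_database_size(t1.datname) as db_size_bytes
from pg_database t1
order by db_size_bytes desc;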
-- Database Size
SELECT pg_size_pretty(pg_database_size('Database Name'));
-- Table Size
SELECT pg_size_pretty(pg_relation_size('table_name'));
Based on the answer here by Hendy Irawan.
Show database sizes:
\l+
e.g.
=> \l+
berbatik_prd_commerce | berbatik_prd | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 19 MB | pg_default |
berbatik_stg_commerce | berbatik_stg | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 8633 kB | pg_default |
bursasajadah_prd | bursasajadah_prd | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 1122 MB | pg_default |
Show table sizes:
\d+
e.g.
=> \d+
public | tuneeca_prd | table | tomcat | 8192 bytes |
public | tuneeca_stg | table | tomcat | 1464 kB |
Only works in psql.
Yes, there is a command to find the size of a database in Postgres. It's the following:
SELECT pg_database.datname AS database_name, pg_size_pretty(pg_database_size(pg_database.datname)) AS size FROM pg_database ORDER BY pg_database_size(pg_database.datname) DESC;
(Ordering by the raw pg_database_size keeps the sort numeric; the pretty-printed string would sort alphabetically.)
SELECT pg_size_pretty(pg_database_size('name of database'));
will give you the total size of a particular database; however, I don't think you can do all databases within a server that way.
However, you could do this:
DO
$$
DECLARE
r RECORD;
db_size TEXT;
BEGIN
FOR r in
SELECT datname FROM pg_database
WHERE datistemplate = false
LOOP
db_size:= (SELECT pg_size_pretty(pg_database_size(r.datname)));
RAISE NOTICE 'Database:% , Size:%', r.datname , db_size;
END LOOP;
END;
$$;
From the PostgreSQL wiki.
NOTE: Databases to which the user cannot connect are sorted as if they were infinite size.
SELECT d.datname AS Name, pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
ELSE 'No Access'
END AS Size
FROM pg_catalog.pg_database d
ORDER BY
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_database_size(d.datname)
ELSE NULL
END DESC -- nulls first
LIMIT 20
The page also has snippets for finding the size of your biggest relations and largest tables.
Start pgAdmin, connect to the server, click on the database name, and select the statistics tab. You will see the size of the database at the bottom of the list.
Then if you click on another database, it stays on the statistics tab so you can easily see many database sizes without much effort. If you open the table list, it shows all tables and their sizes.
You can use the query below to find the size of all PostgreSQL databases.
The reference is taken from this blog.
SELECT
datname AS DatabaseName
,pg_catalog.pg_get_userbyid(datdba) AS OwnerName
,CASE
WHEN pg_catalog.has_database_privilege(datname, 'CONNECT')
THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(datname))
ELSE 'No Access For You'
END AS DatabaseSize
FROM pg_catalog.pg_database
ORDER BY
CASE
WHEN pg_catalog.has_database_privilege(datname, 'CONNECT')
THEN pg_catalog.pg_database_size(datname)
ELSE NULL
END DESC;
du -k /var/lib/postgresql/ | sort -n | tail