Syntax error unexpected 'with' while using dbt_utils.date_spine

I am trying to use the dbt_utils.date_spine macro:
select
{{ dbt_utils.date_spine(datepart="day", start_date="cast('2019-01-01' as date)", end_date="cast('2020-01-01' as date)") }} as purchase_date
from table(generator(rowcount=>10))
And I get the following error:
Database Error in model purchase (models/data_generation/purchase.sql)
001003 (42000): SQL compilation error:
syntax error line 21 at position 0 unexpected 'with'.
syntax error line 29 at position 5 unexpected ','.
compiled SQL at target........purchase.sql
Has anyone seen this before?

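I'm afraid that's not the right way to call this macro. dbt_utils.date_spine compiles to a complete select statement that starts with its own with clause, so it cannot be used as a column expression in a select list; that is what the "unexpected 'with'" error is pointing at.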
One of the most common options is to materialize a table somewhere and then have other models referencing that table. You can think about that as a Calendar Table or if you want to go further down you can build your own Date Dimension.
For example, let's say you have a calendar_table model defined as:
{{
config(
materialized = 'table',
)
}}
{{ dbt_utils.date_spine(
datepart="day",
start_date="to_date('01/01/2020', 'mm/dd/yyyy')",
end_date="to_date('01/01/2027', 'mm/dd/yyyy')"
)
}}
Once the model is built in the data warehouse, you can reference it in other models, like:
-- another_model.sql
select * from {{ ref('calendar_table') }}
You can also materialize the calendar_table model as ephemeral in case you don't want to build it in the DW.
If you don't want a separate model for that, you can call the macro inside a CTE, for example:
with date_spine as (
{{- dbt_utils.date_spine(
datepart="day",
start_date="to_date('01/01/2020', 'mm/dd/yyyy')",
end_date="to_date('01/01/2027', 'mm/dd/yyyy')"
)
-}}
)
, other_cte as (
...
)
...
Note that the SQL generated by the macro is not well suited to a view materialization; you may prefer table for that.
Here are some related posts from dbt discourse:
https://discourse.getdbt.com/t/date-dimensions/735
https://discourse.getdbt.com/t/building-a-calendar-table-using-dbt/325
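To make the difference between the two placements concrete, here is a minimal sketch in Python with SQLite. It is not dbt or Snowflake: DATE_SPINE_SQL is a hand-written stand-in (an assumption) for the kind of statement a date-spine macro compiles to, and run, spine_as_column and spine_as_cte are hypothetical helper names. It only illustrates that a full statement fails in a select list but works when wrapped in a CTE.

```python
import sqlite3

# Hypothetical stand-in for the compiled macro output: a complete SELECT
# statement that begins with its own WITH clause (SQLite syntax, not dbt).
DATE_SPINE_SQL = """
with recursive spine(date_day) as (
    select date('2020-01-01')
    union all
    select date(date_day, '+1 days') from spine
    where date(date_day, '+1 days') < date('2020-01-10')
)
select date_day from spine
"""


def run(sql):
    """Run one statement against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def spine_as_column():
    """Mirror the failing model: the whole statement pasted into a select list."""
    try:
        run(f"select {DATE_SPINE_SQL} as purchase_date")
        return "ok"
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


def spine_as_cte():
    """Mirror the working pattern: the statement wrapped in a named CTE."""
    return run(
        f"with date_spine as ({DATE_SPINE_SQL}) select count(*) from date_spine"
    )
```

Calling spine_as_column() returns an error string because the parser meets `with` where it expects an expression, while spine_as_cte() counts the nine generated days.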

Thanks gasscoelho. How do I add the generated date_spine as a new column in a table, e.g. as the purchase date in the purchase table? This is what I tried:
with
purchase_date_spine as (
{{- dbt_utils.date_spine(
datepart="day",
start_date="to_date('01/01/2010', 'mm/dd/yyyy')",
end_date="to_date('12/31/2020', 'mm/dd/yyyy')"
)
-}}
),
purchase as (
select
{{ var('purchase_id_start') }} + row_number() over(order by random()) as purchase_id, --primary_key for the table
-- row_number| https://docs.snowflake.com/en/sql-reference/functions/row_number.html#row-number
uniform({{ var('account_id_start') }}, {{ var('account_id_start') }} + {{ var('account_rows') }}, random()) as account_id, --foreign key
-- uniform | https://docs.snowflake.com/en/sql-reference/functions/uniform.html#uniform
uniform({{ var('product_id_start') }}, {{ var('product_id_start') }} + {{ var('product_rows') }}, random()) as product_id, --foreign key
{{ ref('purchase_date_spine') }} as purchase_date
from table(generator(rowcount=>{{ var('purchase_rows') }}))
)
select * from purchase

I was able to solve this by joining the date_spine with the purchase table. Posting the solution below for completeness.
purchase_date_spine.sql
{{- dbt_utils.date_spine(
datepart="day",
start_date="to_date('01/01/2010', 'mm/dd/yyyy')",
end_date="to_date('12/31/2020', 'mm/dd/yyyy')"
)
-}}
purchase.sql
with
purchase_date_stub as (
    select
        date_day as purchase_date,
        row_number() over(order by date_day) as row_id
    from {{ ref('purchase_date_spine') }} sample({{ var('purchase_rows') }} rows)
),
purchase_stub as (
    select
        {{ var('purchase_id_start') }} + row_number() over(order by random()) as purchase_id, --primary_key for the table
        -- row_number | https://docs.snowflake.com/en/sql-reference/functions/row_number.html#row-number
        uniform({{ var('account_id_start') }}, {{ var('account_id_start') }} + {{ var('account_rows') }}, random()) as account_id, --foreign key
        -- uniform | https://docs.snowflake.com/en/sql-reference/functions/uniform.html#uniform
        uniform({{ var('product_id_start') }}, {{ var('product_id_start') }} + {{ var('product_rows') }}, random()) as product_id, --foreign key
        row_number() over(order by random()) as row_id
    from table(generator(rowcount=>{{ var('purchase_rows') }}))
)
select
    purchase_stub.purchase_id,
    purchase_stub.account_id,
    purchase_stub.product_id,
    purchase_date_stub.purchase_date
from purchase_stub
join purchase_date_stub on purchase_stub.row_id = purchase_date_stub.row_id
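The row_id join above can be tried outside Snowflake. This is a rough sketch in Python with SQLite (window functions need SQLite 3.25 or later); the table contents, the 30-day spine and the function name assign_random_dates are made up for illustration and are not part of the original models.

```python
import sqlite3


def assign_random_dates(n_purchases, n_days=30):
    """Pair each generated purchase with one date by joining on a row number."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical miniature date spine: n_days consecutive days.
    conn.execute("create table date_spine (date_day text)")
    conn.executemany(
        "insert into date_spine values (date('2010-01-01', ?))",
        [(f"+{i} days",) for i in range(n_days)],
    )
    # Hypothetical generated purchases.
    conn.execute("create table purchase_stub (purchase_id integer)")
    conn.executemany(
        "insert into purchase_stub values (?)",
        [(1000 + i,) for i in range(n_purchases)],
    )
    rows = conn.execute(
        """
        with d as (
            select date_day,
                   row_number() over (order by random()) as row_id
            from date_spine
        ),
        p as (
            select purchase_id,
                   row_number() over (order by purchase_id) as row_id
            from purchase_stub
        )
        select p.purchase_id, d.date_day
        from p
        join d on p.row_id = d.row_id
        order by p.purchase_id
        """
    ).fetchall()
    conn.close()
    return rows
```

One consequence of the inner join is visible here: with more purchases than spine rows (for example assign_random_dates(40, 30)), the unmatched purchases are dropped, and each date is used at most once.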

Related

Hybris Flexible search union query to fetch products

code   Attribute1 (String)
A      C
B      D
C      Empty
D      Empty
How do I get the PKs of all of A, B, C and D?
Note: Using the string values C and D, I want to fetch the PKs of products C and D along with A and B using a Flexible Search query.
Details:
I have a list of products.
Inside each of these products there is an attribute called "X" which contains a product ID code of type string.
Note: "Product ID Code" means the "Product ID" of another product inside the list of products.
Now I want to get the PKs of products based upon the Product ID Codes.

I don't understand it fully, but you can try something like this
select {p1.pk},{p2.pk} from {product as p1},{product as p2} WHERE {p1.Attribute1} = {p2.code}
you can add filter to it
AND {p1.Attribute1} in ('C','D')
Using UNION
SELECT uniontable.PK FROM
(
{{
SELECT {p1:PK} AS PK FROM {Product AS p1},{Product AS p2}
WHERE {p1.code} = {p2.Attribute1}
}}
UNION ALL
{{
SELECT {p:PK} AS PK FROM {Product AS p}
WHERE {p.Attribute1} IS NOT NULL
}}
) uniontable
With filter
SELECT uniontable.PK FROM
(
{{
SELECT {p1:PK} AS PK FROM {Product AS p1},{Product AS p2}
WHERE {p1.code} = {p2.Attribute1} AND {p2.Attribute1} in ('C','D')
}}
UNION ALL
{{
SELECT {p:PK} AS PK FROM {Product AS p}
WHERE {p.Attribute1} in ('C','D')
}}
) uniontable

Replacing IN clause by EXISTS causes unexpected results

Given a simple table lieu_horaire with:
+ id_horaire (numeric)
+ id_lieu (numeric)
+ horaire (timestamp)
The following query works fine at the moment:
DELETE FROM lieu_horaire where id_horaire IN (
SELECT id_horaire
FROM (
SELECT id_horaire,
ROW_NUMBER() OVER (PARTITION BY id_lieu order by horaire desc) AS line_number
FROM lieu_horaire
) as sr
WHERE sr.line_number > 10);
For scalability reasons, though, I would like to replace the IN clause with EXISTS ('... WHERE EXISTS (...'). With EXISTS, instead of deleting the correct rows, it removes every row, as if the '... WHERE sr.line_number ...' condition were not present.

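Make sure the subquery is correlated with the outer table when using EXISTS. Without that link, EXISTS is true for every row as soon as the subquery returns anything. Something like this: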
DELETE FROM lieu_horaire where exists (
SELECT id_horaire
FROM (
SELECT id_horaire,
ROW_NUMBER() OVER (PARTITION BY id_lieu order by horaire desc) AS line_number
FROM lieu_horaire
) as sr
WHERE sr.line_number > 10 and lieu_horaire.id_horaire = sr.id_horaire);
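The effect of the missing correlation can be checked on a toy table. The sketch below uses Python with SQLite rather than the original database, keeps only the 2 newest rows instead of 10 so the data stays small, and selects the rows the WHERE clause would match instead of deleting them. The sample rows and the name rows_matched are assumptions for illustration.

```python
import sqlite3

# Same shape as the subquery in the question, with a threshold of 2 instead of 10.
SUBQUERY = """
    select 1
    from (
        select id_horaire,
               row_number() over (partition by id_lieu order by horaire desc) as line_number
        from lieu_horaire
    ) as sr
    where sr.line_number > 2
"""


def rows_matched(correlated):
    """Return the id_horaire values that the EXISTS predicate would delete."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table lieu_horaire (id_horaire integer, id_lieu integer, horaire text)"
    )
    # Five rows for a single id_lieu, one per day.
    conn.executemany(
        "insert into lieu_horaire values (?, ?, ?)",
        [(i, 1, f"2024-01-{i:02d}") for i in range(1, 6)],
    )
    # The extra condition ties each subquery row back to the outer row.
    link = " and sr.id_horaire = lieu_horaire.id_horaire" if correlated else ""
    rows = conn.execute(
        f"select id_horaire from lieu_horaire where exists ({SUBQUERY}{link}) "
        "order by id_horaire"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

With correlated=False every row matches, which is the "deletes everything" symptom; with correlated=True only the rows ranked beyond the threshold match.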

Here is a simplification using NOT IN:
DELETE FROM lieu_horaire lh
WHERE lh.id_horaire NOT IN (SELECT lh2.id_horaire
FROM lieu_horaire lh2
WHERE lh2.id_lieu = lh.id_lieu
ORDER BY lh2.horaire DESC
LIMIT 10
);
However, I don't know that the performance will be much better. For either version, you want an index on lieu_horaire(id_lieu, horaire).
If a large number of rows were to be deleted, I might suggest truncate/insert instead.
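As a quick check of the NOT IN variant, here is a sketch in Python with SQLite that lists the rows the predicate would delete when keeping the 2 newest rows per id_lieu. The sample data, the index name and the function name rows_to_delete are made up for illustration; behaviour and performance on the original database may differ.

```python
import sqlite3


def rows_to_delete(keep=2):
    """Return id_horaire values outside the newest `keep` rows of their id_lieu."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table lieu_horaire (id_horaire integer, id_lieu integer, horaire text)"
    )
    conn.executemany(
        "insert into lieu_horaire values (?, ?, ?)",
        [
            (1, 1, "2024-01-01"), (2, 1, "2024-01-02"), (3, 1, "2024-01-03"),
            (4, 2, "2024-01-01"), (5, 2, "2024-01-02"), (6, 2, "2024-01-03"),
            (7, 2, "2024-01-04"),
        ],
    )
    # The index suggested in the answer, so each per-lieu lookup is cheap.
    conn.execute("create index idx_lieu_horaire on lieu_horaire (id_lieu, horaire)")
    rows = conn.execute(
        """
        select lh.id_horaire
        from lieu_horaire lh
        where lh.id_horaire not in (
            select lh2.id_horaire
            from lieu_horaire lh2
            where lh2.id_lieu = lh.id_lieu
            order by lh2.horaire desc
            limit ?
        )
        order by lh.id_horaire
        """,
        (keep,),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

For this data the oldest row of the first lieu and the two oldest rows of the second are the ones selected for deletion.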

How to return max(ID) from any of 10 columns based on Import_Timestamp

I have a table:
DFL_ID (PK, number)
IMPORT_TIMESTAMP
DSS_ID_01 (number)
FILENAME_01 (varchar)
DSS_ID_02
FILENAME_02 (varchar)
DSS_ID_03
FILENAME_03 (varchar)
...
DSS_ID_10
FILENAME_10 (varchar)
The 10 DSS_ID columns are keys to records in another table. The same ID can appear in any of the 10 columns in different records, but it can NOT be repeated within the same record. (Each DSS_ID is also a partition in the DSS table.)
e.g
DFL_ID, IMPORT_TIMESTAMP, DSS_ID_01, FILENAME_01, DSS_ID_02, FILENAME_02
1, 07-DEC-15 10.50.56.933317000, 8650, a.csv, 8652, b.csv
2, 26-NOV-15 10.45.38.651502000, 8000, c.csv, 8650, d.csv
I want to be able to return:
DSS_ID, DFL_ID, FILENAME
8000, 2, c.csv
8650, 1, a.csv
8652, 1, b.csv
I think I need to use something like
where MAX(DSS_ID) keep (dense_rank last order by import_timestamp) OVER (PARTITION BY DSS_ID) but I must admit I'm really confused.
Any ideas would be great, thanks.

You should fix your data structure so the values are in separate rows not columns.
It would appear that you want something like this:
select dss_id,
       max(dfl_id) keep (dense_rank first order by import_timestamp desc) as dfl_id,
       max(filename) keep (dense_rank first order by import_timestamp desc) as filename
from ((select dss_id_01 as dss_id, dfl_id, filename_01 as filename, import_timestamp
       from t
      ) union all
      (select dss_id_02 as dss_id, dfl_id, filename_02 as filename, import_timestamp
       from t
      ) union all
      . . .
     ) d
group by dss_id;
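The same unpivot-then-pick-latest idea can be tried on the two sample rows from the question. This sketch uses Python with SQLite, which has no KEEP (DENSE_RANK ...) clause, so it swaps in row_number() to pick the newest row per dss_id. Only two of the ten column pairs are modelled, the timestamps are rewritten as ISO strings, and the table name t and function name latest_file_per_dss_id are assumptions.

```python
import sqlite3


def latest_file_per_dss_id():
    """Unpivot the DSS_ID/FILENAME pairs and keep the newest import per dss_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        create table t (
            dfl_id integer primary key,
            import_timestamp text,
            dss_id_01 integer, filename_01 text,
            dss_id_02 integer, filename_02 text
        )
        """
    )
    conn.executemany(
        "insert into t values (?, ?, ?, ?, ?, ?)",
        [
            (1, "2015-12-07 10:50:56", 8650, "a.csv", 8652, "b.csv"),
            (2, "2015-11-26 10:45:38", 8000, "c.csv", 8650, "d.csv"),
        ],
    )
    rows = conn.execute(
        """
        with d as (
            -- one row per (record, column pair)
            select dss_id_01 as dss_id, dfl_id, filename_01 as filename, import_timestamp from t
            union all
            select dss_id_02, dfl_id, filename_02, import_timestamp from t
        ),
        ranked as (
            select d.*,
                   row_number() over (
                       partition by dss_id order by import_timestamp desc
                   ) as rn
            from d
        )
        select dss_id, dfl_id, filename
        from ranked
        where rn = 1
        order by dss_id
        """
    ).fetchall()
    conn.close()
    return rows
```

On the sample data this returns the three rows the question asks for, with 8650 resolved to the newer import (DFL_ID 1, a.csv).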

SQL to Paginate Data Where Pagination Starts at a Given Primary Key

Edit: The original example I used had an int for the primary key when in fact my primary key is a var char containing a UUID as a string. I've updated the question below to reflect this.
Caveat: Solution must work on postgres.
Issue: I can easily paginate data when starting from a known page number or an index into the list of results, but how can this be done if all I know is the primary key of the row to start from? For example, say my table has this data:
TABLE: article
======================================
id categories content
--------------------------------------
B7F79F47 local a
6cb80450 local b
563313df local c
9205AE5A local d
E88F7520 national e
5ab669a5 local f
fb047cf6 local g
591c6b50 national h
======================================
Given an article primary key of '9205AE5A' (article.id == '9205AE5A'), and requiring that the categories column contains 'local', what SQL can I use to return a result set that includes the articles on either side of this one in the paginated order? That is, the returned result should contain 3 items (previous, current and next articles):
('563313df','local','c'),('9205AE5A','local','d'),('5ab669a5','local','f')
Here is my example setup:
-- setup test table and some dummy data
create table article (
    id varchar(36),
    categories varchar(256),
    content varchar(256)
);
insert into article values
('B7F79F47', 'local', 'a'),
('6cb80450', 'local', 'b'),
('563313df', 'local', 'c'),
('9205AE5A', 'local', 'd'),
('E88F7520', 'national', 'e'),
('5ab669a5', 'local', 'f'),
('fb047cf6', 'local', 'g'),
('591c6b50', 'national', 'h');
I want to paginate the rows in the article table, but the starting point I have is the 'id' of an article. In order to provide "Previous Article" and "Next Article" links on the rendered page, I also need the articles that come on either side of the article whose id I know.
On the server side I could run my pagination sql and iterate through each result set to find the index of the given item. See the following inefficient pseudo code / sql to do this:
page = 0;
resultsPerPage = 10;
articleIndex = 0;
do {
    resultSet = select * from article where categories like '%local%' order by content limit resultsPerPage offset (page * resultsPerPage);
    for (result in resultSet) {
        if (result.id == '9205AE5A') {
            // We have found the article's index ('articleIndex') in the paginated list.
            // Now do a normal pagination to return the 3 items starting at the article before the one found.
            return select * from article where categories like '%local%' order by content limit 3 offset (articleIndex - 1);
        }
        articleIndex++;
    }
    page++;
} while (resultSet.length > 0);
This is horrendously slow if the given article is way down the paginated list. How can this be done without the ugly while+for loops?
Edit 2: I can get the result using two sql calls
SELECT 'CurrentArticle' AS type,* FROM
(
SELECT (ROW_NUMBER() OVER (ORDER BY content ASC)) AS RowNum,*
FROM article
WHERE categories LIKE '%local%'
ORDER BY content ASC
) AS tagCloudArticles
WHERE id='9205AE5A'
ORDER BY content ASC
LIMIT 1 OFFSET 0
From that result returned e.g.
('CurrentArticle', 4, '9205AE5A', 'local', 'd')
I can get the RowNum value (4) and then run the sql again to get RowNum+1 (5) and RowNum-1 (3)
SELECT 'PrevNextArticle' AS type,* FROM
(
SELECT (ROW_NUMBER() OVER (ORDER BY content ASC)) AS RowNum,*
FROM article
WHERE categories LIKE '%local%'
ORDER BY content ASC
) AS tagCloudArticles
WHERE RowNum in (3, 5)
ORDER BY content ASC
LIMIT 2 OFFSET 0
with result
('PrevNextArticle', 3, '563313df', 'local', 'c'),
('PrevNextArticle', 5, '5ab669a5', 'local', 'f')
It would be nice to do this in one efficient sql call though.

If the only information about the surrounding articles shown on the page is "Next" and "Previous", there is no need to get their rows in advance. When the user chooses "Previous" or "Next", use these queries:
-- Previous
select *
from article
where categories = 'local' and id < 3
order by id desc
limit 1
;
-- Next
select *
from article
where categories = 'local' and id > 3
order by id
limit 1
;
If it is necessary to get information about the previous and next articles:
with ordered as (
select
id, content,
row_number() over(order by content) as rn
from article
where categories = 'local'
), rn as (
select rn
from ordered
where id = '9205AE5A'
)
select
o.id,
o.content,
o.rn - rn.rn as rn
from ordered o cross join rn
where o.rn between rn.rn -1 and rn.rn + 1
order by o.rn
The returned articles will have rn values of -1, 0 and 1 (previous, current and next), where they exist.
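The window-function query above can be exercised against the sample data from the question. This is a sketch in Python with SQLite rather than Postgres; the second CTE is renamed target and the output column rel to avoid reusing the name rn, and the function name neighbours is hypothetical.

```python
import sqlite3

ARTICLES = [
    ("B7F79F47", "local", "a"),
    ("6cb80450", "local", "b"),
    ("563313df", "local", "c"),
    ("9205AE5A", "local", "d"),
    ("E88F7520", "national", "e"),
    ("5ab669a5", "local", "f"),
    ("fb047cf6", "local", "g"),
    ("591c6b50", "national", "h"),
]

NEIGHBOURS_SQL = """
with ordered as (
    select id, content,
           row_number() over (order by content) as rn
    from article
    where categories = 'local'
), target as (
    select rn from ordered where id = ?
)
select o.id, o.content, o.rn - target.rn as rel
from ordered o cross join target
where o.rn between target.rn - 1 and target.rn + 1
order by o.rn
"""


def neighbours(article_id):
    """Return (id, content, rel) for the previous, current and next local articles."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table article (id varchar(36), categories varchar(256), content varchar(256))"
    )
    conn.executemany("insert into article values (?, ?, ?)", ARTICLES)
    rows = conn.execute(NEIGHBOURS_SQL, (article_id,)).fetchall()
    conn.close()
    return rows
```

For '9205AE5A' this yields the three rows the question expects in a single call; for the first article in the ordering there is no previous row, and an unknown id produces an empty result because the cross join has nothing to join against.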

Check whether the following query solves your issue. It passes the id in the filter along with the category:
SELECT * FROM
(
select (1 + row_number() OVER(Order BY id ASC)) AS RowNo,* from article where categories like '%local%' and id>=3
UNION
(SELECT 1,* FROM article where categories like '%local%' and id<3 ORDER BY id DESC LIMIT 1)
) AS TEMP
WHERE
RowNo between 1 and (1+10-1)
ORDER BY
RowNo

I think this query will yield you the result
(SELECT *, 2 AS ordering from article where categories like '%local%' AND id = 3 LIMIT 1)
UNION
(SELECT *, 1 AS ordering from article where categories like '%local%' AND id < 3 ORDER BY id DESC LIMIT 1 )
UNION
(SELECT *, 3 AS ordering from article where categories like '%local%' AND id > 3 ORDER BY id ASC LIMIT 1 )

SQL MAX of column including its primary key

Short:
From below sql select I get the cart_id and the value of the maximum valued item in that cart.
SELECT CartItems.cart_id, MAX(ItemValues.value)
FROM CartItems
INNER JOIN ItemValues
ON CartItems.item_id=ItemValues.item_id
GROUP BY CartItems.cart_id
but I also need the item_id for that item (ItemValues.item_id).
Long:
Two tables, CartItems and ItemValues (and their respective Carts and Items tables, irrelevant here).
Each cart can have several items, whereas each item has one value defined in ItemValues.
Each item belongs to one cart.
The value of a cart is the value of the item with the maximum value within that cart.
How do I select cart_id, max(item value) and its corresponding item_id?
For instance cart-id A contains item-id X with value 10 and item-id Y with value 90.
With above sql select I get, A, 90
What I need is
A, Y, 90
platform: MS SQL

In MS SQL and Oracle:
SELECT *
FROM
(
    SELECT ci.cart_id, ci.item_id, iv.value,
           ROW_NUMBER() OVER (PARTITION BY ci.cart_id ORDER BY iv.value DESC) AS rn
    FROM CartItems ci
    INNER JOIN ItemValues iv
        ON ci.item_id = iv.item_id
) s
WHERE rn = 1
In MySQL:
SELECT *
FROM
(
SELECT ci.*,
(
SELECT id
FROM ItemValues iv
WHERE iv.item_id = ci.item_id
ORDER BY
value DESC
LIMIT 1
) AS maxitem
FROM CartItems ci
) iv, ItemValues ivo
WHERE ivo.id = iv.maxitem
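The ROW_NUMBER approach can be checked on the example from the question (cart A holding item X worth 10 and item Y worth 90). This is a sketch in Python with SQLite, not MS SQL; the second cart B and the function name max_item_per_cart are added purely for illustration.

```python
import sqlite3


def max_item_per_cart():
    """Return (cart_id, item_id, value) for the highest-valued item in each cart."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table CartItems (cart_id text, item_id text)")
    conn.execute("create table ItemValues (item_id text, value integer)")
    conn.executemany(
        "insert into CartItems values (?, ?)",
        [("A", "X"), ("A", "Y"), ("B", "Z")],
    )
    conn.executemany(
        "insert into ItemValues values (?, ?)",
        [("X", 10), ("Y", 90), ("Z", 5)],
    )
    rows = conn.execute(
        """
        select cart_id, item_id, value
        from (
            select ci.cart_id, ci.item_id, iv.value,
                   row_number() over (
                       partition by ci.cart_id order by iv.value desc
                   ) as rn
            from CartItems ci
            inner join ItemValues iv on ci.item_id = iv.item_id
        ) s
        where rn = 1
        order by cart_id
        """
    ).fetchall()
    conn.close()
    return rows
```

Ranking inside each cart and keeping rank 1 carries the item_id along with the maximum, which a plain GROUP BY with MAX cannot do. If two items tie for the maximum, row_number() picks one of them arbitrarily.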

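This code was written for Oracle. ROWNUM is Oracle-specific, so on other databases use the equivalent row limit instead (for example TOP 1 in MS SQL, or LIMIT 1 in MySQL and PostgreSQL).
This gets the max(high_val) and returns its key.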
select high_val, my_key
from (select high_val, my_key
from mytable
where something = 'avalue'
order by high_val desc)
where rownum <= 1
What this says is: Sort mytable by high_val descending for values where something = 'avalue'. Only grab the top row, which will provide you with the max(high_val) in the selected range and the my_key to that table.
