MATCH_RECOGNIZE with CTE in Snowflake - sql

I am using the MATCH_RECOGNIZE clause in a query with a few CTEs. When I run the query, I get the following error:
SQL compilation error: MATCH_RECOGNIZE not supported in this context.
In my query, there are several CTEs before and after the MATCH_RECOGNIZE. Part of the query is shown below.
WITH cte1 AS (
SELECT *
FROM dataset
WHERE ID IS NOT NULL AND STATUS IS NOT NULL ),
cte2 AS (
SELECT *
FROM cte1
QUALIFY FIRST_VALUE(STATUS) OVER (PARTITION BY ID ORDER BY CREATED_AT) = 'created' )
mr as (
SELECT *
FROM cte2
MATCH_RECOGNIZE (
PARTITION BY ID
ORDER BY CREATED_AT
MEASURES MATCH_NUMBER() AS mn,
MATCH_SEQUENCE_NUMBER AS msn
ALL ROWS PER MATCH
PATTERN (c+m+)
DEFINE
c AS status='created'
,m AS status='missing_info'
,p AS status='pending'
) m1
QUALIFY (ROW_NUMBER() OVER(PARTITION BY mn, ID ORDER BY msn) = 1)
OR(ROW_NUMBER() OVER(PARTITION BY mn, ID ORDER BY msn DESC)=1)
ORDER BY ID, CREATED_AT ),
cte3 as (
SELECT *
FROM mr
-- some other operations
)
What would be the ideal approach to solve this, e.g. creating a regular view, a materialized view, or a temp table? I tried to create a view but got an error, and I am not sure whether that is supported either.
How can I use the result of the MATCH_RECOGNIZE in other later CTEs?
When I add the following, it gives this error:
syntax error line xx at position 0 unexpected 'create'.
create view filtered_idents AS
SELECT *
FROM cte2
MATCH_RECOGNIZE (
)

This seems to be an undocumented limitation (I asked our awesome docs team to fix this).
In the meantime, I suggest splitting the process into steps so that you can use the match_recognize results.
Reproducing error:
with data as (
select $1 company, $2 price_date, $3 price
from values('a',1,10), ('a',2,15)
), cte as (
select *
from data match_recognize(
partition by company
order by price_date
measures match_number() as "MATCH_NUMBER"
all rows per match omit empty matches
pattern(overavg*)
define
overavg as price > avg(price) over (rows between unbounded
preceding and unbounded following)
)
)
select * from cte
-- 002362 (0A000): SQL compilation error: MATCH_RECOGNIZE not supported in this context.
2 step solution:
with data as (
select $1 company, $2 price_date, $3 price
from values('a',1,10), ('a',2,15)
)
select *
from data match_recognize(
partition by company
order by price_date
measures match_number() as "MATCH_NUMBER"
all rows per match omit empty matches
pattern(overavg*)
define
overavg as price > avg(price) over (rows between unbounded
preceding and unbounded following)
)
;
with previous_results as (
select *
from table(result_scan(last_query_id()))
)
select *
from previous_results
;
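The question also asks about a temp table. As a rough sketch of that general pattern only (materialize the first step, then read it from later CTEs), here is a runnable stand-in using Python's sqlite3. SQLite has no MATCH_RECOGNIZE, so a plain filter plays the role of step 1; the table name step1 is hypothetical, and whether Snowflake accepts MATCH_RECOGNIZE inside CREATE TEMPORARY TABLE ... AS is an assumption that is not verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (company TEXT, price_date INTEGER, price INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [("a", 1, 10), ("a", 2, 15)])

# Step 1: materialize the intermediate result.
# (Stand-in for the MATCH_RECOGNIZE query: keep rows priced above the overall average.)
conn.execute("""
    CREATE TEMP TABLE step1 AS
    SELECT company, price_date, price
    FROM data
    WHERE price > (SELECT avg(price) FROM data)
""")

# Step 2: later CTEs read from the materialized table like from any other table.
rows = conn.execute("""
    WITH previous_results AS (
        SELECT * FROM step1
    )
    SELECT company, price_date, price
    FROM previous_results
""").fetchall()
print(rows)
```

Compared with result_scan(last_query_id()), a temp table does not depend on which statement ran last in the session, at the cost of an extra object to manage.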

Kimi, trying out your snippet I'm getting:
SQL compilation error: syntax error line 11 at position 0 unexpected 'mr'. syntax error line 17 at position 6 unexpected 'MEASURES'.
Line 9 seems to be missing a terminating comma.
When I add one and then complete the whole with a simple select statement then I don't get syntax errors anymore, I only get name lookup errors (expected of course).

Related

oracle db from keyword not found where expected in double cte

I have a double CTE expression: the first CTE joins two tables and the second applies a window function with PARTITION BY:
with cte as (
select *
from memuat.product p
join memuat.licence l on p.id = l.product_id
where l.managed = 'TRUE'
),
joined as (
select
*,
row_number() over (partition by id order by id) as rn
from cte
)
select * from joined;
I get the following error:
ORA-00923: FROM keyword not found where expected, ERROR at line 12.
I cannot figure out what the syntax error in my query is.
Oracle is nitpicking when it comes to SELECT *. SELECT * means "select everything", so how can you possibly add something to it? In Oracle you cannot SELECT *, 1 AS something_else FROM some_table. You must have SELECT some_table.*, 1 AS something_else FROM some_table, so you are no longer selecting "everything", but "everything from the table" :-)
You have
select
*,
row_number() over (partition by id order by id) as rn
from cte
It must be
select
cte.*,
row_number() over (partition by id order by id) as rn
from cte
instead.
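A small runnable sketch of the qualified form, using Python's sqlite3 with a hypothetical two-column table. Note that SQLite itself also accepts the unqualified SELECT *, expr, so this only shows that cte.* plus an extra column is the portable spelling; it does not reproduce ORA-00923. Window functions are assumed to be available (SQLite 3.25 or later), and the window is ordered by name rather than id only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "a"), (1, "b"), (2, "c")])

rows = conn.execute("""
    WITH cte AS (
        SELECT * FROM product
    ),
    joined AS (
        SELECT
            cte.*,  -- qualified star: "everything from cte", then more columns
            row_number() OVER (PARTITION BY id ORDER BY name) AS rn
        FROM cte
    )
    SELECT id, name, rn FROM joined ORDER BY id, rn
""").fetchall()
print(rows)
```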

My SQL statement does not like my OVER (PARTITION BY) clause

I have this simple query in DB2. I'm getting an error at the 2nd line from the top. For some reason, it does not want to run my over(partition) line.
with core as(
select *,row_number() over(partition by asgnd_to_pin order by stdt asc) as rank
from mhal_rep.stushh
where stus_cd in ('DWRT', 'FINL', 'DWFL', 'DWR', 'DWSR', 'DWPC')
AND STDT BETWEEN '2009-02-28' AND '2019-02-28'
UNION
select *,row_number() over(partition by asgnd_to_pin order by stdt asc) as rank
from mhal_rep.stusha
where stus_cd in ('DWRT', 'FINL', 'DWFL', 'DWR', 'DWSR', 'DWPC')
AND STDT BETWEEN '2009-02-28' AND '2019-02-28'
),
core1 as(
select asgnd_to_pin, stus_cd, stdt, rank, (('2019-02-28'-stdt)/365) as
lngth_srvc
from core
where rank=1 and
asgnd_to_pin in (
'788387',
'271562',
'155851')
select *
from core 1;
The error I'm getting says:
ERROR [42601] [IBM][DB2] SQL0104N An unexpected token "," was found
following "". Expected tokens may include: "FROM INTO".
Use
select t.*, row_number() over(partition by asgnd_to_pin order by stdt asc) as rank
from mhal_rep.stushX t
...
instead of
select *, row_number() over(partition by asgnd_to_pin order by stdt asc) as rank
from mhal_rep.stushX
...
But you actually have more problems:
- the core1 subselect is not closed by )
- there is a space between core and 1 in the outer select statement
Just remove the space between core and 1:
select *
from core1;

BigQuery Standard SQL: Delete Duplicates from Table

I am using the query below to delete duplicate records from BigQuery using standard SQL, but it throws an error:
with cte as (
select * ,row_number()over (partition by CallRailCallId order by CallRailCallId) as rn
from `encoremarketingtest.EncoreMarketingTest.CallRailCall2` )
delete
from cte
where rn>1
Query Failed
Error: Syntax error: Expected "(" or keyword SELECT but got keyword DELETE at [5:5]
Could anyone help me on the correct approach in BigQuery?
Option #1
CREATE OR REPLACE TABLE `project.dataset.your_table` AS
SELECT * EXCEPT(rn)
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY CallRailCallId ORDER BY CallRailCallId) rn
FROM `project.dataset.your_table`
)
WHERE rn = 1
Option #2
CREATE OR REPLACE TABLE `project.dataset.your_table` AS
SELECT row.*
FROM (
SELECT ARRAY_AGG(t ORDER BY CallRailCallId LIMIT 1)[OFFSET(0)] row
FROM `project.dataset.your_table` t
GROUP BY CallRailCallId
)
As you might have noticed, the options above use a DDL (CREATE TABLE) approach, which is what makes it possible to rely on just the one column known from your question: CallRailCallId.
Also note that ORDER BY CallRailCallId plays no real role there, because GROUP BY and PARTITION BY use exactly the same field. But if you change the ordering field, it will control exactly which row (out of several duplicates) "survives" (for example ORDER BY ts DESC; see the option below for what ts might be).
Option #3
This option uses DML (DELETE FROM) but requires an extra column to serve as a tie-breaker.
For example, suppose you have a ts TIMESTAMP field and you want the most recent row (based on ts) to survive:
DELETE FROM `project.dataset.your_table`
WHERE STRUCT(CallRailCallId, ts) NOT IN (
SELECT AS STRUCT CallRailCallId, MAX(ts) ts
FROM `project.dataset.your_table`
GROUP BY CallRailCallId
)
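To see what the tie-breaker delete in Option #3 does to the data, here is a minimal sketch in Python's sqlite3. It is an analogy, not BigQuery: SQLite has no STRUCT, so a row value (CallRailCallId, ts) stands in for it, and the table name calls, the note column, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (CallRailCallId TEXT, ts TEXT, note TEXT)")
conn.executemany("INSERT INTO calls VALUES (?, ?, ?)", [
    ("A", "2020-01-01", "old duplicate"),
    ("A", "2020-01-02", "newest"),
    ("B", "2020-01-01", "no duplicate"),
])

# Delete every row whose (id, ts) pair is not the most recent pair for its id.
conn.execute("""
    DELETE FROM calls
    WHERE (CallRailCallId, ts) NOT IN (
        SELECT CallRailCallId, MAX(ts)
        FROM calls
        GROUP BY CallRailCallId
    )
""")

remaining = conn.execute(
    "SELECT CallRailCallId, ts, note FROM calls ORDER BY CallRailCallId"
).fetchall()
print(remaining)
```

If two duplicates share the same ts, both survive, which is why the tie-breaker column needs to be unique within each CallRailCallId.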

Query Hive table using ROWNUM

How can I query a Hive table by row number?
For example:
Let's say I want to print all records of a Hive table from row number 2 to 5.
I actually recently updated the documentation regarding the offset option
... order by ... limit 1,4
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-LIMITClause
This answer seems like what you're asking:
SQL most recent using row_number() over partition
In other words:
SELECT user_id, page_name, recent_click
FROM (
SELECT user_id,
page_name,
row_number() over (partition by session_id order by ts desc) as recent_click
from clicks_data
) T
WHERE recent_click between 2 and 5
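Both approaches (LIMIT with an offset, and filtering on row_number()) can be compared side by side in a small sketch. It runs on Python's sqlite3 rather than Hive, with a hypothetical six-row clicks table; SQLite happens to accept the same LIMIT offset, count form, and window functions are assumed to be available (SQLite 3.25 or later). Either way, "row 2 to 5" only has a meaning once an explicit ordering is chosen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [(i, "p%d" % i) for i in range(1, 7)],  # rows 1..6
)

# Rows 2..5 by numbering the rows and filtering on the number.
by_rownum = conn.execute("""
    SELECT id, page
    FROM (
        SELECT id, page, row_number() OVER (ORDER BY id) AS rn
        FROM clicks
    ) t
    WHERE rn BETWEEN 2 AND 5
    ORDER BY rn
""").fetchall()

# The same rows via LIMIT offset, count: skip 1 row, then take 4.
by_limit = conn.execute(
    "SELECT id, page FROM clicks ORDER BY id LIMIT 1, 4"
).fetchall()

print(by_rownum)
print(by_limit)
```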

ORDER BY upper(...) with a UNION giving me problems

I'm having a bit of trouble figuring out why I'm having this problem.
This code works exactly how it should. It combines the two tables (MESSAGES and MESSAGES_ARCHIVE) and orders them correctly.
SELECT * FROM (
SELECT rownum as rn, a.* FROM (
SELECT
outbound.FROM_ADDR, outbound.TO_ADDR, outbound.EMAIL_SUBJECT
from MESSAGES outbound
where (1 = 1)
UNION ALL
SELECT
outboundarch.FROM_ADDR, outboundarch.TO_ADDR, outboundarch.EMAIL_SUBJECT
from MESSAGES_ARCHIVE outboundarch
where (1 = 1)
order by FROM_ADDR DESC
) a
) where rn between 1 and 25
However, this code does not work.
SELECT * FROM (
SELECT rownum as rn, a.* FROM (
SELECT
outbound.FROM_ADDR, outbound.TO_ADDR, outbound.EMAIL_SUBJECT
from MESSAGES outbound
where (1 = 1)
UNION ALL
SELECT
outboundarch.FROM_ADDR, outboundarch.TO_ADDR, outboundarch.EMAIL_SUBJECT
from MESSAGES_ARCHIVE outboundarch
where (1 = 1)
order by upper(FROM_ADDR) DESC
) a
) where rn between 1 and 25
and returns this error
ORA-01785: ORDER BY item must be the number of a SELECT-list expression
01785. 00000 - "ORDER BY item must be the number of a SELECT-list expression"
I'm trying to get the two tables ordered regardless of letter case, which is why I'm using upper(FROM_ADDR). Any suggestions? Thanks!
I'm not quite sure why this is generating an error, but it probably has to do with scoping rules for union queries. There is an easy work-around, using row_number():
SELECT * FROM (
SELECT row_number() over (order by upper(FROM_ADDR) desc) as rn, a.*
FROM (
SELECT
outbound.FROM_ADDR, outbound.TO_ADDR, outbound.EMAIL_SUBJECT
from MESSAGES outbound
where (1 = 1)
UNION ALL
SELECT
outboundarch.FROM_ADDR, outboundarch.TO_ADDR, outboundarch.EMAIL_SUBJECT
from MESSAGES_ARCHIVE outboundarch
where (1 = 1)
) a
)
where rn between 1 and 25
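The row_number() work-around can be tried on a toy data set. The sketch below uses Python's sqlite3 instead of Oracle, with made-up addresses and trimmed-down tables, and orders descending to match the original query; window functions are assumed to be available (SQLite 3.25 or later). The point is only that the case-insensitive ordering is computed outside the UNION ALL, where any expression is allowed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (from_addr TEXT)")
conn.execute("CREATE TABLE messages_archive (from_addr TEXT)")
conn.executemany("INSERT INTO messages VALUES (?)", [("bob",), ("Alice",)])
conn.executemany("INSERT INTO messages_archive VALUES (?)", [("carol",), ("Dave",)])

rows = conn.execute("""
    SELECT from_addr
    FROM (
        -- Number the combined rows case-insensitively, outside the UNION ALL.
        SELECT row_number() OVER (ORDER BY upper(from_addr) DESC) AS rn, a.*
        FROM (
            SELECT from_addr FROM messages
            UNION ALL
            SELECT from_addr FROM messages_archive
        ) a
    )
    WHERE rn BETWEEN 1 AND 25
    ORDER BY rn
""").fetchall()
print([r[0] for r in rows])
```

A plain case-sensitive descending sort would instead place the lowercase names ahead of Dave.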
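Your upper() returns a value, not a column of the select list, and in a UNION query ORDER BY can only refer to select-list columns (by name, alias, or position).
Instead of:
order by upper(FROM_ADDR) DESC
add the expression to the select list of both branches under an alias, e.g. upper(outbound.FROM_ADDR) as FROM_ADDR_UPPER and upper(outboundarch.FROM_ADDR) as FROM_ADDR_UPPER (the alias name is only illustrative), then try:
order by FROM_ADDR_UPPER DESC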