What's the trick for unnesting an array in Snowflake? - sql

I have some complex array logic I'm using in Postgres but now need to transfer to Snowflake. The issue I'm having is with the second column syntax of connected_users, specifically the unnest. I know flatten(table(input => is the solution to unnesting in Snowflake, but I just can't seem to get the syntax right.
What this logic currently does is:
concatenates two different arrays (from c2 users and c1 users)
unnests the concatenated array into a flattened table
re-arrays only distinct values (e) from the flattened table where c0 user is not equal to any of the values
the query:
select
c0.user_id as user
, array(select distinct e from unnest(array_cat(
array_agg(distinct c2.user_id order by c2.user_id desc),
array_agg(distinct c1.user_id order by c1.user_id desc)
)) as a(e) where c0.user_id != e) as connected_users
from device_logs c0
left join device_logs c1 ON (
c0.device_id = c1.device_id
)
left join device_logs c11 ON (
c1.user_id = c11.user_id
)
left join device_logs c2 ON (
c2.device_id = c11.device_id
)
group by 1, 2
I was hoping that I could easily just replace the unnest function with flatten(table(input =>, however that still produces errors
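As a reference for checking a port, here is a minimal plain-Python sketch of what the three steps listed above are meant to compute for a single c0 user. The function name and the sorted output order are assumptions made for illustration (the Postgres query does not guarantee any order in the resulting array), and this is a model of the logic, not Snowflake syntax.

```python
def connected_users(user, c1_users, c2_users):
    """Model of the connected_users column for one c0 user (hypothetical helper)."""
    # Step 1: concatenate the two arrays, c2 first, as array_cat does.
    merged = list(c2_users) + list(c1_users)
    # Steps 2 and 3: "unnest" by iterating, drop NULLs and the user itself,
    # and keep distinct values. In SQL, `c0.user_id != e` is never true for NULL.
    distinct = {e for e in merged if e is not None and e != user}
    # Sorting is an assumption, used only to make the output deterministic.
    return sorted(distinct)
```

Comparing the output of a ported query against a model like this on a handful of rows is one way to confirm that the unnest and distinct steps survived the translation.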

Related

Linked list concept in postgresql

I am new to postgresql, can you please guide me about my query listed below?
I have a table in postgres (database) named "app" having two columns "aid" and "cid".
Table Name: app
aid | cid
a1 | a3
a2 | null
a3 | a5
a4 | a6
a5 | null
a6 | null
What I want to display (using a SQL query on the server): when I select "a1", "a3" or "a5" from aid, I want to list all values associated with "a1" and its child cid (in this case I want an output = a1 a3 a5). It's like a linked list: a1 is connected to a3 and a3 is connected to a5.
If I select "a4" using sql query, I need an output like this("a4 a6")
You need to use recursion to accomplish this:
with recursive walk as (
select aid, cid, array[aid] as path
from app
union all
select w.aid, a.cid, w.path||a.aid
from walk w
join app a
on a.aid = w.cid
)
select *, array_to_string(path, ' ') as text_path
from walk
where cid is null;
Working fiddle here.
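For trying the same walk locally without a Postgres server, the sketch below runs an equivalent recursive CTE through Python's built-in sqlite3 module. It is an adaptation under stated assumptions, not the query above verbatim: SQLite has no array type, so the path is built as a space-separated string, and the helper name walk_paths and its optional start argument are made up for this example.

```python
import sqlite3

APP_ROWS = [("a1", "a3"), ("a2", None), ("a3", "a5"),
            ("a4", "a6"), ("a5", None), ("a6", None)]

def walk_paths(rows, start=None):
    """Return (aid, path) for every chain, or only the chain beginning at `start`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table app (aid text, cid text)")
    conn.executemany("insert into app values (?, ?)", rows)
    sql = """
        with recursive walk(aid, cid, path) as (
            -- anchor: every row (or only the requested starting row)
            select aid, cid, aid from app
            where :start is null or aid = :start
            union all
            -- step: follow cid to the next row and extend the path
            select w.aid, a.cid, w.path || ' ' || a.aid
            from walk w
            join app a on a.aid = w.cid
        )
        select aid, path from walk where cid is null order by aid
    """
    result = conn.execute(sql, {"start": start}).fetchall()
    conn.close()
    return result
```

Calling walk_paths(APP_ROWS, "a1") returns the single chain that starts at a1, while omitting the start argument walks from every row.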
If your table is large, then to limit the cost of recursion, use a where clause in the top half of the walk CTE to restrict your starting point.
with recursive walk as (
select aid, cid, array[aid] as path
from app
where aid = 'a1'
union all
. . .
You can get the reverse path without having to recurse again like this:
with recursive walk as (
select aid, cid, array[aid] as path
from app
union all
select w.aid, a.cid, w.path||a.aid
from walk w
join app a
on a.aid = w.cid
), forward as (
select *, array_to_string(path, ' ') as text_path
from walk
where cid is null
), reverse as (
select distinct on (a.aid) a.aid, f.path, f.text_path, r.path as rpath
from app a
join forward f
on f.aid = a.aid
join forward r
on r.path @> array[a.aid]
order by a.aid, array_length(r.path, 1) desc
)
select r.aid, r.path, r.text_path,
array_agg(u.rid order by u.rn desc) as up_path,
string_agg(u.rid, ' ' order by u.rn desc) as text_up_path
from reverse r
join lateral unnest(rpath)
with ordinality as u(rid, rn)
on u.rn <= array_position(r.rpath, r.aid)
group by r.aid, r.path, r.text_path;
Updated fiddle.

How to convert list of comma separated Ids into their name?

I have a table that contains:
id task_ids
1 10,15
2 NULL
3 17
I have another table that has the names of these tasks:
id task_name
10 a
15 b
17 c
I want to generate the following output
id task_ids task_names
1 10,15 a,b
2 null null
3 17 c
I know this structure isn't ideal, but this is a legacy table which I will not change now.
Is there an easy way to get the output?
I'm using Presto but I think this can be solved with native sql
WITH data AS (
SELECT * FROM (VALUES (1, '10,15'), (2, NULL)) x(id, task_ids)
),
task AS (
SELECT * FROM (VALUES ('10', 'a'), ('15', 'b')) x(id, task_name)
)
SELECT
d.id, d.task_ids,
-- array_agg will capture the NULL task_name coming from the LEFT JOIN, so we need to filter out such results
IF(array_agg(t.task_name) IS NOT DISTINCT FROM ARRAY[NULL], NULL, array_agg(t.task_name)) task_names
FROM data d
-- split task_ids by `,` and UNNEST into separate rows
LEFT JOIN UNNEST (split(d.task_ids, ',')) AS e(task_id) ON true
-- LEFT JOIN with task to pull the task name
LEFT JOIN task t ON e.task_id = t.id
-- aggregate back
GROUP BY d.id, d.task_ids;
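The split, look up, aggregate sequence that the comments in this query describe can be sketched in plain Python to make the expected result for each row explicit. This is only a model of the logic, not Presto code: the function name task_names and the dictionary keyed by string IDs are assumptions for illustration, and it returns the comma-joined string from the desired output rather than an array.

```python
def task_names(task_ids, names_by_id):
    """Map a comma-separated ID string to a comma-separated name string."""
    # A NULL task_ids value stays NULL, like the row with id 2.
    if task_ids is None:
        return None
    # Split by ',' (the UNNEST step) and look each ID up (the LEFT JOIN step),
    # skipping IDs that have no matching task.
    names = [names_by_id[t] for t in task_ids.split(",") if t in names_by_id]
    # Aggregate back; no matches at all gives NULL, as the IF(...) does.
    return ",".join(names) if names else None
```

With the sample data from the question, task_names("10,15", {"10": "a", "15": "b", "17": "c"}) yields "a,b".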
You have a horrible data model, but you can do what you want with a bit of effort. Arrays are better than strings, so I'll just use that:
select t.id, t.task_ids, array_agg(tt.task_name) as task_names
from t left join lateral
unnest(split(t.task_ids, ',')) u(task_id)
on 1=1 left join
tasks tt
on tt.task_id = u.task_id
group by t.id, t.task_ids;
I don't have Presto on hand to test this. But this or some minor variant should do what you want.
EDIT:
This version might work:
select t.id, t.task_ids,
(select array_agg(tt.task_name)
from unnest(split(t.task_ids, ',')) u(task_id) join
tasks tt
on tt.task_id = u.task_id
) as task_names
from t ;

Why does my sub-query work with a string, but not a field reference?

I have (what I think is) a rather complex query. The query gets the record that I want and then all of the data referenced in the first response. It works if my sub-query conditional is a string, but not if it's a field (of the exact same value).
// Query with string as conditional in lowest sub-query (4th line from the bottom)
SELECT
e1.entity as entity
,ARRAY_CAT(
ARRAY_COMPACT(
ARRAY_CONSTRUCT(
any_value(e2.entity),
any_value(u1.user)
)
)
,ARRAY_AGG(e3.entity)
) as includes
FROM ENTITIES e1
LEFT JOIN ENTITIES e2 ON e1.entity:owner:workspace = e2.entity:id
LEFT JOIN USERS u1 ON e1.entity:owner:user = u1.user:id
LEFT JOIN ENTITIES e3 ON e3.entity:id IN (
SELECT ee2.value FROM
table(FLATTEN( input=>
SELECT SPLIT(LISTAGG( CASE WHEN IS_ARRAY(ee1.value:id) THEN ARRAY_TO_STRING(ee1.value:id, ',') ELSE ee1.value:id END, ','), ',')
FROM table(FLATTEN( input => ( SELECT e4.entity:relationships:entities FROM ENTITIES e4 WHERE e4.entity:id = 'bd265f29-ca32-449a-b765-bb488e4d6b3c' ) )) ee1
)) ee2
)
GROUP BY e1.entity
The above produces:
"entity" column:
https://jsonblob.com/6d98b587-8989-11e9-b738-a9487a0dac0b
"includes" column:
https://jsonblob.com/068a8672-8988-11e9-b738-77f0e471310b
However, if I change the uuid string (bd265f29-ca32-449a-b765-bb488e4d6b3c) to e1.entity:id (below) then I get the error SQL compilation error: Unsupported subquery type cannot be evaluated.
SELECT
e1.entity as entity
,ARRAY_CAT(
ARRAY_COMPACT(
ARRAY_CONSTRUCT(
any_value(e2.entity),
any_value(u1.user)
)
)
,ARRAY_AGG(e3.entity)
) as includes
FROM ENTITIES e1
LEFT JOIN ENTITIES e2 ON e1.entity:owner:workspace = e2.entity:id
LEFT JOIN USERS u1 ON e1.entity:owner:user = u1.user:id
LEFT JOIN ENTITIES e3 ON e3.entity:id IN (
SELECT ee2.value FROM
table(FLATTEN( input=>
SELECT SPLIT(LISTAGG( CASE WHEN IS_ARRAY(ee1.value:id) THEN ARRAY_TO_STRING(ee1.value:id, ',') ELSE ee1.value:id END, ','), ',')
FROM table(FLATTEN( input => ( SELECT e4.entity:relationships:entities FROM ENTITIES e4 WHERE e4.entity:id = e1.entity:id ) )) ee1
)) ee2
)
GROUP BY e1.entity
I have no idea why the switch is causing the error. Why does my sub-query work with a string, but not a field reference?
So with a couple of CTEs to provide data, we can do most of the lifting of your correlated sub-queries. I put both forms of arrays of things in entities, and a single entity with multiple IDs, as is expressed in your FLATTEN usage:
WITH users AS (
SELECT parse_json('{"id":1}') as user
), entities AS (
SELECT parse_json(column1) as entity
FROM VALUES
('{"id":10, "relationships":{"entities":[{"id":11},{"id":12}]}, "owner":{"user":1,"workspace":10}}'),
('{"id":11, "relationships":{"entities":[{"id":11}]}}'),
('{"id":12, "relationships":{"entities":[{"id":[10,11]}]}}')
), ent1 AS (
SELECT e4.entity:id as ent_id
,ee1.index
,SPLIT(LISTAGG( IFF( IS_ARRAY(ee1.value:id), ARRAY_TO_STRING(ee1.value:id, ','), ee1.value:id), ','), ',') as vals
FROM ENTITIES AS e4,
TABLE(FLATTEN( input => e4.entity:relationships:entities )) ee1
GROUP BY 1,2
), ent_rels AS (
SELECT ent_id, ee2.value::number as rel_id
FROM ent1 ee1,
TABLE(FLATTEN( input => ee1.vals)) ee2
)
SELECT
e1.entity:id as entity
,e2.entity:id as e2_entity
,u1.user:id as u1_user
,e3.entity:id as e3_entity
FROM ENTITIES e1
LEFT JOIN ENTITIES e2 ON e1.entity:owner:workspace = e2.entity:id
LEFT JOIN USERS u1 ON e1.entity:owner:user = u1.user:id
LEFT JOIN ent_rels er ON er.ent_id = e1.entity:id
LEFT JOIN ENTITIES e3 ON e3.entity:id = er.rel_id
ORDER BY e1.entity:id;
This SQL does not produce the select results you had, but it does show that things are joining as expected.
ENTITY  E2_ENTITY  U1_USER  E3_ENTITY
10      10         1        11
10      10         1        12
11      null       null     11
12      null       null     10
12      null       null     11
So this final select is the way you had it originally
SELECT
e1.entity as entity
,ARRAY_CAT(
ARRAY_COMPACT(
ARRAY_CONSTRUCT(
any_value(e2.entity),
any_value(u1.user)
)
)
,ARRAY_AGG(e3.entity)
) as includes
FROM ENTITIES e1
LEFT JOIN ENTITIES e2 ON e1.entity:owner:workspace = e2.entity:id
LEFT JOIN USERS u1 ON e1.entity:owner:user = u1.user:id
LEFT JOIN ent_rels er ON er.ent_id = e1.entity:id
LEFT JOIN ENTITIES e3 ON e3.entity:id = er.rel_id
GROUP BY e1.entity
ORDER BY e1.entity:id;
Also, given that you are undoing two layers of nesting to get matching IDs, you can avoid the LISTAGG and SPLIT and just break them up via:
), ent1 AS (
SELECT e4.entity:id as ent_id
,ee1.value:id as vals
FROM ENTITIES AS e4,
TABLE(FLATTEN( input => e4.entity:relationships:entities )) ee1
), ent_rels AS (
SELECT ent_id
,coalesce(ee2.value,ee1.vals) as rel_id
FROM ent1 ee1,
TABLE(FLATTEN( input => ee1.vals, outer => true)) ee2
)
which can be merged/nested if that's your preference:
, ent_rels AS (
SELECT ent_id
,coalesce(ee3.value,ee2.vals) as rel_id
FROM (
SELECT e1.entity:id as ent_id
,ee1.value:id as vals
FROM ENTITIES AS e1,
TABLE(FLATTEN( input => e1.entity:relationships:entities )) ee1
) ee2,
TABLE(FLATTEN( input => ee2.vals, outer => true)) ee3
)
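To see what the two FLATTEN steps are expected to produce, here is a small plain-Python model run on the same three sample entities used in the CTE above. It is a sketch of the intended semantics only: the helper name related_ids is hypothetical, and the else branch stands in for the combination of outer => true and coalesce, which keeps a scalar id when there is no inner array to flatten.

```python
import json

SAMPLE_ENTITIES = [json.loads(s) for s in (
    '{"id":10, "relationships":{"entities":[{"id":11},{"id":12}]}, "owner":{"user":1,"workspace":10}}',
    '{"id":11, "relationships":{"entities":[{"id":11}]}}',
    '{"id":12, "relationships":{"entities":[{"id":[10,11]}]}}',
)]

def related_ids(entity):
    """Collect the related entity IDs for one entity document."""
    ids = []
    # First FLATTEN: one row per element of relationships.entities.
    for rel in entity.get("relationships", {}).get("entities", []):
        value = rel.get("id")
        if isinstance(value, list):
            # Second FLATTEN: an array of IDs expands to one row per ID.
            ids.extend(value)
        else:
            # Scalar ID: the outer flatten yields no value, so coalesce keeps it.
            ids.append(value)
    return ids
```

For the sample data this gives 11 and 12 for entity 10, 11 for entity 11, and 10 and 11 for entity 12, which matches the E3_ENTITY column in the join output shown earlier.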
The Snowflake documentation on subqueries includes this restriction:
Correlated scalar subqueries are currently supported only if they can be statically determined to return one row (e.g. if the SELECT list contains an aggregate function with no GROUP BY).
So you might try:
( SELECT MAX(e4.entity:relationships:entities)
FROM ENTITIES e4
WHERE e4.entity:id = e1.entity:id
)
Did you try to cast it like this?
e1.entity:id::string
The Snowflake documentation mentions:
Subqueries with a correlation inside of FLATTEN are currently unsupported.
Can you not simply use e1.entity:relationships:entities instead of the subquery?

return column name of the maximum value in sql server 2012

My table looks like this (Totally different names)
ID Column1--Column2---Column3--------------Column30
X 0 2 6 0101 31
I want to find the second maximum value of Column1 to Column30 and put the column name in a separate column.
First row would look like :
ID Column1--Column2---Column3--------------Column30------SecondMax
X 0 2 6 0101 31 Column3
Query :
Update Table
Set SecondMax= (select Column_Name from table where ...)
with unpvt as (
select id, c, m
from T
unpivot (c for m in (c1, c2, c3, ..., c30)) as u /* <-- your list of columns */
)
update T
set SecondMax = (
select top 1 m
from unpvt as u1
where
u1.id = T.id
and u1.c < (
select max(c) from unpvt as u2 where u2.id = u1.id
)
order by c desc, m
)
I really don't like relying on top but this isn't a standard sql question anyway. And it doesn't do anything about ties other than returning the first column name by order of alphabetical sort.
You could modify the condition as shown below to get the "third maximum". (Obviously the constant 2 comes from 3 - 1.) Your version of SQL Server lets you use a variable there as well. SQL Server 2012 does not support the limit syntax, but it does support offset ... fetch if that's preferable to top. And since it should work for top 0 and top 1 as well, you might just be able to run this query in a loop to populate all of your "maximums" from first to thirtieth.
Once you start having ties you'll eventually get a "thirtieth maximum" that's null. Make sure you cover those cases though.
and u1.c < all (
select distinct top 2 c from unpvt as u2 where u2.id = u1.id order by c desc
)
On reflection, if you're going to rank and update so many columns, it would probably make even more sense to use a proper ranking function and do the update all at once. You'll also handle the ties a lot better, even if the alphabetic sorting is still arbitrary.
with unpvt as (
select id, c, m, row_number() over (partition by id order by c desc, m) as nth_max
from T
unpivot (c for m in (c1, c2, c3, ..., c30)) as u /* <-- your list of columns */
)
update T set
FirstMax = (select c from unpvt as u where u.id = T.id and nth_max = 1),
SecondMax = (select c from unpvt as u where u.id = T.id and nth_max = 2),
...
NthMax = (select c from unpvt as u where u.id = T.id and nth_max = N)
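The ranking idea can be checked on a single row with a few lines of plain Python. This is a sketch of the ordering only, not T-SQL: the function name nth_max_column and the sample row are assumptions, and like row_number() it gives tied values separate ranks (broken by column name), which differs from the strict "less than the maximum" comparison in the first query.

```python
def nth_max_column(row, n):
    """Return the name of the column holding the n-th largest value in `row`.

    `row` maps column name -> value. Ordering mirrors
    row_number() over (order by c desc, m): value descending, then name.
    """
    ranked = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))
    # Asking for a rank beyond the number of columns gives None (NULL).
    if n < 1 or n > len(ranked):
        return None
    return ranked[n - 1][0]
```

For a row such as {"c1": 0, "c2": 2, "c3": 6, "c30": 31}, rank 1 is "c30" and rank 2 is "c3".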

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.
SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.
Something like:
join (
select "EsnId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and rn = 1
This will select the latest row per "EsnId" from "EsnsSalesOrderItems" based on the "createdAt" column. You can use any column that allows you to define an order on the rows that suits you.
But remember that the concept of the "last row" is only valid if you specify an order for the rows. A table as such is not ordered, nor is the result of a query unless you specify an order by.
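A runnable illustration of "latest row per group via row_number()" is sketched below using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The table and column names here (items, esn_id, sales_order_item_id, created_at) are simplified stand-ins invented for the example, not the real schema from the question, and the dates are stored as ISO text so that they sort correctly.

```python
import sqlite3

def latest_per_esn(rows):
    """Keep only the most recent (esn_id, sales_order_item_id, created_at) per esn_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table items (esn_id integer, sales_order_item_id integer, created_at text)"
    )
    conn.executemany("insert into items values (?, ?, ?)", rows)
    sql = """
        select esn_id, sales_order_item_id, created_at
        from (
            select i.*,
                   -- rank rows within each esn_id, newest first
                   row_number() over (partition by esn_id order by created_at desc) as rn
            from items i
        ) ranked
        where rn = 1
        order by esn_id
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result
```

With two entries for esn_id 6 dated 2012-06-19 and 2012-07-19, only the 2012-07-19 entry is returned, which is the behaviour the question asks for.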
Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | right | inner JOIN LATERAL
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL-server, you would use cross-apply (or rather OUTER APPLY since we need a left join), which will invoke a table-valued function on each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(@in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(@in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE
Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...