Using a concatenated column in an aggregate function for PIVOT in Snowflake - sql

SELECT *
from (SELECT Ant, Bird, Cat, Dog, Egg, Fish, Gold, Hen, RANK() OVER
(PARTITION BY (Ant|| Bird|| Cat|| Dog|| Egg||Fish) ORDER BY Dog) AS ROW_COUNT
FROM TABLE1 WHERE Gold = '01')
pivot( MAX(Egg||Fish||Hen) for ROW_COUNT IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)
as QRY
;
Basically, I would like to make this work in Snowflake, but it fails because Snowflake does not allow me to concatenate columns inside the aggregate function. The same code runs fine in Oracle DB. Can anyone help me with this? I tried creating the concatenated column before passing it to MAX(), but that returned a different result from the query above (as tested in Oracle DB). For instance:
SELECT *
from (SELECT Ant, Bird, Cat, Dog, Egg, Fish, Gold, Hen, Egg||Fish||Hen AS concat_col ,RANK() OVER
(PARTITION BY (Ant|| Bird|| Cat|| Dog|| Egg||Fish) ORDER BY Dog) AS ROW_COUNT
FROM TABLE1 WHERE Gold = '01')
pivot( MAX(concat_col) for ROW_COUNT IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)
as QRY
;
The query above returns different results from what I expected.

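For personal reasons I cannot post the real code. Anyway, I have found the solution: simply remove the columns used in the concatenation (Egg, Fish, Hen) from the SELECT list. PIVOT groups implicitly by every column that is neither aggregated nor pivoted on, so leaving those columns in prevented the pivot from transposing the rows properly.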
SELECT *
from (SELECT Ant, Bird, Cat, Dog, Gold, Egg||Fish||Hen AS concat_col ,RANK() OVER
(PARTITION BY (Ant|| Bird|| Cat|| Dog|| Egg||Fish) ORDER BY Dog) AS ROW_COUNT
FROM TABLE1 WHERE Gold = '01')
pivot( MAX(concat_col) for ROW_COUNT IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)
as QRY
;
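To illustrate why the extra columns matter, here is a minimal sketch that emulates the pivot by hand with GROUP BY and conditional aggregation in SQLite (driven from Python, since SQLite has no PIVOT). The table `src`, its two sample rows, and the helper `pivot` are hypothetical and are not taken from the original query; the point is only that a leftover column such as Egg becomes part of the implicit grouping key.

```python
import sqlite3

# Hypothetical miniature data: two rows with the same Ant value,
# different Egg values, and ranks 1 and 2.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src (Ant TEXT, Egg TEXT, concat_col TEXT, row_count INTEGER)"
)
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?, ?)",
    [("a", "e1", "e1f1h1", 1), ("a", "e2", "e2f2h2", 2)],
)


def pivot(group_cols):
    """Emulate PIVOT(MAX(concat_col) FOR row_count IN (1, 2)) by hand.

    group_cols plays the role of the columns left over in the subquery,
    which a PIVOT uses as its implicit grouping key.
    """
    cols = ", ".join(group_cols)
    sql = f"""
        SELECT {cols},
               MAX(CASE WHEN row_count = 1 THEN concat_col END) AS "1",
               MAX(CASE WHEN row_count = 2 THEN concat_col END) AS "2"
        FROM src
        GROUP BY {cols}
        ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()


# Egg kept in the select list: the group is split into two sparse rows.
with_egg = pivot(["Ant", "Egg"])
# Egg removed: both ranks land on a single row.
without_egg = pivot(["Ant"])
print(with_egg)
print(without_egg)
```

With Egg in the grouping key, each source row ends up on its own output row with one filled pivot column; without it, both values share one row, which matches the behaviour described in the accepted fix.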

Related

How to move this group/concat logic into a function?

Given a column of integers
ids AS (
SELECT
id
FROM
UNNEST([1, 2, 3, 4, 5, 6, 7]) AS id)
I'd like to convert them into the following (batched) string representations:
"1,2,3,4,5"
"6,7"
Currently, I do this as follows:
SELECT
STRING_AGG(CAST(id AS STRING), ',')
FROM (
SELECT
DIV(ROW_NUMBER() OVER() - 1, 5) batch,
id
FROM
ids)
GROUP BY
batch
Since I use this on multiple occasions, I'd like to move this into a function.
Is this possible, and if so how?
(I guess, since we can't pass the table (ids), we'd need to pass an ARRAY<INT64>, but that would be ok.)
I think you might consider the two approaches below.
UDF
Returns the result as ARRAY<STRING>.
CREATE TEMP FUNCTION batched_string(ids ARRAY<INT64>) AS (
ARRAY(
SELECT STRING_AGG('' || id) FROM (
SELECT DIV(offset, 5) batch, id
FROM UNNEST(ids) id WITH offset
) GROUP BY batch
)
);
SELECT * FROM UNNEST(batched_string([1, 2, 3, 4, 5, 6, 7]));
Table function
Returns the result as a table.
Note that a table function shouldn't be a temp function.
CREATE OR REPLACE TABLE FUNCTION `your-project.dataset.batched_string`(ids ARRAY<INT64>) AS (
SELECT STRING_AGG('' || id) batched FROM (
SELECT DIV(offset, 5) batch, id
FROM UNNEST(ids) id WITH offset
) GROUP BY batch
);
SELECT * FROM `your-project.dataset.batched_string`([1, 2, 3, 4, 5, 6, 7]);
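For checking the expected output outside BigQuery, the same batching logic can be sketched in plain Python. The function name `batched_strings` and the `size` parameter are assumptions made for this illustration; this is not part of either BigQuery function above.

```python
def batched_strings(ids, size=5):
    """Join consecutive ids into comma-separated strings of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [
        ",".join(str(i) for i in ids[start:start + size])
        for start in range(0, len(ids), size)
    ]


batches = batched_strings([1, 2, 3, 4, 5, 6, 7])
print(batches)
```

Slicing by position mirrors DIV(offset, 5) in the SQL: the element at offset k goes into batch k // 5.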

BigQuery arrays - SELECT DISTINCT ordering guarantees?

I want to filter out the duplicates from a BigQuery array. I also need the order of the elements to be preserved. The docs mention that this can be done by combining SELECT DISTINCT with UNNEST. However, it doesn't mention any ordering behavior. I ran this query and got the desired ordering of [5, 3, 1, 4, 10, 8].
WITH an_array AS (
SELECT [5, 5, 3, 1, 4, 4, 10, 8, 5, 1] AS nums
)
SELECT
ARRAY((
SELECT DISTINCT num
FROM UNNEST(nums) num
))
FROM an_array;
I don't know if that's coincidence or if that ordering is guaranteed. I also tried adding WITH OFFSET with an ORDER BY to specify the order explicitly, but in that case I get Query error: ORDER BY clause expression references table alias offset which is not visible after SELECT DISTINCT.
You should always be explicit about ordering if you care about it:
WITH an_array as (
SELECT [5, 5, 3, 1, 4, 4, 10, 8, 5, 1] AS nums
)
SELECT ARRAY((SELECT num
FROM UNNEST(nums) num WITH OFFSET o
GROUP BY num
ORDER BY MIN(o)
)
)
FROM an_array;
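The GROUP BY plus ORDER BY MIN(offset) idea is not specific to BigQuery. As a rough sketch, the same pattern can be tried in SQLite from Python; the table `elems`, the column `pos`, and the helper `distinct_in_order` are made up for this example and stand in for UNNEST ... WITH OFFSET.

```python
import sqlite3


def distinct_in_order(nums):
    """Deduplicate nums, keeping each value at the position of its first occurrence."""
    conn = sqlite3.connect(":memory:")
    # pos stands in for the array offset that WITH OFFSET would provide.
    conn.execute("CREATE TABLE elems (pos INTEGER, num INTEGER)")
    conn.executemany("INSERT INTO elems VALUES (?, ?)", list(enumerate(nums)))
    rows = conn.execute(
        "SELECT num FROM elems GROUP BY num ORDER BY MIN(pos)"
    ).fetchall()
    conn.close()
    return [num for (num,) in rows]


result = distinct_in_order([5, 5, 3, 1, 4, 4, 10, 8, 5, 1])
print(result)
```

Because the ordering key is an aggregate over each group, it stays visible after grouping, which is what the failed SELECT DISTINCT ... ORDER BY offset attempt was missing.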

How to use percentile_disc on array

I am able to use approx_quantiles on an array by doing
(select approx_quantiles(reps, 10)[offset(5)] from unnest(arr_tab.arr) as reps) as med,
where arr_tab.arr is an array of values.
I would like to get exact numbers the same way with percentile_disc (the arrays are relatively small), but the following:
(select percentile_disc(reps, .5) from unnest(arr_tab.arr) as reps) as med,
gives the error
Analytic function PERCENTILE_DISC cannot be called without an OVER clause at [17:11]
Here is a full example query, which runs if I comment out the percentile_disc attempt:
with arr_tab as (
SELECT [1, 2, 3] AS arr, 'a' as label UNION ALL
SELECT [4, 5, 6], 'c' UNION ALL
SELECT [10, 11, 12], 'd'
)
, q2 as (
select
label,
(select approx_quantiles(reps, 10)[offset(5)] from unnest(arr_tab.arr) as reps) as med,
-- (select percentile_disc(reps, .5) from unnest(arr_tab.arr) as reps) as med2,
from arr_tab
)
select *
from q2
You can use the following:
(SELECT PERCENTILE_DISC(reps, .5) OVER() FROM UNNEST(arr_tab.arr) AS reps LIMIT 1) AS med2
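To sanity-check the values this should return, here is a small Python sketch of the usual discrete-percentile definition: the first sorted value whose cumulative distribution is greater than or equal to the requested fraction. The function below is an illustration written for this thread, not BigQuery's implementation, and it does not handle NULL values.

```python
def percentile_disc(values, fraction):
    """Return the first sorted value whose cumulative share is >= fraction."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    n = len(ordered)
    for k, value in enumerate(ordered, start=1):
        # k / n is the share of rows at or before this position.
        if k / n >= fraction:
            return value
    return ordered[-1]  # only reachable through floating-point rounding


medians = [percentile_disc(arr, 0.5) for arr in ([1, 2, 3], [4, 5, 6], [10, 11, 12])]
print(medians)
```

For the three sample arrays in the question this gives the middle element of each, which is what the OVER() ... LIMIT 1 subquery is expected to produce.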

Oracle SQL: sum( case when then quantity else 0 END) OVER (partition by...) = can't get right GROUP BY statement

I'm trying to select several different sums, one of them being OVER (Partition by column_also_in_select_plan).
However I cannot seem to ever be able to get the GROUP BY statement right.
Example:
Select 1, 2, 3, sum(4) over (partition by 3), sum(case when 6 = etc...)
FROM table
Where filters
GROUP BY ?
Thanks for any tips :)
It doesn't really make much sense to be doing aggregation and using window functions at the same time, so I'm not surprised you're confused. In the above example, you probably want to move the windowing to an outer query, that is:
select 1, 2, 3, sum(sum4) over(partition by 3), ...
from (
select 1, 2, 3, sum(4) as sum4
from table
where filters
group by 1, 2, 3
) x
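As a concrete version of that shape, the sketch below runs an inner GROUP BY and an outer window SUM in SQLite from Python. The table `sales` and its columns are invented for the example, and it assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", "apple", 2),
        ("east", "apple", 3),
        ("east", "pear", 5),
        ("west", "apple", 7),
    ],
)

# Inner query: plain aggregation per (region, product).
# Outer query: window sum over the already aggregated rows.
rows = conn.execute(
    """
    SELECT region, product, qty,
           SUM(qty) OVER (PARTITION BY region) AS region_qty
    FROM (
        SELECT region, product, SUM(quantity) AS qty
        FROM sales
        GROUP BY region, product
    ) x
    ORDER BY region, product
    """
).fetchall()
print(rows)
```

Keeping the two steps in separate query levels means the GROUP BY list only has to name the plain columns, and the window function never competes with the aggregation.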

How to do equivalent of "limit distinct"?

How can I limit a result set to n distinct values of a given column(s), where the actual number of rows may be higher?
Input table:
client_id, employer_id, other_value
1, 2, abc
1, 3, defg
2, 3, dkfjh
3, 1, ldkfjkj
4, 4, dlkfjk
4, 5, 342
4, 6, dkj
5, 1, dlkfj
6, 1, 34kjf
7, 7, 34kjf
8, 6, lkjkj
8, 7, 23kj
Desired output, where "limit distinct" = 5 distinct values of client_id:
1, 2, abc
1, 3, defg
2, 3, dkfjh
3, 1, ldkfjkj
4, 4, dlkfjk
4, 5, 342
4, 6, dkj
5, 1, dlkfj
The intended platform is MySQL.
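You can use a subselect:
select * from table where client_id in
(select distinct client_id from table order by client_id limit 5)
Note that MySQL has historically rejected LIMIT directly inside an IN subquery; if you hit that error, wrapping the limited subquery in a derived table is a common workaround:
select * from table where client_id in
(select client_id from (select distinct client_id from table order by client_id limit 5) as t)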
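The subselect approach can be checked against the sample data from the question. The sketch below uses SQLite from Python rather than MySQL, so it only demonstrates the logic; the table name `clients` and the helper `limit_distinct` are assumptions for this example.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, 2, "abc"), (1, 3, "defg"), (2, 3, "dkfjh"), (3, 1, "ldkfjkj"),
    (4, 4, "dlkfjk"), (4, 5, "342"), (4, 6, "dkj"), (5, 1, "dlkfj"),
    (6, 1, "34kjf"), (7, 7, "34kjf"), (8, 6, "lkjkj"), (8, 7, "23kj"),
]


def limit_distinct(n):
    """Return all rows belonging to the first n distinct client_id values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clients (client_id INTEGER, employer_id INTEGER, other_value TEXT)"
    )
    conn.executemany("INSERT INTO clients VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT client_id, employer_id, other_value
        FROM clients
        WHERE client_id IN (
            SELECT DISTINCT client_id FROM clients ORDER BY client_id LIMIT ?
        )
        ORDER BY client_id, employer_id
        """,
        (n,),
    ).fetchall()
    conn.close()
    return rows


first_five_clients = limit_distinct(5)
print(len(first_five_clients))
```

With n = 5 the query returns the eight rows for client_id 1 through 5, matching the desired output listed in the question.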
This is for SQL Server. I can't remember for certain, but MySQL may use a LIMIT keyword instead of TOP. That may make the query more efficient, if you can get rid of the innermost subquery by using LIMIT and DISTINCT in the same subquery. (It looks like Vinko used this method and that LIMIT is correct. I'll leave this here as a second possible answer, though.)
SELECT
client_id,
employer_id,
other_value
FROM
MyTable
WHERE
client_id IN
(
SELECT TOP 5
client_id
FROM
(
SELECT DISTINCT
client_id
FROM
MyTable
) SQ
ORDER BY
client_id
)
Of course, add in your own WHERE clause and ORDER BY clause in the subquery.
Another possibility (compare performance and see which works out better) is:
SELECT
client_id,
employer_id,
other_value
FROM
MyTable T1
WHERE
T1.code IN
(
SELECT
T2.code
FROM
MyTable T2
WHERE
(SELECT COUNT(DISTINCT T3.code) FROM MyTable T3 WHERE T3.code < T2.code) < 5
)
-- Using a common table expression in Microsoft SQL Server.
-- The LIMIT clause does not exist in MS SQL; TOP is used instead.
WITH CTE
AS
(SELECT DISTINCT [COLUMN_NAME]
FROM [TABLE_NAME])
SELECT TOP (5) [COLUMN_NAME]
FROM CTE
ORDER BY [COLUMN_NAME];
This works for MS SQL if anyone is on that platform:
SET ROWCOUNT 10;
SELECT DISTINCT
column1, column2, column3,...
FROM
Table1
WHERE ...