Given two arrays in Snowflake, find summation of minimum elements - sql

Given two arrays such as a = [10, 20, 30] and b = [9, 21, 32], how can I construct an array in Snowflake that holds the minimum or maximum element at each index? The desired output is [9, 20, 30] for the minimum and [10, 21, 32] for the maximum.
I looked at Snowflake's array functions and didn't find one that does this.

If the arrays are always the same size (reusing Lukasz's great data CTE):
WITH cte AS (
SELECT ARRAY_CONSTRUCT(10, 20, 30) AS a, ARRAY_CONSTRUCT(9, 21, 32) AS b
)
SELECT a,b
,ARRAY_AGG(LEAST(a[n.index], b[n.index])) WITHIN GROUP(ORDER BY n.index) AS min_array
,ARRAY_AGG(GREATEST(a[n.index], b[n.index])) WITHIN GROUP(ORDER BY n.index) AS max_array
FROM cte
,table(flatten(a)) n
GROUP BY 1,2;
gives:
A | B | MIN_ARRAY | MAX_ARRAY
[ 10, 20, 30 ] | [ 9, 21, 32 ] | [ 9, 20, 30 ] | [ 10, 21, 32 ]
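For checking expected results outside Snowflake, the same index-wise LEAST/GREATEST logic can be sketched in Python. This only illustrates the semantics and assumes equal-length lists; the function name `elementwise_min_max` is made up for this example.

```python
def elementwise_min_max(a, b):
    """Return (min_list, max_list), comparing a and b index by index."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length lists")
    mins = [min(x, y) for x, y in zip(a, b)]
    maxs = [max(x, y) for x, y in zip(a, b)]
    return mins, maxs


mins, maxs = elementwise_min_max([10, 20, 30], [9, 21, 32])
print(mins)  # [9, 20, 30]
print(maxs)  # [10, 21, 32]
```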
And if you have uneven lists:
WITH cte AS (
SELECT ARRAY_CONSTRUCT(10, 20, 30) AS a, ARRAY_CONSTRUCT(9, 21, 32) AS b
union all
SELECT ARRAY_CONSTRUCT(10, 20, 30) AS a, ARRAY_CONSTRUCT(9, 21, 32, 45) AS b
)
SELECT a,b
,ARRAY_AGG(LEAST(a[n.index], b[n.index])) WITHIN GROUP(ORDER BY n.index) AS min_array
,ARRAY_AGG(GREATEST(a[n.index], b[n.index])) WITHIN GROUP(ORDER BY n.index) AS max_array
FROM cte
,table(flatten(iff(array_size(a)>=array_size(b), a, b))) n
GROUP BY 1,2;
A | B | MIN_ARRAY | MAX_ARRAY
[ 10, 20, 30 ] | [ 9, 21, 32 ] | [ 9, 20, 30 ] | [ 10, 21, 32 ]
[ 10, 20, 30 ] | [ 9, 21, 32, 45 ] | [ 9, 20, 30 ] | [ 10, 21, 32 ]
This flattens the larger array. However, where the shorter array has no element, the index lookup returns NULL, LEAST/GREATEST then return NULL, and ARRAY_AGG drops NULLs, so you don't even need the size comparison unless you want to NVL/COALESCE the missing values to safe defaults.
SELECT 1 as a, null as b, least(a,b);
gives:
A | B | LEAST(A,B)
1 | null | null
With NVL defaults, like so:
SELECT a,b
,ARRAY_AGG(LEAST(nvl(a[n.index],10000), nvl(b[n.index],10000))) WITHIN GROUP(ORDER BY n.index) AS min_array
,ARRAY_AGG(GREATEST(nvl(a[n.index],0), nvl(b[n.index],0))) WITHIN GROUP(ORDER BY n.index) AS max_array
FROM cte
,table(flatten(iff(array_size(a)>=array_size(b), a, b))) n
GROUP BY 1,2;
A | B | MIN_ARRAY | MAX_ARRAY
[ 10, 20, 30 ] | [ 9, 21, 32 ] | [ 9, 20, 30 ] | [ 10, 21, 32 ]
[ 10, 20, 30 ] | [ 9, 21, 32, 45 ] | [ 9, 20, 30, 45 ] | [ 10, 21, 32, 45 ]
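The difference between the two uneven-list variants (NULLs dropped versus NULLs replaced by defaults) can be sketched in Python. This is a rough model of the behavior described above, assuming the arrays themselves contain no NULL elements; the helper names and the default values 10000 and 0 are taken from or invented for this illustration.

```python
from itertools import zip_longest


def min_max_drop_nulls(a, b):
    """Model LEAST/GREATEST returning NULL, then ARRAY_AGG dropping NULLs."""
    mins, maxs = [], []
    for x, y in zip_longest(a, b, fillvalue=None):
        if x is None or y is None:
            continue  # a NULL result would not appear in the aggregated array
        mins.append(min(x, y))
        maxs.append(max(x, y))
    return mins, maxs


def nvl(value, default):
    """Return default when value is None, like NVL/COALESCE."""
    return default if value is None else value


def min_max_with_defaults(a, b, min_default=10000, max_default=0):
    """Model the NVL variant: missing elements get a safe default first."""
    pairs = list(zip_longest(a, b, fillvalue=None))
    mins = [min(nvl(x, min_default), nvl(y, min_default)) for x, y in pairs]
    maxs = [max(nvl(x, max_default), nvl(y, max_default)) for x, y in pairs]
    return mins, maxs


a, b = [10, 20, 30], [9, 21, 32, 45]
print(min_max_drop_nulls(a, b))     # ([9, 20, 30], [10, 21, 32])
print(min_max_with_defaults(a, b))  # ([9, 20, 30, 45], [10, 21, 32, 45])
```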

Using a numbers table and [] to access elements, and ARRAY_AGG to build the new arrays:
WITH cte AS (
SELECT ARRAY_CONSTRUCT(10, 20, 30) AS a, ARRAY_CONSTRUCT(9, 21, 32) AS b
), numbers AS (
SELECT ROW_NUMBER() OVER(ORDER BY seq4())-1 AS IND
FROM TABLE(GENERATOR(ROWCOUNT => 10001))
)
SELECT a,b
,ARRAY_AGG(LEAST(a[ind], b[ind])) WITHIN GROUP(ORDER BY n.ind) AS min_array
,ARRAY_AGG(GREATEST(a[ind], b[ind])) WITHIN GROUP(ORDER BY n.ind) AS max_array
FROM cte
JOIN numbers n
ON n.ind < GREATEST(ARRAY_SIZE(a), ARRAY_SIZE(b))
GROUP BY a,b;

Related

how to do list_agg with a character limit of 1440 characters in Snowflake

I have a table like the one below with 1775 ids; the id column is 10 characters long. I want to distribute the 1775 ids into multiple groups, each a list_agg of ids of no more than 1440 characters.
id | distributor_name
1234567890 | Sample_name1
2345678901 | Sample_name1
3456789012 | Sample_name1
4567890123 | Sample_name2
5678901234 | Sample_name2
6789012345 | Sample_name3
7890123456 | Sample_name3
8901234567 | Sample_name3
Required output is:
group | id_count | list_agg
1 | 120 | 1234567890,2345678901,3456789012...
2 | 122 | 7890123456,5678901234,8901234567...
Very much appreciate your help!
If your ID space is symmetrically distributed, you can use WIDTH_BUCKET:
with data1 as (
select
row_number() over (order by true)-1 as rn
from table(generator(rowcount=>100))
)
select
width_bucket(rn, 0, 100, 5) as group_id,
count(*) as c,
array_agg(rn) within group(order by rn) as bucket_values
from data1
group by 1
order by 1;
GROUP_ID | C | BUCKET_VALUES
1 | 20 | [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 ]
2 | 20 | [ 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 ]
3 | 20 | [ 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59 ]
4 | 20 | [ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79 ]
5 | 20 | [ 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 ]
If your data is not symmetrical, you can allocate row numbers to each row, and then shave the yak again.
You can also be data driven:
width_bucket(rn, (select min(rn) from data1), (select max(rn) from data1)+1, 5) as group_id,
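To see where the bucket boundaries fall, the equal-width rule behind WIDTH_BUCKET can be sketched in Python. This illustrates the rule for an ascending range (values below the minimum map to 0, values at or above the maximum map to buckets + 1); it is not Snowflake's implementation.

```python
import math
from collections import Counter


def width_bucket(value, low, high, buckets):
    """Equal-width bucket number for value in [low, high), 1-based."""
    if value < low:
        return 0
    if value >= high:
        return buckets + 1
    return math.floor((value - low) * buckets / (high - low)) + 1


# Same setup as the query above: rn = 0..99 split into 5 buckets over [0, 100).
counts = Counter(width_bucket(rn, 0, 100, 5) for rn in range(100))
print(sorted(counts.items()))  # [(1, 20), (2, 20), (3, 20), (4, 20), (5, 20)]
```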
It's easier done with array_agg but if you must use listagg, here is a spin on Simeon's answer. The basic idea is to keep track of length of numbers when stitched together and also account for the number of commas so we don't go over 1440 char limit.
create or replace temporary table t as
select row_number() over (order by true)-1 as id,
uniform(1000000000, 1999999999, random()) as num
from table(generator(rowcount=>1775));
with cte as
(select *, ceil((sum(len(num)) over (order by id) + count(num) over (order by id) -1)/1440) as group_id
from t)
select group_id,
count(num) as id_count,
listagg(num,',') as id_list,
len(id_list) as len_check
from cte
group by group_id
order by group_id;
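The length bookkeeping (digits plus separating commas) can also be sketched as a greedy loop in Python. Note that this is a different technique from the cumulative-sum query above: it restarts the running length for every group, so no group can exceed the limit. The function name `group_ids` and the generated sample ids are assumptions made for this example.

```python
def group_ids(ids, limit=1440, sep=","):
    """Greedily pack ids into joined strings of at most `limit` characters."""
    groups, current, current_len = [], [], 0
    for id_ in ids:
        s = str(id_)
        if len(s) > limit:
            raise ValueError("a single id is longer than the limit")
        # The first id in a group needs no separator in front of it.
        added = len(s) if not current else len(sep) + len(s)
        if current and current_len + added > limit:
            groups.append(current)          # close the full group
            current, current_len = [s], len(s)
        else:
            current.append(s)
            current_len += added
    if current:
        groups.append(current)
    return [sep.join(g) for g in groups]


# 1775 made-up ten-digit ids, standing in for the real table.
sample_ids = [1000000000 + n for n in range(1775)]
lists = group_ids(sample_ids)
print(len(lists), max(len(s) for s in lists))  # 14 1440
```

With ten-character ids each group holds at most 131 ids (131 * 10 digits + 130 commas = 1440 characters).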

How to compare a 2D array against a 1D array column-wise?

I have two numpy arrays. One of them is 2D while the other is 1D.
>>> a = np.arange(0,20).reshape(2,10)
>>> a
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
>>> b = np.full( a.shape[1], 10 )
>>> b
array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
I want to compare them column-wise:
If a column element of a is identical to the corresponding element of b, then store the row number(s) of a.
Otherwise, find the element of a that most closely matches b and store its row number(s).
In my example, the output from the comparison should be:
[ 1, [0,1], [0,1], [0,1], [0,1], [0,1], [0,1], [0,1], [0,1], [0,1] ]
How do I do this in NumPy?
I was thinking of using np.where( a==b, run a function to get row(s) if same, run another function to get row(s) of diff )? Is this the way?
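One possible reading of the two rules can be sketched with broadcasting instead of np.where. The sketch assumes that "closest" means smallest absolute difference and that ties return every tied row; an exact match is simply a distance of 0, so both rules collapse into one. The function name `closest_rows_per_column` is made up for this example.

```python
import numpy as np


def closest_rows_per_column(a, b):
    """For each column j, list the rows of a whose value is nearest to b[j]."""
    diff = np.abs(a - b)      # b broadcasts across the rows of a
    best = diff.min(axis=0)   # smallest distance found in each column
    return [np.flatnonzero(diff[:, j] == best[j]).tolist()
            for j in range(a.shape[1])]


a = np.arange(0, 20).reshape(2, 10)
b = np.full(a.shape[1], 10)
print(closest_rows_per_column(a, b))
```

Under that assumption only column 5 of the example is a tie (5 and 15 are both 5 away from 10), so a different tie or distance rule would be needed to get both rows back for other columns.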

Parsing string in a hive table

I have a hive table which has two columns (day, type_of_day) both of type string
"monday" [{"temp" : 45, "weather": "rainny"}, {"temp" : 25, "weather": "sunny"}, {"temp" : 15, "weather": "storm"}]
"tuesday" [{"temp" : 5, "weather": "winter"}, {"temp" : 10, "weather": "sun"}, {"temp" : 18, "weather": "dawn"}]
I want to split (I guess explode is the technical term) and then just get a list of temp values for each day. I'm familiar with how to do this in Python, but is there a way to do this directly in Hive?
"monday" [45, 25, 15]
"tuesday" [5, 10, 18]
Testing with your data example. Replace CTE with your table. Read comments in the code:
with your_table as (--use your table instead of this CTE
select stack(2,
"monday",'[{"temp" : 45, "weather": "rainny"}, {"temp" : 25, "weather": "sunny"}, {"temp" : 15, "weather": "storm"}]',
"tuesday" ,'[{"temp" : 5, "weather": "winter"}, {"temp" : 10, "weather": "sun"}, {"temp" : 18, "weather": "dawn"}]'
)as (day, type_of_day)
) --use your table instead of this CTE
select s.day, array(get_json_object(type_of_day_array[0],'$.temp'),
get_json_object(type_of_day_array[1],'$.temp'),
get_json_object(type_of_day_array[2],'$.temp')
) as result_array --extract JSON elements and construct array
from
(
select day, split(regexp_replace(regexp_replace(type_of_day,'\\[|\\]',''), --remove square brackets
'\\}, *\\{','\\}##\\{'), --make convenient split separator
'##') --split
as type_of_day_array
from your_table --use your table instead of this CTE
)s;
Result:
s.day result_array
monday ["45","25","15"]
tuesday ["5","10","18"]
If the array of JSON can contain more than three elements, then you can use lateral view explode or posexplode and then build the resulting array like in this answer: https://stackoverflow.com/a/51570035/2700344.
Wrap array elements in cast(... as int) if you need array<int> as a result instead of array<string>:
cast(get_json_object(type_of_day_array[0],'$.temp') as int)...
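For cross-checking the Hive result outside Hive, the same extraction can be sketched in Python with the standard json module. It handles any number of elements per day; the dictionary `rows` simply stands in for the two-column table and is an assumption of this example.

```python
import json

# Stand-in for the (day, type_of_day) table from the question.
rows = {
    "monday": '[{"temp" : 45, "weather": "rainny"}, {"temp" : 25, "weather": "sunny"}, {"temp" : 15, "weather": "storm"}]',
    "tuesday": '[{"temp" : 5, "weather": "winter"}, {"temp" : 10, "weather": "sun"}, {"temp" : 18, "weather": "dawn"}]',
}


def temps_per_day(rows):
    """Parse each JSON array string and keep only the temp values."""
    return {day: [item["temp"] for item in json.loads(raw)]
            for day, raw in rows.items()}


print(temps_per_day(rows))  # {'monday': [45, 25, 15], 'tuesday': [5, 10, 18]}
```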

count null values and group results

Problem
I want to count null values and group the results, but the query gives me wrong values:
String jpql = "select c.commande.user.login, "
    + "(select count(*) from Designation c "
    + "WHERE c.commande.commandeTms IS NOT EMPTY "
    + "AND c.etatComde = 0) AS count1, "
    + "(select count(*) from Designation c "
    + "WHERE c.commande.commandeTms IS EMPTY) AS count2 "
    + "from Designation c GROUP BY c.commande.user.login";
I have got these results :
user1 user2
10 10
0 0
But I should have these ones :
user1 user2
4 2
3 1
Sample data:
table Commande
idComdeComm, commandeTms_idComndeTms, user_idUser
6, 17, 2
8, NULL, 2
10, 28, 2
12, NULL, 2
14, NULL, 2
16, NULL, 2
21, NULL, 19
23, NULL, 19
25, 26, 19
31, NULL, 19
table designation
idDesignation, designation, etatComde, commande_idComdeComm
5, 'fef', 0, 6
7, 'ferf', 0, 8
9, 'hrhrh', 0, 10
11, 'ujujuju', 0, 12
13, 'kikolol', 0, 14
15, 'ololo', 0, 16
20, 'gdfgfd', 0, 21
22, 'gdfgfdd', 0, 23
24, 'nhfn', 0, 25
30, 'momo', 0, 31
table user
idUser, login, password, profil
1, 'moez', '***', 'admin'
2, 'user1', '**', 'comm'
3, 'log', '**', 'log'
4, 'mo', '*', 'comm'
19, 'user2', '*', 'comm'
table Commande TMS
idComndeTms, etatOperationNumCMD, numeroComndeTms
17, '', 3131
26, '', 2525
28, '', 3333
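The subqueries in the query above do not reference the grouped row, which would explain why both users get the same numbers. A hedged sketch of per-user counting with conditional aggregation, using SQLite from Python rather than JPQL, is shown below. The table and column names are simplified stand-ins for the sample data, and treating "empty" as IS NULL is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER, login TEXT);
    CREATE TABLE commande (id INTEGER, tms_id INTEGER, user_id INTEGER);
    CREATE TABLE designation (id INTEGER, etat INTEGER, commande_id INTEGER);
""")
conn.executemany("INSERT INTO app_user VALUES (?, ?)",
                 [(2, "user1"), (19, "user2")])
conn.executemany("INSERT INTO commande VALUES (?, ?, ?)", [
    (6, 17, 2), (8, None, 2), (10, 28, 2), (12, None, 2), (14, None, 2),
    (16, None, 2), (21, None, 19), (23, None, 19), (25, 26, 19), (31, None, 19),
])
conn.executemany("INSERT INTO designation VALUES (?, ?, ?)", [
    (5, 0, 6), (7, 0, 8), (9, 0, 10), (11, 0, 12), (13, 0, 14),
    (15, 0, 16), (20, 0, 21), (22, 0, 23), (24, 0, 25), (30, 0, 31),
])

# One pass over the joined rows; each CASE counts only the rows it matches,
# and GROUP BY keeps the counts separate per login.
query = """
    SELECT u.login,
           SUM(CASE WHEN c.tms_id IS NULL THEN 1 ELSE 0 END) AS without_tms,
           SUM(CASE WHEN c.tms_id IS NOT NULL AND d.etat = 0
                    THEN 1 ELSE 0 END) AS with_tms
    FROM designation d
    JOIN commande c ON c.id = d.commande_id
    JOIN app_user u ON u.id = c.user_id
    GROUP BY u.login
    ORDER BY u.login
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('user1', 4, 2), ('user2', 3, 1)]
```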

Table and Sum function in Mathematica

I have a very simple question. I don't use Mathematica very often and I got stuck with one task. I need to compute this task:
T=5;
y (* it represents 54 numbers*);
h = 2;
c (*starting at 3, see below*);
Table[Sum[(y[[i]]*((i - c)/h)*((i - c)/h)), {i, T}]/
Sum[((i - c)/h)*((i - c)/h), {i, T}], {c, 3, 54, 2}]
I need to compute the "sum…/sum…" 26 times, where "c" starts at 3 and increases by 2 at each step (3, 5, 7, and so on). I managed to implement this with the Table function.
The problem is that I also need "i" to run from 1 to 54, but each step should use just 5 numbers: the 1st step uses i=1,2,3,4,5; the 2nd uses i=3,4,5,6,7; and so on. I set T to 5 in the Sum function, so the first step is fine, but I have no idea how to create a loop where "i" overlaps like that. I hope that someone will understand my "great" explanation.
You could write T as c+2, but your table is too long, i.e.
z = Table[c, {c, 3, 54, 2}]
{3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53}
z + 2
{5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55}
So again, if you wrote T as c+2, (and minimum i as c-2) . . .
Table[Sum[(y[[i]]*((i - c)/h)*((i - c)/h)), {i, c - 2, c + 2}]/
Sum[((i - c)/h)*((i - c)/h), {i, c - 2, c + 2}], {c, 3, 54, 2}]
. . . you would need y to represent a list of 55 numbers, not 54.
For example, this works:
y = Array[RandomInteger[10] &, 55];
Table[Sum[(y[[i]]*((i - c)/h)*((i - c)/h)), {i, c - 2, c + 2}]/
Sum[((i - c)/h)*((i - c)/h), {i, c - 2, c + 2}], {c, 3, 54, 2}]
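The overlapping-window idea (centre c, indices c - 2 to c + 2, centres advancing by 2) can be cross-checked with a small Python sketch. It uses exact fractions and treats positions as 1-based to mirror y[[i]]; the function name `windowed_ratio` and its parameters are assumptions made for this example.

```python
from fractions import Fraction


def windowed_ratio(y, h=2, half_width=2, step=2):
    """Weighted ratio over overlapping windows centred at c = 3, 5, 7, ..."""
    results = []
    c = half_width + 1  # first centre whose window starts at position 1
    while c + half_width <= len(y):
        idx = range(c - half_width, c + half_width + 1)
        weights = [Fraction(i - c, h) ** 2 for i in idx]
        # y is a 0-based Python list, so position i lives at y[i - 1].
        numerator = sum(y[i - 1] * w for i, w in zip(idx, weights))
        results.append(numerator / sum(weights))
        c += step
    return results


print(len(windowed_ratio([1] * 55)))  # 26
print(len(windowed_ratio([1] * 54)))  # 25
```

This matches the point above: 26 windows need 55 values, since the last window is centred at 53 and reaches position 55.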