Count the number of matches in an array in BigQuery - sql

How I can count the number of matches in an array? For example, for numbers [1,3] in the array [1,2,3] there will be 2 matches, and for the array [1,2] there will be 1 match. Right now I can only check if [1,3] is in the array or not.
WITH `arrays` AS (
SELECT 1 id, [1,2,3] as arr
UNION ALL
SELECT 2, [1,2]
UNION ALL
SELECT 3, [3]
)
SELECT id, arr, [1,3] as numbers,
CASE
1 IN UNNEST(arr) and
3 IN UNNEST(arr)
WHEN TRUE THEN 'numbers is in array'
ELSE 'numbers is not in array'
END conclusion
FROM `arrays`
I'm trying to get such result:

Using a math, following seems to be possible:
If union of arr and numbers is same as arr, it will be numbers is in array
If union of arr and numbers is greater than arr, elements as much as the increased number is not in the arr.
So, numbers_len - (union_len - arr_len) will be check
WITH `arrays` AS (
SELECT 1 id, [1,2,3] as arr
UNION ALL
SELECT 2, [1,2]
UNION ALL
SELECT 3, [3]
),
calculated_arrays AS (
SELECT *, [1,3] as numbers,
ARRAY_LENGTH(ARRAY(SELECT DISTINCT * FROM UNNEST(arr || [1, 3]))) AS union_len,
ARRAY_LENGTH(arr) AS arr_len,
ARRAY_LENGTH([1, 3]) AS numbers_len
FROM `arrays`
)
SELECT id, arr, numbers,
numbers_len - union_len + arr_len AS check,
IF (union_len = arr_len, 'numbers is in array', 'numbers is not in array') AS conclusion
FROM calculated_arrays
;
output:

Consider below approach
with `arrays` as (
select 1 id, [1,2,3] as arr union all
select 2, [1,2] union all
select 3, [3]
)
select *,
( select count(*)
from t.numbers num join t.arr num
using(num)
) check,
( select format('number is %sin array',
if(logical_and(if(num2 is null, false, true)), '', 'not '))
from t.numbers num1 left join t.arr num2
on num1 = num2
) conclusion
from (
select id, arr, [1,3] as numbers
from `arrays`
) t
with output

Related

Query rows where first_name contains at least 2 vowels, and the number of occurences of each vowel is equal

I have the following problem: Show all rows in table where column first_name contains at least 2 vowels (a, e, i, o, u), and the number of occurences of each vowel is the same.
Valid example: Alexander, "e" appears 2 times, "a" appears 2 times. That is coreect.
Invalid example: Jonathan, it has 2 vowels (a, o), but "o" appears once, and "a" appears twice, the number of occurences is not equal.
I've solved this problem by calculating each vowel, and then verify every case (A E, A I, A O etc. Shortly, each combination of 2, 3, 4, 5). With that solution, I have a very long WHERE. Is there any shorter way and more elegant and simple?
This is how I solved it in TSQL in MS SQL Server 2019.
I know its not exactly what you wanted. Just an interesting thing to try. Thanks for that.
DROP TABLE IF EXISTS #Samples
SELECT n.Name
INTO #Samples
FROM
(
SELECT 'Ravi' AS Name
UNION
SELECT 'Tim'
UNION
SELECT 'Timothe'
UNION
SELECT 'Ian'
UNION
SELECT 'Lijoo'
UNION
SELECT 'John'
UNION
SELECT 'Jami'
) AS n
SELECT g.Name,
IIF(MAX (g.Repeat) = MIN (g.Repeat) AND SUM (g.Appearance) >= 2, 'Valid', 'Invalid') AS Validity
FROM
(
SELECT v.value,
s.Name,
SUM (LEN (s.Name) - LEN (REPLACE (s.Name, v.value, ''))) AS Repeat,
SUM (IIF(s.Name LIKE '%' + v.value + '%', 1, 0)) AS Appearance
FROM STRING_SPLIT('a,e,i,o,u', ',') AS v
CROSS APPLY #Samples AS s
GROUP BY v.value,
s.Name
) AS g
WHERE g.Repeat > 0
GROUP BY g.Name
Output
we can replace STRING_SPLIT with a temp table for supporting lower versions
DROP TABLE IF EXISTS #Vowels
SELECT C.Vowel
INTO #Vowels
FROM
(
SELECT 'a' AS Vowel
UNION
SELECT 'e'
UNION
SELECT 'i'
UNION
SELECT 'o'
UNION
SELECT 'u'
) AS C
SELECT g.Name,
IIF(MAX (g.Repeat) = MIN (g.Repeat) AND SUM (g.Appearance) >= 2, 'Valid', 'Invalid') AS Validity
FROM
(
SELECT v.Vowel,
s.Name,
SUM (LEN (s.Name) - LEN (REPLACE (s.Name, v.Vowel, ''))) AS Repeat,
SUM (IIF(s.Name LIKE '%' + v.Vowel + '%', 1, 0)) AS Appearance
FROM #Vowels AS v
CROSS APPLY #Samples AS s
GROUP BY v.Vowel,
s.Name
) AS g
WHERE g.Repeat > 0
GROUP BY g.Name
From Oracle 12, you can use:
SELECT name
FROM table_name
CROSS JOIN LATERAL(
SELECT 1
FROM (
-- Step 2: Count the frequency of each vowel
SELECT letter,
COUNT(*) As frequency
FROM (
-- Step 1: Find all the vowels
SELECT REGEXP_SUBSTR(LOWER(name), '[aeiou]', 1, LEVEL) AS letter
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(LOWER(name), '[aeiou]')
)
GROUP BY letter
)
-- Step 3: Filter out names where the number of vowels are
-- not equal or the vowels do not occur at least twice
-- and there are not at least 2 different vowels.
HAVING MIN(frequency) >= 2
AND MIN(frequency) = MAX(frequency)
AND COUNT(*) >= 2
);
Which, for the sample data:
CREATE TABLE table_name (name) AS
SELECT 'Alexander' FROM DUAL UNION ALL
SELECT 'Johnaton' FROM DUAL UNION ALL
SELECT 'Anna' FROM DUAL;
Outputs:
NAME
Alexander
db<>fiddle here

Recursive Formula based on previous Row's Result

Let's consider the following query:
with
init as (
select 0.1 as y0
),
cte as (
select 1 as i, 1 as x -- x_1
union all
select 2 as i, 10 as x -- x_2
union all
select 3 as i, 100 as x -- x_3
order by i asc
)
select cte.x, init.y0 -- <- ?
from cte
join init
on true
There is a CTE init specifying an inital value y_0 and a CTE cte specifying rows with a value x and an index i.
My question is whether I can write a select which realizes the following simple, recursive formula.
y_n+1 = y_n + x_n+1
So, the result should be 3 rows with values: 1.1, 11.1, 111.1 (for y_1, y_2, y_3).
Would that be possible?
write a select which realizes the following simple, recursive formula.
y_n+1 = y_n + x_n+1
Consider below
select x, y0 + sum(x) over(order by i) as y
from cte, init
if applied to sample data in your question - output is
Note: the expected result you shown in your question - does not match the formula you provided - so obviously above output is different from one in your question :o)
You need to use the “OVER” statement. You can see more documentation about the syntax.
with
init as (
select 0.1 as y0
),
cte as (
select 1 as ts, 1 as i, 1 as x -- x_1
union all
select 2, 2, 10 as x -- x_2
union all
select 3, 3, 100 as x
union all
select 4, 4, 109 as x -- x_3
union all
select 5, 5, 149 as x
order by i asc
)
SELECT *,init.y0 + SUM(i) OVER(
ORDER BY (ts)
) AS res
FROM cte join init
on true

How to aggregate arrays element by element in BigQuery?

In BigQquery how can I aggregate arrays element by element ?
For instance if I have this table
id
array_value
1
[1, 2, 3]
2
[4, 5, 6]
3
[7, 8, 9]
I want to sum all the vector element-wise and output [1+4+7, 2+5+8, 3+6+9] = [12, 15, 18]
I can SUM float fields with SELECT SUM(float_field) FROM table but when I try to apply the SUM on an array I get
No matching signature for aggregate function SUM for argument types: ARRAY.
Supported signatures: SUM(INT64); SUM(FLOAT64); SUM(NUMERIC); SUM(BIGNUMERIC) at [1:8]
I have found ARRAY_AGG in the doc but it is not what I want: it just creates an array from values.
I think you want:
select array_agg(sum_val order by id) as res
from (
select idx, sum(val) as sum_val
from mytable t
cross join unnest(t.array_value) as val with offset as idx
group by idx
) t
I think you want:
select array_agg(sum_val)
from (select (select sum(val)
from unnest(t.array_value) val
) as sum_val
from t
) x
I think technically you simply refer to the individual values in the arrays using offset() or safe_offset() in case there might be missing values
-- example data
with temp as (
select * from unnest([
struct(1 as id, [1, 2, 3] as array_value),
(2, [4,5,6]),
(3, [7,8])
])
)
-- actual query
select
[
SUM( array_value[safe_offset(0)] ),
SUM( array_value[safe_offset(1)] ),
SUM( array_value[safe_offset(2)] )
] as result_array
from temp
I put them in a result array, but you don't have to do that. I had the last array missing one value to show that the query doesn't break. If you want it to break you should use offset() without the 'safe_'
Below is for BigQuery Standard SQL
select array_agg(val order by offset)
from (
select offset, sum(val) as val
from `project.dataset.table` t,
unnest(array_value) as val with offset
group by offset
)

Convert String to Tuple in BigQuery

I have a variable passed as an argument in BigQuery which is in the format "('a','b','c')"
with vars as (
select "{0}" as var1,
)
-- where, {0} = "('a','b','c')"
To use it in BigQuery I need to make it a tuple ('a','b','c').
How can it be done?
Any alternate approach is also welcome.
Example:
with vars as (
select "('a','b','c')" as index
)
select * from `<some_other_db>.table` where index in (
select index from vars)
-- gives me empty results because index is now a string
Present output:
select * from <db_name>.table where index in "('a','b','c')"
Required output:
select * from <db_name>.table where index in ('a','b','c')
Below is for BigQuery Standard SQL
#standardSQL
WITH vars AS (
SELECT "('a','b','c')" AS var
)
SELECT *
FROM `<some_other_db>.table`
WHERE index IN UNNEST((
SELECT SPLIT(REGEXP_REPLACE(var, r'[()\']', ''))
FROM vars
))
You can test, play with above using some dummy data as in below example
#standardSQL
WITH vars AS (
SELECT "('a','b','c')" AS var
), `<some_other_db>.table` AS (
SELECT 1 id, 'a' index UNION ALL
SELECT 2, 'd' UNION ALL
SELECT 3, 'c' UNION ALL
SELECT 4, 'e'
)
SELECT *
FROM `<some_other_db>.table`
WHERE index IN UNNEST((
SELECT SPLIT(REGEXP_REPLACE(var, r'[()\']', ''))
FROM vars
))
with output
Row id index
1 1 a
2 3 c
I think this does what you are asking for:
with vars as ( select "('a','b','c')" as var1)
select as struct
MAX(CASE WHEN n = 0 then var END) as f1,
MAX(CASE WHEN n = 1 then var END) as f2,
MAX(CASE WHEN n = 2 then var END) as f3
from vars v cross join
unnest(SPLIT(REPLACE(REPLACE(var1, '(', ''), ')', ''), ',')) var with offset n;

How do I create a list of all possible anagrams of a word/string in PostgreSQL

How do I create a list of all possible anagrams of a word/string in PostgreSQL.
For example if String is 'act'
then the desired output should be:
act,
atc,
cta,
cat,
tac,
tca
I have one Table 'tbl_words' which contains million of words.
Then I want to check/search for only valid words in my database table from this anagrams list.
Like from above list of anagrams valid words are : act, cat.
Is there any way to do this?
Update 1:
I need output like this:
(all permutation for given word )
any idea ??
The query generates all permutations of 3 elements set:
with recursive numbers as (
select generate_series(1, 3) as i
),
rec as (
select i, array[i] as p
from numbers
union all
select n.i, p || n.i
from numbers n
join rec on cardinality(p) < 3 and not n.i = any(p)
)
select p as permutation
from rec
where cardinality(p) = 3
order by 1
permutation
-------------
{1,2,3}
{1,3,2}
{2,1,3}
{2,3,1}
{3,1,2}
{3,2,1}
(6 rows)
Modify the final query to generate permutations of the letters of a given word:
with recursive numbers as (
select generate_series(1, 3) as i
),
rec as (
select i, array[i] as p
from numbers
union all
select n.i, p || n.i
from numbers n
join rec on cardinality(p) < 3 and not n.i = any(p)
)
select a[p[1]] || a[p[2]] || a[p[3]] as result
from rec
cross join regexp_split_to_array('act', '') as a
where cardinality(p) = 3
order by 1
result
--------
act
atc
cat
cta
tac
tca
(6 rows)
Here is a solution:
with recursive params as (
select *
from (values ('cata')) v(str)
),
nums as (
select str, 1 as n
from params
union all
select str, 1 + n
from nums
where n < length(str)
),
pos as (
select str, array[n] as poses, array_remove(array_agg(n) over (partition by str), n) as rests, 1 as lev
from nums
union all
select pos.str, array_append(pos.poses, nums.n), array_remove(rests, nums.n), lev + 1
from pos join
nums
on pos.str = nums.str and array_position(pos.rests, nums.n) > 0
where cardinality(rests) > 0
)
select distinct pos.str , string_agg(substr(pos.str, thepos, 1), '')
from pos cross join lateral
unnest(pos.poses) thepos
where cardinality(rests) = 0
group by pos.str, pos.poses;
This is quite tricky, particularly when there are repeated letters in the string. The approach taken here generates all permutations of the numbers from 1 to n, where n is the length of the string. It then uses these as indexes to extract characters from the original string.
Those who are keen will notice that this uses select distinct with group by. That seems like the easiest way to avoid duplication in the resultant strings.