Oracle SQL -- decode to multiple elements

I might make a join condition like:
table1.value_1 = decode(table2.encoded_value,1,'a',2,'b',3,'c')
But what if I need an IN-style join?
For example, how could I write a decode that accomplishes the functionality of, say:
table1.value_1 in
decode(table2.encoded_value,
1,
'a',
2,
'b',
3,
'c',
4,
('d','e','f')
);
This syntax is not valid, but the idea is that in the case where table2.encoded_value is 4, the condition would become table1.value_1 in ('d','e','f').

You generally wouldn't use a function. You'd just use boolean logic:
WHERE (table2.encoded_value = 1 AND table1.value_1 = 'a')
OR (table2.encoded_value = 2 AND table1.value_1 = 'b')
OR (table2.encoded_value = 3 AND table1.value_1 = 'c')
OR (table2.encoded_value = 4 AND table1.value_1 in( 'd', 'e', 'f'))
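To see the OR-based condition behave as an IN-style join, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite is only a stand-in for Oracle, and the sample rows are made up for illustration; the table and column names are the ones from the question.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (value_1 TEXT);
    CREATE TABLE table2 (encoded_value INTEGER);
    -- Hypothetical sample rows, not taken from the question.
    INSERT INTO table1 VALUES ('a'), ('b'), ('d'), ('f'), ('z');
    INSERT INTO table2 VALUES (1), (4);
""")

rows = conn.execute("""
    SELECT table2.encoded_value, table1.value_1
    FROM table1
    JOIN table2
      ON (table2.encoded_value = 1 AND table1.value_1 = 'a')
      OR (table2.encoded_value = 2 AND table1.value_1 = 'b')
      OR (table2.encoded_value = 3 AND table1.value_1 = 'c')
      OR (table2.encoded_value = 4 AND table1.value_1 IN ('d', 'e', 'f'))
    ORDER BY table2.encoded_value, table1.value_1
""").fetchall()

print(rows)
```

With these rows, encoded_value 1 pairs only with 'a', while encoded_value 4 pairs with both 'd' and 'f'.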

I don't think this is possible; you should change the logic of your query instead.
Rather than having decode return a list of values, you could restructure the condition like this:
(table2.encoded_value <> 4
and table1.value_1 = decode(table2.encoded_value,
1, 'a',
2, 'b',
3, 'c'))
or table2.encoded_value = 4 and table1.value_1 in ('d', 'e', 'f'/*Or a select of multiple values*/)


How to get all overlapping (ordered) 3-tuples from an array in BigQuery

Given a table like the following
elems
['a', 'b', 'c', 'd', 'e']
['v', 'w', 'x', 'y']
I'd like to transform it into something like this:
tuple
['a', 'b', 'c']
['b', 'c', 'd']
['c', 'd', 'e']
['v', 'w', 'x']
['w', 'x', 'y']
I.e., I'd like to get all overlapping 3-tuples.
My current attempt looks as follows:
WITH foo AS (
SELECT ['a', 'b', 'c', 'd', 'e'] AS elems UNION ALL
SELECT ['v', 'w', 'x', 'y']),
single AS (
SELECT * FROM
foo,
UNNEST(elems) elem
),
tuples AS (
SELECT ARRAY_AGG(elem) OVER (ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING) AS tuple
FROM single
)
SELECT * FROM tuples
WHERE ARRAY_LENGTH(tuple) >= 3
But the problem is, it returns some unwanted rows too, i.e., the ones that are "between" the original rows from the foo table.
tuple
['a', 'b', 'c']
['b', 'c', 'd']
['c', 'd', 'e']
['d', 'e', 'v'] <--- unwanted
['e', 'v', 'w'] <--- unwanted
['v', 'w', 'x']
['w', 'x', 'y']
Also, is it guaranteed that the order of rows in single is correct, or does it only work in my minimal example by chance because of the low cardinality? (I guess there may be a simple solution without this intermediate step.)
Consider the approach below:
select [elems[offset(index - 1)], elems[offset(index)], elems[offset(index + 1)]] as tuple
from your_table, unnest([array_length(elems)]) len,
unnest(generate_array(1, len - 2)) index
Applied to the sample data in your question, the output is the five expected tuples.
You might consider the query below.
As for your question, "Also, is it guaranteed, that the order of rows in single is correct, or does it only work in my minimal example by chance, because of the low cardinality?":
As far as I know, it is not guaranteed unless you explicitly use WITH OFFSET in the query.
WITH foo AS (
SELECT ['a', 'b', 'c', 'd', 'e'] AS elems UNION ALL
SELECT ['v', 'w', 'x', 'y']),
single AS (
SELECT * FROM
foo,
UNNEST(elems) elem WITH OFFSET
),
tuples AS (
SELECT ARRAY_AGG(elem) OVER (PARTITION BY FORMAT('%t', elems) ORDER BY offset ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING) AS tuple
FROM single
)
SELECT * FROM tuples
WHERE ARRAY_LENGTH(tuple) >= 3;
Just to give you another idea:
create temp function slice(arr ARRAY<string>, pos float64, len float64)
returns array<string> language js as
r"return arr.slice(pos, pos + len);";
select slice(elems, index, 3) as tuple
from foo, unnest([array_length(elems)]) len,
unnest(generate_array(0, len - 3)) index
I'll leave it up to you to refactor the above query until it looks something like:
select tuple
from foo, unnest(slices(elems, 3)) as tuple
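The windowing idea behind these answers (one tuple per start index, from 0 up to len - 3, computed separately for each array) can be sketched outside BigQuery too. This is only a plain-Python illustration of the index arithmetic; the function name overlapping_tuples is made up here.

```python
def overlapping_tuples(elems, size=3):
    """Return every run of `size` consecutive elements, in order."""
    # Start indexes run from 0 to len(elems) - size, mirroring
    # generate_array(0, len - 3) in the query above. Arrays shorter
    # than `size` produce no tuples.
    return [elems[i:i + size] for i in range(len(elems) - size + 1)]


rows = [["a", "b", "c", "d", "e"], ["v", "w", "x", "y"]]

# Windows are built per row, so no tuple ever spans two source arrays.
tuples = [t for elems in rows for t in overlapping_tuples(elems)]
print(tuples)
```

Because each array is handled on its own, the unwanted "in between" tuples from the original attempt cannot occur.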

How to compute cosine similarity between two texts in Presto?

Hello everyone: I want to use COSINE_SIMILARITY in Presto SQL to compute the similarity between two texts. Unfortunately, COSINE_SIMILARITY does not take texts as input; it takes maps instead, and I am not sure how to convert the texts into those maps in Presto. Given a table like this:
id | text1 | text2
1 | a b b | b c
Then we can compute the cosine similarity as:
COSINE_SIMILARITY(
MAP(ARRAY['a', 'b', 'c'], ARRAY[1, 2, 0]),
MAP(ARRAY['a', 'b', 'c'], ARRAY[0, 1, 1])
)
That is, the two texts combined contain three words: 'a', 'b', and 'c'. text1 has 1 'a', 2 'b's, and 0 'c's, which gives the first MAP; similarly, text2 has 0 'a's, 1 'b', and 1 'c', which gives the second MAP.
The final table should look like this:
id | text1 | text2 | all_unique_words | map1 | map2 | similarity
1 | a b b | b c | [a b c] | [1, 2, 0] | [0, 1, 1] | 0.63
How can we convert two texts into two such maps in Presto? Thanks in advance!
Use split to turn each string into an array and then, depending on the Presto version, use either the unnest + histogram trick or array_frequency:
-- sample data
with dataset(id, text1, text2) as (values (1, 'a b b', 'b c'))
-- query
select id, COSINE_SIMILARITY(histogram(t1), histogram(t2))
from dataset,
unnest (split(text1, ' '), split(text2, ' ')) as t(t1, t2)
group by id;
Output:
id | _col1
1 | 0.6324555320336759
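As a sanity check on that number, the same computation can be sketched in plain Python: count the words in each text (the role histogram plays above), then divide the dot product of the two count vectors by the product of their norms. The function name cosine_similarity is just a local helper here, not the Presto function.

```python
import math
from collections import Counter


def cosine_similarity(text1, text2):
    """Cosine similarity of the word-count vectors of two texts."""
    # Word counts per text; words missing from a text count as 0.
    counts1 = Counter(text1.split(" "))
    counts2 = Counter(text2.split(" "))
    # Only words present in both texts contribute to the dot product.
    dot = sum(counts1[w] * counts2[w] for w in counts1.keys() & counts2.keys())
    norm1 = math.sqrt(sum(c * c for c in counts1.values()))
    norm2 = math.sqrt(sum(c * c for c in counts2.values()))
    return dot / (norm1 * norm2)


similarity = cosine_similarity("a b b", "b c")
print(similarity)
```

For 'a b b' and 'b c' the dot product is 2 and the norms are sqrt(5) and sqrt(2), so the result is 2 / sqrt(10), roughly 0.632, which agrees with the query output.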

Using lookup table vs. values for 'else' case when the list of possibilities is not defined

I am trying to design a lookup table, but I do not know how to design storage for values that do not match.
Example: for lookup values 'A', 'B' and 'C' I want lookup value 1, but for all the others (and the list is not defined) I want a value -1.
Using CASE, I would have written:
CASE
WHEN VALUE IN('A','B','C') THEN 1
ELSE -1
END
But how to do the same using a lookup table?
You can use a LEFT JOIN:
SELECT t.*,
COALESCE(v.result, -1)
FROM t LEFT JOIN
(VALUES ('A', 1), ('B', 1), ('C', 1)) v(value, result)
ON t.value = v.value;
You can also store the default value in the lookup table itself, say under a NULL key:
WITH reference AS (
SELECT *
FROM (VALUES ('A', 1), ('B', 1), ('C', 1), (NULL, -1)
) v(value, result)
)
SELECT t.*,
COALESCE(r.result, r_default.result) as result
FROM t LEFT JOIN
reference r
ON r.value = t.value LEFT JOIN
reference r_default
ON r_default.value IS NULL;
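Both answers come down to a LEFT JOIN plus COALESCE. Here is a minimal sketch of the first variant with a real lookup table instead of an inline VALUES list, run against in-memory SQLite from Python. The table name lookup and the sample rows in t are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (value TEXT);
    -- Hypothetical lookup table holding only the known values.
    CREATE TABLE lookup (value TEXT PRIMARY KEY, result INTEGER);
    INSERT INTO t VALUES ('A'), ('B'), ('X'), (NULL);
    INSERT INTO lookup VALUES ('A', 1), ('B', 1), ('C', 1);
""")

rows = conn.execute("""
    SELECT t.value, COALESCE(l.result, -1) AS result
    FROM t
    LEFT JOIN lookup l ON l.value = t.value
    ORDER BY t.rowid
""").fetchall()

print(rows)
```

Unmatched values, including NULL, find no row in lookup, so COALESCE supplies the -1 default without the lookup table having to list every possible value.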

Extract last N elements of an array in SQL (hive)

I have a column with arrays and I want to extract the X last elements in an array.
Example trying to extract the last two elements:
Column A
['a', 'b', 'c']
['d', 'e']
['f', 'g', 'h', 'i']
Expected output:
Column A
['b', 'c']
['d', 'e']
['h', 'i']
Ideally, I'd like to do it without using a UDF.
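One method uses reverse, explode, filtering, and re-assembling the array again. Note that reverse() on the concatenated string also reverses the characters inside each element, so as written this is only correct for single-character elements that contain no commas: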
with your_table as (
select stack (4,
0, array(), --empty array to check it works if no elements or less than n
1, array('a', 'b', 'c'),
2, array('d', 'e'),
3, array('f', 'g', 'h', 'i')
) as (id, col_A)
)
select s.id, collect_list(s.value) as col_A
from
(select s.id, a.value, a.pos
from your_table s
lateral view outer posexplode(split(reverse(concat_ws(',',s.col_A)),',')) a as pos, value
where a.pos between 0 and 1 --last two (use n-1 instead of 1 if you want last n)
distribute by s.id sort by a.pos desc --keep original order
)s
group by s.id
Result:
s.id col_a
0 []
1 ["b","c"]
2 ["d","e"]
3 ["h","i"]
A more elegant way is to use the Brickhouse numeric_range UDF, as shown in this answer.
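To make the steps of the reverse-based query easier to follow, here is a rough Python imitation of them, next to a plain slice for comparison. It is only a sketch of the logic, not Hive code, and the function names are made up. It also shows why the trick depends on single-character elements: reversing the joined string reverses each element as well.

```python
def last_n_via_reverse(arr, n):
    """Mimic the reverse / split / filter / re-sort steps of the Hive query."""
    if not arr:
        return []  # assumption: matches the [] shown for the empty array above
    joined = ",".join(arr)           # concat_ws(',', col_A)
    parts = joined[::-1].split(",")  # split(reverse(...), ',')
    kept = parts[:n]                 # a.pos between 0 and n - 1
    return kept[::-1]                # sort by a.pos desc restores the order


def last_n(arr, n):
    """Plain slice for comparison; assumes n >= 1."""
    return arr[-n:]


print(last_n_via_reverse(["f", "g", "h", "i"], 2))
print(last_n_via_reverse(["ab", "cd", "ef"], 2))
print(last_n(["ab", "cd", "ef"], 2))
```

With single-character elements both functions agree, but for ["ab", "cd", "ef"] the reverse-based version yields the elements with their characters flipped ("dc", "fe") instead of "cd", "ef".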

Is there any way to order the result set by what you want in SQL Server?

Is there any way to select rows from SQL Server in a specific sequence?
For example, I want to get some rows by their identifier.
I want the rows ordered as C, D, A, F:
SELECT *
FROM BRANCH
WHERE IDENTIFIER IN ('C', 'D', 'A', 'F')
But this query returns the rows in an arbitrary order.
They may come back ordered as
'F', 'D', 'A', 'C'
'A', 'B', 'C', 'D'
How can I get the result set ordered as 'C', 'D', 'A', 'F'? I need this for use with FOR XML PATH.
SELECT b.*
FROM dbo.BRANCH b
JOIN (
VALUES
(1, 'C'),
(2, 'D'),
(3, 'A'),
(4, 'F')
) c(ID, IDENTIFIER) ON c.IDENTIFIER = b.IDENTIFIER
ORDER BY c.ID
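The idea is to attach an explicit sort key to each identifier and order by that key. Here is a small sketch of the same join, run against in-memory SQLite from Python as a stand-in for SQL Server. The sample rows are invented, and the VALUES list is written as a CTE because that is the form SQLite accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (identifier TEXT);
    -- Hypothetical sample rows, including one ('B') that is not requested.
    INSERT INTO branch VALUES ('A'), ('B'), ('C'), ('D'), ('F');
""")

rows = conn.execute("""
    WITH c(id, identifier) AS (
        VALUES (1, 'C'), (2, 'D'), (3, 'A'), (4, 'F')
    )
    SELECT b.identifier
    FROM branch b
    JOIN c ON c.identifier = b.identifier
    ORDER BY c.id
""").fetchall()

ordered = [r[0] for r in rows]
print(ordered)
```

The inner join doubles as the IN filter: 'B' has no entry in the list, so it drops out, and the remaining rows follow the id column.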