How to count array element occurrences in Presto? - sql

I have an array in Presto and I'd like to count how many times each element occurs in it. For example, I have
[a, a, a, b, b]
and I'd like to get something like
{a: 3, b: 2}

We do not have a direct function for this, but you can combine UNNEST with histogram:
presto> SELECT histogram(x)
-> FROM UNNEST(ARRAY[1111, 1111, 22, 22, 1111]) t(x);
_col0
----------------
{22=2, 1111=3}
You may want to file a new issue for a direct function for this.
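To cross-check the counts outside Presto, the same frequency map can be reproduced in a few lines of Python. This is only a sketch of what UNNEST plus histogram computes, not Presto code; the function name count_occurrences is made up for the illustration.

```python
from collections import Counter


def count_occurrences(values):
    """Return a dict mapping each distinct element to how often it occurs."""
    # Counter tallies one entry per element, much like histogram over UNNEST rows.
    return dict(Counter(values))


print(count_occurrences([1111, 1111, 22, 22, 1111]))
print(count_occurrences(["a", "a", "a", "b", "b"]))
```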

SELECT
  TRANSFORM_VALUES(
    MULTIMAP_FROM_ENTRIES(
      TRANSFORM(ARRAY['a', 'a', 'a', 'b', 'b'], x -> ROW(x, 1))
    ),
    (k, v) -> ARRAY_SUM(v)
  )
Output:
{
"a": 3,
"b": 2
}

If ARRAY_SUM is not supported, you can use REDUCE instead:
SELECT
  TRANSFORM_VALUES(
    MULTIMAP_FROM_ENTRIES(
      TRANSFORM(ARRAY['a', 'a', 'a', 'b', 'b'], x -> ROW(x, 1))
    ),
    (k, v) -> REDUCE(v, 0, (s, x) -> s + x, s -> s)
  )

In Presto 0.279, you now have a direct function for this purpose: array_frequency. The input is your ARRAY, and the output is a MAP whose keys are the elements of the given array and whose values are their frequencies. For example, if you run this SQL:
SELECT array_frequency(ARRAY[1,4,1,3,5,4,7,3,1])
The result will be
{
"1": 3,
"3": 2,
"4": 2,
"5": 1,
"7": 1
}

Related

How can I group elements of a list in Raku?

Is there some method in Raku which, when you pass it a "getter", groups together items from the original list for which the getter is returning the same value?
I am looking for something like groupBy in Scala:
# (1 until 10).groupBy(_ % 3)
res0: Map[Int, IndexedSeq[Int]] = HashMap(0 -> Vector(3, 6, 9), 1 -> Vector(1, 4, 7), 2 -> Vector(2, 5, 8))
Or groupBy from Lodash (JavaScript):
> groupBy(range(1, 10), x => x % 3)
{"0": [3,6,9], "1": [1,4,7], "2": [2,5,8]}
It's called classify in Raku:
$ raku -e 'say (1..10).classify(* % 3)'
{0 => [3 6 9], 1 => [1 4 7 10], 2 => [2 5 8]}

How do I modify arrays inside of a map in Kotlin

I am working with a map with strings as keys and arrays as values. I would like to adjust the map to be the original strings and change the arrays to the average values.
The original map is:
val appRatings = mapOf(
"Calendar Pro" to arrayOf(1, 5, 5, 4, 2, 1, 5, 4),
"The Messenger" to arrayOf(5, 4, 2, 5, 4, 1, 1, 2),
"Socialise" to arrayOf(2, 1, 2, 2, 1, 2, 4, 2)
)
What I have tried to do is:
val averageRatings = appRatings.forEach{ (k,v) -> v.reduce { acc, i -> acc + 1 }/v.size}
However this returns a Unit instead of a map in Kotlin. What am I doing wrong? I am working through a lambda assignment and they want us to use foreach and reduce to get the answer.
You can use forEach and reduce, but it's overkill, because you can just use mapValues and take the average:
val appRatings = mapOf(
"Calendar Pro" to arrayOf(1, 5, 5, 4, 2, 1, 5, 4),
"The Messenger" to arrayOf(5, 4, 2, 5, 4, 1, 1, 2),
"Socialise" to arrayOf(2, 1, 2, 2, 1, 2, 4, 2)
)
val averages = appRatings.mapValues { (_, v) -> v.average() }
println(averages)
Output:
{Calendar Pro=3.375, The Messenger=3.0, Socialise=2.0}
You can do this with mapValues function:
val appRatings = mapOf(
"Calendar Pro" to arrayOf(1, 5, 5, 4, 2, 1, 5, 4),
"The Messenger" to arrayOf(5, 4, 2, 5, 4, 1, 1, 2),
"Socialise" to arrayOf(2, 1, 2, 2, 1, 2, 4, 2)
)
val ratingsAverage = appRatings.mapValues { it.value.average() }
You already got some answers (including literally from JetBrains?? nice) but just to clear up the forEach thing:
forEach is a "do something with each item" function that returns nothing (well, Unit) - it's terminal, the last thing you can do in a chain, because it doesn't return a value to do anything else with. It's basically a for loop, and it's about side effects, not transforming the collection that was passed in and producing different data.
onEach is similar, except it returns the original item - so you call onEach on a collection, you get the same collection as a result. So this one isn't terminal, and you can pop it in a function chain to do something with the current set of values, without altering them.
map is your standard "transform items into other items" function - if you want to put a collection in and get a different collection out (like transforming arrays of Ints into single Int averages) then you want map. (The name comes from mapping values onto other values, translating them - which is why you always get the same number of items out as you put in)
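The distinction can be sketched in a language-neutral way. The Python below is an illustration rather than Kotlin: a plain loop plays the role of forEach (it only has side effects, so the results must be written into a map created beforehand), while a dict comprehension plays the role of mapValues (it evaluates to a new map directly). The variable names are made up for the example, and the reducer adds each rating i to the accumulator, where the attempt in the question adds the literal 1.

```python
from functools import reduce

app_ratings = {
    "Calendar Pro": [1, 5, 5, 4, 2, 1, 5, 4],
    "The Messenger": [5, 4, 2, 5, 4, 1, 1, 2],
    "Socialise": [2, 1, 2, 2, 1, 2, 4, 2],
}

# forEach style: the loop itself yields nothing, so results are collected by side effect.
averages_by_loop = {}
for name, ratings in app_ratings.items():
    total = reduce(lambda acc, i: acc + i, ratings)
    averages_by_loop[name] = total / len(ratings)

# mapValues style: the expression evaluates to a new map, with nothing to mutate.
averages_by_map = {name: sum(r) / len(r) for name, r in app_ratings.items()}

print(averages_by_loop)
print(averages_by_map)
```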

Select within Structs within Arrays in SQL

I'm trying to find rows with N count of identifier A AND M count of identifier B in an array of structs within a Google BigQuery table, using the new Standard SQL. The data in the table (simplified) where each row looks a bit like this:
{
  "Session": "abc123",
  "Information": [
    {
      "Identifier": "A",
      "Count": 1
    },
    {
      "Identifier": "B",
      "Count": 2
    },
    {
      "Identifier": "C",
      "Count": 3
    }
    ...
  ]
}
I've been struggling to work with the struct in an array. Any way I can do that?
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
FROM `project.dataset.table`
WHERE 2 = (SELECT COUNT(1) FROM UNNEST(information) kv WHERE kv IN (('a', 5), ('b', 10)))
You can test it with dummy data, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'abc123' session, [STRUCT('a' AS identifier, 1 AS `count`), ('b', 2), ('c', 3)] information UNION ALL
SELECT 'abc456', [('a', 5), ('b', 10), ('c', 20)]
)
SELECT *
FROM `project.dataset.table`
WHERE 2 = (SELECT COUNT(1) FROM UNNEST(information) kv WHERE kv IN (('a', 5), ('b', 10)))
The result is:
Row  session  information.identifier  information.count
1    abc456   a                       5
              b                       10
              c                       20
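The filter works by counting, per row, how many structs in the array match one of the wanted (identifier, count) pairs, and keeping the row only when the count equals the number of wanted pairs. A rough Python model of that predicate, with hypothetical names and the same dummy data, might look like this. Like the SQL, it counts matches, so it relies on each pair appearing at most once per row.

```python
WANTED = {("a", 5), ("b", 10)}

rows = [
    {"session": "abc123", "information": [("a", 1), ("b", 2), ("c", 3)]},
    {"session": "abc456", "information": [("a", 5), ("b", 10), ("c", 20)]},
]


def row_matches(row, wanted=WANTED):
    # Mirrors: 2 = (SELECT COUNT(1) FROM UNNEST(information) kv WHERE kv IN (...))
    hits = sum(1 for kv in row["information"] if kv in wanted)
    return hits == len(wanted)


matching_sessions = [row["session"] for row in rows if row_matches(row)]
print(matching_sessions)
```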

Mapping set of keys to a matching list of lists

What is an idiomatic way to map keys to a matching list of lists? An example - given:
val s = listOf(1, 9)
val u = listOf(listOf(1, 2, 3), listOf(1, 4, 7), listOf(1, 5, 9))
I would like to have a Map<Int, List<List<Int>>> such that every key in s is mapped to a list of lists containing that key:
{1=[ [1, 2, 3], [1, 4, 7], [1, 5, 9] ], 9=[ [1, 5, 9] ]}
The following:
s.groupBy({ it }, { x -> u.filter { it.contains(x) } })
produces:
{1=[[[1, 2, 3], [1, 4, 7], [1, 5, 9]]], 9=[[[1, 5, 9]]]}
which is not quite right and it isn't clear how to flatten the result to the expected shape.
I would recommend associateWith and use it like this:
s.associateWith { num -> u.filter { list -> num in list } }
Output:
{1=[[1, 2, 3], [1, 4, 7], [1, 5, 9]], 9=[[1, 5, 9]]}
I recommended associate at first, but you can shorten the code even further if you use associateWith. Thanks to Abhay Agarwal who recommended it.
Update
You just need to flatten the values of the result Map.
val w = s.groupBy({ it }, { x -> u.filter { it.contains(x) } })
.mapValues { it.value.flatten() }
My solution maps each element of the first collection to a pair of that element and the lists in which it appears, and then applies groupBy to the resulting list.
Example
val w = s.map { elem -> Pair(elem, u.filter { list -> elem in list }) }
.groupBy ({ it.first }, { it.second })
.mapValues { it.value.flatten() }
check(w[1] == listOf(listOf(1, 2, 3), listOf(1, 4, 7), listOf(1, 5, 9)))
check(w[9] == listOf(listOf(1, 5, 9)))
println(w)
Output
{1=[[1, 2, 3], [1, 4, 7], [1, 5, 9]], 9=[[1, 5, 9]]}
To me, the idiomatic form would be s.groupBy(....). The answer by @Omar Mainegra, s.groupBy(...).mapValues( flatten ), absolutely works, but it looks like a hack where the initial result needs some extra massaging.
The issue is with the implementation of groupBy and more specifically with groupByTo:
public inline fun <T, K, V, M : MutableMap<in K, MutableList<V>>> Iterable<T>.groupByTo(destination: M, keySelector: (T) -> K, valueTransform: (T) -> V): M {
    for (element in this) {
        val key = keySelector(element)
        val list = destination.getOrPut(key) { ArrayList<V>() }
        list.add(valueTransform(element))
    }
    return destination
}
The implementation wraps the values associated with a key in a list because, in general, multiple values can be associated with a key. That is not the case here: the values in s are unique, which means that groupBy is the wrong function to use. The right function is associateWith:
s.associateWith { x -> u.filter { it.contains(x) } }
produces:
{1=[[1, 2, 3], [1, 4, 7], [1, 5, 9]], 9=[[1, 5, 9]]}
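To see where the extra list layer comes from, here is a small Python model of the two behaviours. The helpers group_by and associate_with are stand-ins written for this sketch, not the Kotlin standard library functions themselves.

```python
def group_by(items, key_selector, value_transform):
    """Model of groupBy: every transformed value is appended to a per-key list."""
    result = {}
    for item in items:
        result.setdefault(key_selector(item), []).append(value_transform(item))
    return result


def associate_with(items, value_selector):
    """Model of associateWith: each item maps directly to one computed value."""
    return {item: value_selector(item) for item in items}


s = [1, 9]
u = [[1, 2, 3], [1, 4, 7], [1, 5, 9]]


def lists_containing(x):
    # All inner lists of u that contain x.
    return [lst for lst in u if x in lst]


grouped = group_by(s, lambda x: x, lists_containing)
associated = associate_with(s, lists_containing)
print(grouped)     # each value is wrapped in one extra list
print(associated)  # each value is the filtered list itself
```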

In PostgreSQL, what's the best way to select an object from a JSONB array?

Right now, I have an array that I'm able to select off a table.
[{"_id": 1, "count": 3},{"_id": 2, "count": 14},{"_id": 3, "count": 5}]
From this, I only need the count for a particular _id. For example, I need the count for
_id: 3
I've read the documentation but I haven't been able to figure out the correct way to get the object.
WITH test_array(data) AS ( VALUES
('[
{"_id": 1, "count": 3},
{"_id": 2, "count": 14},
{"_id": 3, "count": 5}
]'::JSONB)
)
SELECT val->>'count' AS result
FROM
test_array ta,
jsonb_array_elements(ta.data) val
WHERE val @> '{"_id":3}'::JSONB;
Result:
result
--------
5
(1 row)