How to get the mode of a text[] in SQL?

How to get the mode of a text[] in SQL? - sql

In a table, I have a column of type text[]. I want to extract the most frequent string in each row. How can I do that?
Trivial example:
id | fruit
----------------------------------
10 | ['apple','pear','apple']
20 | ['pear','pear','banana']
30 | ['pineapple','apple','apple']
After running the query I would like to have:
id | fruit | mode
-----------------------------------------
10 | ['apple','pear','apple'] | apple
20 | ['pear','pear','banana'] | pear
30 | ['pineapple','apple','apple']| apple

You can use a scalar sub-query after unnesting the elements:
select *,
(select mode() within group (order by u.word)
from unnest(u.fruit) as u(word)) as mode
from the_table t
This assumes that fruit is a text[] column. If it's a json or jsonb in reality, you need to use json_array_elements_text() instead of unnest.
If you need that a lot, you can create a function for that.

Related

Transpose single row with multiple columns into multiple rows of two columns

I have a SELECT query that works perfectly fine and it returns a single row with multiple named columns:
| registered | downloaded | subscribed | requested_invoice | paid |
|------------|------------|------------|-------------------|------|
| 9000 | 7000 | 5000 | 4000 | 3000 |
But I need to transpose this result to a new table that looks like this:
| type | value |
|-------------------|-------|
| registered | 9000 |
| downloaded | 7000 |
| subscribed | 5000 |
| requested_invoice | 4000 |
| paid | 3000 |
I have the additional module tablefunc enabled at PostgreSQL but I can't get the crosstab() function to work for this. What can I do?

You need the reverse operation of what crosstab() does. Some call it "unpivot". A LATERAL join to a VALUES expression should be the most elegant way:
SELECT l.*
FROM tbl -- or replace the table with your subquery
CROSS JOIN LATERAL (
VALUES
('registered' , registered)
, ('downloaded' , downloaded)
, ('subscribed' , subscribed)
, ('requested_invoice', requested_invoice)
, ('paid' , paid)
) l(type, value)
WHERE id = 1; -- or whatever
You may need to cast some or all columns to arrive at a common data type. Like:
...
VALUES
('registered' , registered::text)
, ('downloaded' , downloaded::text)
, ...
Related:
Postgres: convert single row to multiple rows (unpivot)
For the reverse operation - "pivot" or "cross-tabulation":
PostgreSQL Crosstab Query

Postgresql: Dynamic Regex Pattern

I have event data that looks like this:
id | instance_id | value
1 | 1 | a
2 | 1 | ap
3 | 1 | app
4 | 1 | appl
5 | 2 | b
6 | 2 | bo
7 | 1 | apple
8 | 2 | boa
9 | 2 | boat
10 | 2 | boa
11 | 1 | appl
12 | 1 | apply
Basically, each row is a user typing a new letter. They can also delete letters.
I'd like to create a dataset that looks like this, let's call it data
id | instance_id | value
7 | 1 | apple
9 | 2 | boat
12 | 1 | apply
My goal is to extract all the complete words in each instance, accounting for deletion as well - so it's not sufficient to just get the longest word or the most recently typed.
To do so, I was planning to do a regex operation like so:
select * from data
where not exists (select * from data d2 where d2.value ~ (d.value || '.'))
Effectively I'm trying to build a dynamic regex that adds matches one character more than is present, and is specific to the row it's matching against.
The code above doesn't seem to work. In Python, I can "compile" a regex pattern before I use it. What is the equivalent in PostgreSQL to dynamically build a pattern?

Try simple LIKE operator instead of regex patterns:
SELECT * FROM data d1
WHERE NOT EXISTS (
SELECT * FROM data d2
WHERE d2.value LIKE d1.value ||'_%'
)
Demo: https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=cd064c92565639576ff456dbe0cd5f39
Create an index on value column, this should speed up the query a bit.

To find peaks in the sequential data window functions is a good choice. You just need to compare each value with previous and next ones using lag() and lead() functions:
with cte as (
select
*,
length(value) > coalesce(length(lead(value) over (partition by instance_id order by id)),0) and
length(value) > coalesce(length(lag(value) over (partition by instance_id order by id)),length(value)) as is_peak
from data)
select * from cte where is_peak order by id;
Demo

Return column name and distinct values

Say I have a simple table in postgres as the following:
+--------+--------+----------+
| Car | Pet | Name |
+--------+--------+----------+
| BMW | Dog | Sam |
| Honda | Cat | Mary |
| Toyota | Dog | Sam |
| ... | ... | ... |
I would like to run a sql query that could return the column name in the first column and unique values in the second column. For example:
+--------+--------+
| Col | Vals |
+--------+--------+
| Car | BMW |
| Car | Toyota |
| Car | Honda |
| Pet | Dog |
| Pet | Cat |
| Name | Sam |
| Name | Mary |
| ... | ... |
I found a bit of code that can be used to return all of the unique values from multiple fields into one column:
-- Query 4b. (104 ms, 128 ms)
select distinct unnest( array_agg(a)||
array_agg(b)||
array_agg(c)||
array_agg(d) )
from t ;
But I don't understand the code well enough to know how to append the column name into another column.
I also found a query that can return the column names in a table. Maybe a sub-query of this in combination with the "Query 4b" shown above?

SQL Fiddle
SELECT distinct
unnest(array['car', 'pet', 'name']) AS col,
unnest(array[car, pet, name]) AS vals
FROM t
order by col

It's bad style to put set-returning functions in the SELECT list and not allowed in the SQL standard. Postgres supports it for historical reasons, but since LATERAL was introduced Postgres 9.3, it's largely obsolete. We can use it here as well:
SELECT x.col, x.val
FROM tbl, LATERAL (VALUES ('car', car)
, ('pet', pet)
, ('name', name)) x(col, val)
GROUP BY 1, 2
ORDER BY 1, 2;
You'll find this LATERAL (VALUES ...) technique discussed under the very same question on dba.SE you already found yourself. Just don't stop reading at the first answer.
Up until Postgres 9.4 there was still an exception: "parallel unnest" required to combine multiple set-returning functions in the SELECT list. Postgres 9.4 brought a new variant of unnest() to remove that necessity, too. More:
Unnest multiple arrays in parallel
The new function is also does not derail into a Cartesian product if the number of returned rows should not be exactly the same for all set-returning functions in the SELECT list, which is (was) a very odd behavior. The new syntax variant should be preferred over the now outdated one:
SELECT DISTINCT x.*
FROM tbl t, unnest('{car, pet, name}'::text[]
, ARRAY[t.car, t.pet, t.name]) AS x(col, val)
ORDER BY 1, 2;
Also shorter and faster than two unnest() calls in parallel.
Returns:
col | val
------+--------
car | BMW
car | Honda
car | Toyota
name | Mary
name | Sam
pet | Cat
pet | Dog
DISTINCT or GROUP BY, either is good for the task.

With JSON functions
row_to_json() and json_each_text() you can do it not specifying number and names of columns:
select distinct key as col, value as vals
from (
select row_to_json(t) r
from a_table t
) t,
json_each_text(r)
order by 1, 2;
SqlFiddle.

Flattening a relation with an array to emit one row per array entry

Given a table defined as such:
CREATE TABLE test_values(name TEXT, values INTEGER[]);
...and the following values:
| name | values |
+-------+---------+
| hello | {1,2,3} |
| world | {4,5,6} |
I'm trying to find a query which will return:
| name | value |
+-------+-------+
| hello | 1 |
| hello | 2 |
| hello | 3 |
| world | 4 |
| world | 5 |
| world | 6 |
I've reviewed the upstream documentation on accessing arrays, and tried to think about what a solution using the unnest() function would look like, but have been coming up empty.
An ideal solution would be easy to use even in cases where there were a significant number of columns other than the array being expanded and no primary key. Handling a case with more than one array is not important.

We can put the set-returning function unnest() into the SELECT list like Raphaël suggests. This used to exhibit corner case problems before Postgres 10. See:
What is the expected behaviour for multiple set-returning functions in SELECT clause?
Since Postgres 9.3 we can also use a LATERAL join for this. It is the cleaner, standard-compliant way to put set-returning functions into the FROM list, not into the SELECT list:
SELECT name, value
FROM tbl, unnest(values) value; -- implicit CROSS JOIN LATERAL
One subtle difference: this drops rows with empty / NULL values from the result since unnest() returns no row, while the same is converted to a NULL value in the FROM list and returned anyway. The 100 % equivalent query is:
SELECT t.name, v.value
FROM tbl t
LEFT JOIN unnest(t.values) v(value) ON true;
See:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?

Well, you give the data, the doc, so... let's mix it ;)
select
name,
unnest(values) as value
from test_values
see SqlFiddle

Splitting a string column in BigQuery

Let's say I have a table in BigQuery containing 2 columns. The first column represents a name, and the second is a delimited list of values, of arbitrary length. Example:
Name | Scores
-----+-------
Bob |10;20;20
Sue |14;12;19;90
Joe |30;15
I want to transform into columns where the first is the name, and the second is a single score value, like so:
Name,Score
Bob,10
Bob,20
Bob,20
Sue,14
Sue,12
Sue,19
Sue,90
Joe,30
Joe,15
Can this be done in BigQuery alone?

Good news everyone! BigQuery can now SPLIT()!
Look at "find all two word phrases that appear in more than one row in a dataset".
There is no current way to split() a value in BigQuery to generate multiple rows from a string, but you could use a regular expression to look for the commas and find the first value. Then run a similar query to find the 2nd value, and so on. They can all be merged into only one query, using the pattern presented in the above example (UNION through commas).

Trying to rewrite Elad Ben Akoune's answer in Standart SQL, the query becomes like this;
WITH name_score AS (
SELECT Name, split(Scores,';') AS Score
FROM (
(SELECT * FROM (SELECT 'Bob' AS Name ,'10;20;20' AS Scores))
UNION ALL
(SELECT * FROM (SELECT 'Sue' AS Name ,'14;12;19;90' AS Scores))
UNION ALL
(SELECT * FROM (SELECT 'Joe' AS Name ,'30;15' AS Scores))
))
SELECT name, score
FROM name_score
CROSS JOIN UNNEST(name_score.score) AS score;
And this outputs;
+------+-------+
| name | score |
+------+-------+
| Bob | 10 |
| Bob | 20 |
| Bob | 20 |
| Sue | 14 |
| Sue | 12 |
| Sue | 19 |
| Sue | 90 |
| Joe | 30 |
| Joe | 15 |
+------+-------+

If someone is still looking for an answer
select Name,split(Scores,';') as Score
from (
# replace the inner custome select with your source table
select *
from
(select 'Bob' as Name ,'10;20;20' as Scores),
(select 'Sue' as Name ,'14;12;19;90' as Scores),
(select 'Joe' as Name ,'30;15' as Scores)
);

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to get the mode of a text[] in SQL? - sql

Related

Transpose single row with multiple columns into multiple rows of two columns

Postgresql: Dynamic Regex Pattern

Return column name and distinct values

Flattening a relation with an array to emit one row per array entry

Splitting a string column in BigQuery

Categories

Resources