How do I accomplish "cascading grouping" of columns in ANSI SQL? - sql

I have a Presto SQL table that looks something like this:
| tenant | type | environment                                      |
|--------|------|--------------------------------------------------|
| X      | A    | http:a.b.c(foo)/http:a.b.c(bar)/http:a.b.c(baz)  |
| X      | A    | http:d.e.f(foo)/http:d.e.f(bar)/http:d.e.f(baz)  |
| X      | A    | http:g.h.i(foo)                                  |
| X      | B    | http:g.h.i(foo)/http:g.h.i(bar)                  |
All columns are of type string.
I need to produce output that counts each environment type (foo, bar, or baz) per tenant and type. I.e. the above data should be listed somewhat like this:
X A foo 3
bar 2
baz 2
X B foo 1
bar 1
I've been trying queries like this:
SELECT "tenant_id", "type_id", "environment", count(*) AS total_count
FROM "tenant_table"
WHERE "environment" LIKE '%foo%'
GROUP BY "tenant_id", "type_id", "environment";
But I'm not getting the output I need. I do have a little bit of flexibility in changing the data types; the data originally comes from a CSV file. For example, if it makes things easier to redefine the "environment" column as something like an array, that is an option. Any help in resolving this would be greatly appreciated. Thanks.

If that's a fixed list of values, with at most one occurrence per string, then you can put it in a derived table and use LIKE to search for matches:
select t.tenant, t.type, v.val, count(*) cnt
from tenant_db t
inner join (values ('foo'), ('bar'), ('baz')) v(val)
on t.environment like '%' || v.val || '%'
group by t.tenant, t.type, v.val
Depending on your requirements, you might want to narrow the search criteria to avoid false positives, perhaps by matching the parentheses:
on t.environment like '%(' || v.val || ')%'
Or using a regex.
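For example, a sketch of the regex variant with Presto's regexp_like, anchoring on the parentheses so a value embedded elsewhere in the URL can't produce a false match (same tenant_db table and sample strings as above):
select t.tenant, t.type, v.val, count(*) cnt
from tenant_db t
inner join (values ('foo'), ('bar'), ('baz')) v(val)
    -- builds the pattern \(foo\), \(bar\), ... for each candidate value
    on regexp_like(t.environment, '\(' || v.val || '\)')
group by t.tenant, t.type, v.val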

You can extract the values with regexp_extract_all and use UNNEST to "flatten" the resulting arrays before computing the aggregation:
WITH data(tenant, type, environment) AS (
    VALUES
        ('X', 'A', 'http:a.b.c(foo)/http:a.b.c(bar)/http:a.b.c(baz)'),
        ('X', 'A', 'http:d.e.f(foo)/http:d.e.f(bar)/http:d.e.f(baz)'),
        ('X', 'A', 'http:g.h.i(foo)'),
        ('X', 'B', 'http:g.h.i(foo)/http:g.h.i(bar)')
)
SELECT tenant, type, value, count(*)
FROM data, UNNEST(regexp_extract_all(data.environment, '\(([^\)]+)\)', 1)) t(value)
GROUP BY tenant, type, value
produces:
tenant | type | value | _col3
--------+------+-------+-------
X | A | baz | 2
X | A | bar | 2
X | A | foo | 3
X | B | bar | 1
X | B | foo | 1
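A variation of the same idea, as a sketch against the same sample data: split each environment string on '/' first, so only a single-value regexp_extract is needed per segment. This may be convenient if you later redefine the column as an array of segments.
SELECT tenant, type,
       regexp_extract(segment, '\(([^\)]+)\)', 1) AS value,
       count(*) AS total_count
-- split() turns each environment string into an array of URL segments,
-- and UNNEST flattens it to one row per segment
FROM data, UNNEST(split(data.environment, '/')) AS t(segment)
GROUP BY 1, 2, 3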

Related

How can I summarise this JSON array data as columns on the same row?

I have a PostgreSQL database with a table called items which includes a JSONB field care_ages.
This field contains an array of between one and three objects, which include these keys:
AgeFrom
AgeTo
Number
Register
For a one-off audit report I need to run on this table, I need to "unpack" this field into columns on the same row.
I've used jsonb_to_recordset to split it out into rows and columns, which gets me halfway:
SELECT
    items.id,
    items.name,
    care_ages.*
FROM
    ofsted_items AS items,
    jsonb_to_recordset(items.care_age) AS care_ages ("AgeFrom" integer, "AgeTo" integer, "Register" text, "MaximumNumber" integer)
This gives me output like:
| id | name | register | age_from | age_to | maximum_number |
|----|--------------|----------|----------|--------|----------------|
| 1 | namey mcname | xyz | 0 | 4 | 5 |
| 1 | namey mcname | abc | 4 | 8 | 7 |
Next, I need to combine these rows together, perhaps using GROUP BY, adding extra columns, like this:
| id | name | register_xyz? | xyz_age_from | xyz_age_to | xyz_maximum_number | register_abc? | abc_age_from | abc_age_to | abc_maximum_number |
|----|--------------|---------------|--------------|------------|--------------------|---------------|--------------|------------|--------------------|
| 1 | namey mcname | true | 0 | 4 | 5 | true | 4 | 8 | 7 |
Because I know ahead of time which "registers" there are (there's only three of them), it seems like this should be possible.
I've tried following this example, using CASE to calculate extra columns, but I'm not getting any useful values: all 0s and 5s for some reason.
If you are using Postgres 12 or later, you can use a jsonpath query to first extract the JSON object for each register into separate columns, then use the usual operators to extract the keys. This avoids first expanding into multiple rows just to aggregate them back into a single row later.
select id, name,
       (reg_xyz ->> 'AgeFrom')::int as xyz_age_from,
       (reg_xyz ->> 'AgeTo')::int   as xyz_age_to,
       (reg_xyz ->> 'Number')::int  as xyz_max_num,
       (reg_abc ->> 'AgeFrom')::int as abc_age_from,
       (reg_abc ->> 'AgeTo')::int   as abc_age_to,
       (reg_abc ->> 'Number')::int  as abc_max_num
from (
    select id, name,
           jsonb_path_query_first(care_age, '$[*] ? (@.Register == "xyz")') as reg_xyz,
           jsonb_path_query_first(care_age, '$[*] ? (@.Register == "abc")') as reg_abc
    from ofsted_items
) t
At one point or another you will have to explicitly write out one expression for each column, so jsonb_to_recordset doesn't really buy you that much.
Online example
If you need this a lot, you can easily put this into a view.
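For example, a minimal sketch of such a view, wrapping the query above (the view name is just an illustration):
create view care_ages_by_register as
select id, name,
       (reg_xyz ->> 'AgeFrom')::int as xyz_age_from,
       (reg_xyz ->> 'AgeTo')::int   as xyz_age_to,
       (reg_xyz ->> 'Number')::int  as xyz_max_num,
       (reg_abc ->> 'AgeFrom')::int as abc_age_from,
       (reg_abc ->> 'AgeTo')::int   as abc_age_to,
       (reg_abc ->> 'Number')::int  as abc_max_num
from (
    select id, name,
           jsonb_path_query_first(care_age, '$[*] ? (@.Register == "xyz")') as reg_xyz,
           jsonb_path_query_first(care_age, '$[*] ? (@.Register == "abc")') as reg_abc
    from ofsted_items
) t;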
Try the query below:
select id, name,
       -- bool_or rather than max: PostgreSQL has no max() aggregate for boolean values
       bool_or(case when register = 'xyz' then true end) as "register_xyz?",
       max(case when register = 'xyz' then age_from end) as xyz_age_from,
       max(case when register = 'xyz' then age_to end) as xyz_age_to,
       max(case when register = 'xyz' then maximum_number end) as xyz_maximum_number,
       bool_or(case when register = 'abc' then true end) as "register_abc?",
       max(case when register = 'abc' then age_from end) as abc_age_from,
       max(case when register = 'abc' then age_to end) as abc_age_to,
       max(case when register = 'abc' then maximum_number end) as abc_maximum_number
from (SELECT
          items.id,
          items.name,
          care_ages."Register"      AS register,
          care_ages."AgeFrom"       AS age_from,
          care_ages."AgeTo"         AS age_to,
          care_ages."MaximumNumber" AS maximum_number
      FROM
          ofsted_items AS items,
          jsonb_to_recordset(items.care_age) AS care_ages ("AgeFrom" integer, "AgeTo" integer, "Register" text, "MaximumNumber" integer)
     ) t
group by id, name
You can use conditional aggregation to pivot your table. This can be done with a CASE expression, as in the solution you already linked, or with the FILTER clause:
demo:db<>fiddle
SELECT
id,
name,
bool_and(true) FILTER (WHERE register = 'xyz') as "register_xyz?",
MAX(age_from) FILTER (WHERE register = 'xyz') as xyz_age_from,
MAX(age_to) FILTER (WHERE register = 'xyz') as xyz_age_to,
MAX(maximum_number) FILTER (WHERE register = 'xyz') as xyz_maximum_number,
bool_and(true) FILTER (WHERE register = 'abc') as "register_abc?",
MAX(age_from) FILTER (WHERE register = 'abc') as abc_age_from,
MAX(age_to) FILTER (WHERE register = 'abc') as abc_age_to,
MAX(maximum_number) FILTER (WHERE register = 'abc') as abc_maximum_number
FROM items,
jsonb_to_recordset(items.care_ages) AS care_ages ("age_from" integer, "age_to" integer, "register" text, "maximum_number" integer)
GROUP BY id, name
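One caveat, as a sketch: an aggregate with FILTER returns NULL when no row matches, so "register_xyz?" will be NULL rather than false for an id without that register. Wrapping it in COALESCE gives a literal false:
coalesce(bool_and(true) FILTER (WHERE register = 'xyz'), false) as "register_xyz?"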

Is it possible to map values onto a table given corresponding row and column indices in SQL?

I have a SQL table in the form of:
| value | row_loc | column_loc |
|-------|---------|------------|
| a | 0 | 1 |
| b | 1 | 1 |
| c | 1 | 0 |
| d | 0 | 0 |
I would like to find a way to map it onto a table/grid, given the indices, using SQL. Something like:
| d | a |
| c | b |
(The context being, I would like to create a colour map with colours corresponding to values a, b, c, d, in the locations specified)
I would be able to do this iteratively in python, but cannot figure out how to do it in SQL, or if it is even possible! Any help or guidance on this problem would be greatly appreciated!
EDIT: a, b, c, d are examples of numeric values (which would not be able to be selected using named variables in practice), so I'm relying on selecting them based on location. Also worth noting: the number of rows and columns will always be the same. The value column is also not the primary key of this table, so it is not necessarily unique; it is just a continuous value.
Yes, it is possible, assuming the number of columns is limited, since SQL supports only a fixed number of columns. The number of rows in the result set depends on the number of distinct row_loc values, so we have to group by row_loc. Then choose the value for each column using a simple CASE.
with t (value, row_loc, column_loc) as (
    select 'a', 0, 1 from dual union all
    select 'b', 1, 1 from dual union all
    select 'c', 1, 0 from dual union all
    select 'd', 0, 0 from dual
)
select max(case column_loc when 0 then value else null end) as column0
     , max(case column_loc when 1 then value else null end) as column1
from t
group by row_loc
order by row_loc
I tested it on Oracle. I'm not sure what to do if multiple values land on the same coordinate, so I chose max. For other vendors you could also use special clauses such as count(...) FILTER (WHERE ...). The Oracle PIVOT clause can also be used.
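For reference, a sketch of the PIVOT variant (Oracle syntax, repeating the same sample data so it runs on its own):
with t (value, row_loc, column_loc) as (
    select 'a', 0, 1 from dual union all
    select 'b', 1, 1 from dual union all
    select 'c', 1, 0 from dual union all
    select 'd', 0, 0 from dual
)
select *
from (select value, row_loc, column_loc from t)
-- one output column per known column_loc value
pivot (max(value) for column_loc in (0 as column0, 1 as column1))
order by row_loc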

Counting SQLite rows that might match multiple times in a single query

I have a SQLite table which has a column containing categories that each row may fall into. Each row has a unique ID, but may fall into zero, one, or more categories, for example:
|-------+-------|
| name | cats |
|-------+-------|
| xyzzy | a b c |
| plugh | b |
| quux | |
| quuux | a c |
|-------+-------|
I'd like to obtain counts of how many items are in each category. In other words, output like this:
|------------+-------|
| categories | total |
|------------+-------|
| a | 2 |
| b | 2 |
| c | 2 |
| none | 1 |
|------------+-------|
I tried to use the case statement like this:
select case
when cats like "%a%" then 'a'
when cats like "%b%" then 'b'
when cats like "%c%" then 'c'
else 'none'
end as categories,
count(*)
from test
group by categories
But the problem is this only counts each row once, so it can't handle multiple categories. You then get this output instead:
|------------+-------|
| categories | total |
|------------+-------|
| a | 2 |
| b | 1 |
| none | 1 |
|------------+-------|
One possibility is to use as many union statements as you have categories:
select case
when cats like "%a%" then 'a'
end as categories, count(*)
from test
group by categories
union
select case
when cats like "%b%" then 'b'
end as categories, count(*)
from test
group by categories
union
...
but this seems really ugly and the opposite of DRY.
Is there a better way?
Fix your data structure! You should have a table with one row per name and per category:
create table nameCategories (
name varchar(255),
category varchar(255)
);
Then your query would be easy:
select category, count(*)
from namecategories
group by category;
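Populating that table once from the existing space-separated column is also possible; here is a sketch with a recursive CTE, assuming SQLite 3.8.3+ and the test(name, cats) table from the question:
-- Split each space-separated cats value into one row per category,
-- then insert the (name, category) pairs into the new table.
WITH RECURSIVE split(name, category, rest) AS (
    SELECT name, '', cats || ' ' FROM test WHERE cats IS NOT NULL AND cats <> ''
    UNION ALL
    SELECT name,
           substr(rest, 1, instr(rest, ' ') - 1),
           substr(rest, instr(rest, ' ') + 1)
    FROM split
    WHERE rest <> ''
)
INSERT INTO nameCategories (name, category)
SELECT name, category FROM split WHERE category <> '';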
Why is your data structure bad? Here are some reasons:
A column should contain a single value.
SQL has pretty lousy string functionality.
SQL queries to do what you want cannot be optimized.
SQL has a great data structure for storing lists. It is called a table, not a string.
With that in mind, here is one brute force method for doing what you want:
with categories as (
      select 'a' as category union all
      select 'b' union all
      . . .
)
select c.category, count(t.cats)
from categories c
     left join test t
       on ' ' || t.cats || ' ' like '% ' || c.category || ' %'
group by c.category;
If you already have a table of valid categories, then the CTE is not needed.
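Note that the join above doesn't produce the 'none' row from your desired output. One sketch for that bucket is a separate query you can UNION ALL onto it, assuming an empty or NULL cats column means uncategorised:
-- counts rows that fall into no category at all
select 'none' as category, count(*) as total
from test
where cats is null or trim(cats) = '';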

Reuse a query for LIMIT and OFFSET operations in PostgreSQL

Using PostgreSQL 9.4, I have a table like this:
CREATE TABLE products
AS
SELECT id::uuid, title, kind, created_at
FROM ( VALUES
( '61c5292d-41f3-4e86-861a-dfb5d8225c8e', 'foo', 'standard' , '2017/04/01' ),
( 'def1d3f9-3e55-4d1b-9b42-610d5a46631a', 'bar', 'standard' , '2017/04/02' ),
( 'cc1982ab-c3ee-4196-be01-c53e81b53854', 'qwe', 'standard' , '2017/04/03' ),
( '919c03b5-5508-4a01-a97b-da9de0501f46', 'wqe', 'standard' , '2017/04/04' ),
( 'b3d081a3-dd7c-457f-987e-5128fb93ce13', 'tyu', 'other' , '2017/04/05' ),
( 'c6e9e647-e1b4-4f04-b48a-a4229a09eb64', 'ert', 'irregular', '2017/04/06' )
) AS t(id,title,kind,created_at);
I need to split the data into n parts of the same size. If this table had a regular id it would be easier, but since it has a uuid I can't use modulo operations (as far as I know).
So far I did this:
SELECT * FROM products
WHERE kind = 'standard'
ORDER BY created_at
LIMIT(
SELECT count(*)
FROM products
WHERE kind = 'standard'
)/2
OFFSET(
(
SELECT count(*)
FROM products
WHERE kind = 'standard'
)/2
)*1;
It works fine, but repeating the same count subquery three times doesn't seem like a good idea; the count is not "expensive", but anyone who wants to modify or update the query will need to change it in three places.
Note that currently n is set to 2 and the offset is set to 1, but both can take other values. Also, LIMIT rounds down, so a row may be missed; I can fix that by other means, but having it handled in the query would be nice.
You can see the example here
Just to dispel a myth: you can never use a serial and modulus to get equal parts, because a serial isn't guaranteed to be gapless. You can use row_number() though.
SELECT row_number() OVER () % 3 AS parts, * FROM products;
parts | id | title | kind | created_at
-------+--------------------------------------+-------+-----------+------------
1 | 61c5292d-41f3-4e86-861a-dfb5d8225c8e | foo | standard | 2017/04/01
2 | def1d3f9-3e55-4d1b-9b42-610d5a46631a | bar | standard | 2017/04/02
0 | cc1982ab-c3ee-4196-be01-c53e81b53854 | qwe | standard | 2017/04/03
1 | 919c03b5-5508-4a01-a97b-da9de0501f46 | wqe | standard | 2017/04/04
2 | b3d081a3-dd7c-457f-987e-5128fb93ce13 | tyu | other | 2017/04/05
0 | c6e9e647-e1b4-4f04-b48a-a4229a09eb64 | ert | irregular | 2017/04/06
(6 rows)
This won't produce equal parts unless the number of parts divides the row count evenly.
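If the goal is specifically n (roughly) equal buckets per kind in a single pass, here is a sketch with ntile(), which is available in PostgreSQL 9.4; bucket sizes differ by at most one row:
SELECT *
FROM (
    SELECT p.*,
           ntile(2) OVER (ORDER BY created_at) AS part  -- 2 = number of parts (n)
    FROM products p
    WHERE kind = 'standard'
) sub
WHERE part = 2;  -- pick whichever part you need, replacing the LIMIT/OFFSET arithmetic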

Search an SQL table that already contains wildcards?

I have a table that contains patterns for phone numbers, where x can match any digit.
+----+--------------+----------------------+
| ID | phone_number | phone_number_type_id |
+----+--------------+----------------------+
| 1 | 1234x000x | 1 |
| 2 | 87654311100x | 4 |
| 3 | x111x222x | 6 |
+----+--------------+----------------------+
Now, I might have 511132228, which should match row 3 and return its type. So it's kind of like SQL wildcards, but the other way around, and I'm confused about how to achieve this.
Give this a go:
select * from my_table
where '511132228' like replace(phone_number, 'x', '_')
select *
from yourtable
where '511132228' like (replace(phone_number, 'x','_'))
Try the query below:
SELECT ID,phone_number,phone_number_type_id
FROM TableName
WHERE '511132228' LIKE REPLACE(phone_number,'x','_');
Query with test data:
With TableName as
(
SELECT 3 ID, 'x111x222x' phone_number, 6 phone_number_type_id from dual
)
SELECT 'true' value_available
FROM TableName
WHERE '511132228' LIKE REPLACE(phone_number,'x','_');
The above query will return data if pattern match is available and will not return any row if no match is available.
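One caveat, as a sketch: if the stored patterns could ever contain a literal '%' or '_', escape those before turning 'x' into the single-character wildcard, using the standard LIKE ... ESCAPE syntax:
SELECT ID, phone_number, phone_number_type_id
FROM TableName
-- escape literal % and _ first, then map the pattern's x to the _ wildcard
WHERE '511132228' LIKE
      REPLACE(REPLACE(REPLACE(phone_number, '%', '\%'), '_', '\_'), 'x', '_') ESCAPE '\';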