PostgreSQL - multiple select condition - sql

I have a table with the following structure:
|id|author_id|name|type|created_at|updated_at
The type column can have 5 different values (A, B, C, D, E).
I need to query the DB by author_id, selecting only the most recently updated row of type A and of type B, plus all rows of the other types.
So the result should be something like:
| id | author_id | name | type | created_at | updated_at
| 12 | 88 | lorem | A
| 45 | 88 | lorem | B
| 44 | 88 | lorem | C
| 154 | 88 | lorem | C
| 98 | 88 | lorem | C
| 856 | 88 | lorem | E
| 857 | 88 | lorem | E
Is this possible with a single query, or do I need two queries?
Thank you

You may try the following:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY "type" ORDER BY updated_at DESC) rn
FROM yourTable
WHERE author_id = 88
)
SELECT id, author_id, name, "type", created_at, updated_at
FROM cte
WHERE
("type" IN ('A', 'B') AND rn = 1) OR
"type" NOT IN ('A', 'B');
This approach uses ROW_NUMBER to number each type's rows from most to least recently updated. In the query on the CTE, we select only the most recently updated row (rn = 1) for types A and B, but all rows for the other types.
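To sanity-check the ROW_NUMBER approach, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. SQLite stands in for PostgreSQL because both support ROW_NUMBER() OVER (...) (SQLite from 3.25 on). The table name posts, the author_id filter and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical table "posts" with made-up rows; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, '
    'name TEXT, "type" TEXT, updated_at TEXT)'
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?, ?)",
    [
        # ISO-8601 date strings sort correctly as text
        (11, 88, "lorem", "A", "2021-01-01"),
        (12, 88, "lorem", "A", "2021-03-01"),  # newest A for author 88
        (45, 88, "lorem", "B", "2021-02-01"),
        (44, 88, "lorem", "C", "2021-01-05"),
        (98, 88, "lorem", "C", "2021-01-06"),
        (856, 88, "lorem", "E", "2021-01-07"),
        (13, 99, "other", "A", "2022-01-01"),  # another author, filtered out
    ],
)

def latest_ab_all_others(author_id):
    """Newest row for types A and B, every row for the other types."""
    sql = """
        WITH cte AS (
            SELECT *, ROW_NUMBER() OVER (
                PARTITION BY "type" ORDER BY updated_at DESC) AS rn
            FROM posts
            WHERE author_id = ?
        )
        SELECT id, "type"
        FROM cte
        WHERE ("type" IN ('A', 'B') AND rn = 1)
           OR "type" NOT IN ('A', 'B')
        ORDER BY "type", id
    """
    return conn.execute(sql, (author_id,)).fetchall()

print(latest_ab_all_others(88))
# [(12, 'A'), (45, 'B'), (44, 'C'), (98, 'C'), (856, 'E')]
```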

Assuming that id is a unique key in the table, you could do this with distinct on:
select distinct on(case when type in ('A', 'B') then type else id::text end) t.*
from mytable t
where t.author_id = 88
order by case when type in ('A', 'B') then type else id::text end, updated_at desc, id
This uses a conditional expression as the DISTINCT ON key: it returns the type if it is A or B, and the id for any other value. So you get the top row for types A and B, and all rows for the other types.
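DISTINCT ON is PostgreSQL-specific, so the sketch below only emulates its semantics in plain Python to show how the conditional key behaves. Rows are sorted by the key, then by updated_at descending, then by id, and the first row of each key group is kept. The rows are hypothetical (id, type, updated_at) tuples with integer timestamps.

```python
from itertools import groupby

# Plain-Python emulation of DISTINCT ON semantics (not PostgreSQL itself).
# Hypothetical rows: (id, type, updated_at) with integer timestamps.
rows = [
    (11, "A", 1), (12, "A", 3),
    (45, "B", 2),
    (44, "C", 5), (98, "C", 6),
    (856, "E", 7),
]

def key(row):
    # Mirrors the CASE expression: A and B share one key per type,
    # any other row gets a key of its own (its id as text).
    row_id, row_type, _ = row
    return row_type if row_type in ("A", "B") else str(row_id)

def distinct_on(rows):
    # ORDER BY key, updated_at DESC, id ...
    ordered = sorted(rows, key=lambda r: (key(r), -r[2], r[0]))
    # ... then keep only the first row of each key group.
    return [next(group) for _, group in groupby(ordered, key=key)]

print(sorted(r[0] for r in distinct_on(rows)))
# [12, 44, 45, 98, 856]
```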

I am thinking:
(select distinct on (type) t.*
from t
where author_id = 88 and type in ('A', 'B')
order by type, updated_at desc
) union all
select t.*
from t
where author_id = 88 and type not in ('A', 'B');
In particular, this can make good use of an index on (author_id, type, updated_at desc).

Related

How to pivot array build from REGEXP_EXTRACT_ALL

I'm collecting URLs with query parameters in a BigQuery table. I want to parse these URLs and then pivot the table. The input data and expected output are at the end.
I found two queries that I want to merge.
This one pivots my parsed URLs:
select id,
max(case when test.name='a' then test.score end) as a,
max(case when test.name='b' then test.score end) as b,
max(case when test.name='c' then test.score end) as c
from
(
select a.id, t
from `table` as a,
unnest(test) as t
)A group by id
Then I have this query to parse the URL:
WITH examples AS (
SELECT 1 AS id,
'?foo=bar' AS query,
'simple' AS description
UNION ALL SELECT 2, '?foo=bar&bar=baz', 'multiple params'
UNION ALL SELECT 3, '?foo[]=bar&foo[]=baz', 'arrays'
UNION ALL SELECT 4, '', 'no query'
)
SELECT
id,
query,
REGEXP_EXTRACT_ALL(query,r'(?:\?|&)((?:[^=]+)=(?:[^&]*))') as params,
REGEXP_EXTRACT_ALL(query,r'(?:\?|&)(?:([^=]+)=(?:[^&]*))') as keys,
REGEXP_EXTRACT_ALL(query,r'(?:\?|&)(?:(?:[^=]+)=([^&]*))') as values,
description
FROM examples
I'm not sure how to explain my issue, but I think it's because when I split my query parameters into separate columns, the result doesn't match the format of the first query, where the keys and values need to be in the same column so I can unnest them correctly.
Input data:
| id | url |
|---- |-------------------- |
| 1 | url/?foo=aaa&bar=ccc |
| 2 | url/?foo=bbb&bar=ccc |
expected output:
| id | foo | bar |
|---- |---- |---- |
| 1 | aaa | ccc |
| 2 | bbb | ccc |
I always have exactly the same number of parameters.
Use the query below:
select id,
max(if(split(kv, '=')[offset(0)] = 'foo', split(kv, '=')[offset(1)], null)) as foo,
max(if(split(kv, '=')[offset(0)] = 'bar', split(kv, '=')[offset(1)], null)) as bar
from `project.dataset.table` t,
unnest(regexp_extract_all(url, r'[?&](\w+=\w+)')) kv
group by id
If applied to the sample data in your question, the output matches the expected output shown there.
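The same extraction and pivot can be mimicked in plain Python to see what the query does step by step. This is a sketch, not BigQuery. Python's re module stands in for BigQuery's RE2-based REGEXP_EXTRACT_ALL, and the pattern behaves the same on these simple inputs. pivot_params is a hypothetical helper.

```python
import re

# Same pattern as the query: a "?" or "&" followed by a key=value pair.
PAIR = re.compile(r"[?&](\w+=\w+)")

def pivot_params(rows, keys):
    """Turn (id, url) rows into (id, value_for_key1, value_for_key2, ...)."""
    result = []
    for row_id, url in rows:
        params = {}
        for kv in PAIR.findall(url):      # like UNNEST(REGEXP_EXTRACT_ALL(...))
            k, v = kv.split("=", 1)       # like SPLIT(kv, '=')[OFFSET(0/1)]
            params[k] = v                 # if a key repeats, the last value wins here
        # Missing keys become None, like MAX(IF(...)) over no matching rows.
        result.append((row_id, *(params.get(k) for k in keys)))
    return result

rows = [(1, "url/?foo=aaa&bar=ccc"), (2, "url/?foo=bbb&bar=ccc")]
print(pivot_params(rows, ["foo", "bar"]))
# [(1, 'aaa', 'ccc'), (2, 'bbb', 'ccc')]
```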

Finding only rows with non-duplicated values within a window partition

I want to look at why some descriptions are different for the same permit id. Here's the table (I'm using Snowflake):
create or replace table permits (permit varchar(255), description varchar(255));
// dupe permits, dupe descriptions, throw out
INSERT INTO permits VALUES ('1', 'abc');
INSERT INTO permits VALUES ('1', 'abc');
// dupe permits, unique descriptions, keep
INSERT INTO permits VALUES ('2', 'def1');
INSERT INTO permits VALUES ('2', 'def2');
INSERT INTO permits VALUES ('2', 'def3');
// dupe permits, unique descriptions, keep
INSERT INTO permits VALUES ('3', NULL);
INSERT INTO permits VALUES ('3', 'ghi1');
// unique permit, throw out
INSERT INTO permits VALUES ('5', 'xyz');
What I want is to query this table and get out only the sets of rows that have duplicate permit ids but different descriptions.
The output I want is this:
+---------+-------------+
| PERMIT | DESCRIPTION |
+---------+-------------+
| 2 | def1 |
| 2 | def2 |
| 2 | def3 |
| 3 | |
| 3 | ghi1 |
+---------+-------------+
I've tried this:
with with_dupe_counts as (
select
count(permit) over (partition by permit order by permit) as permit_dupecount,
count(description) over (partition by permit order by permit) as description_dupecount,
permit,
description
from permits
)
select *
from with_dupe_counts
where permit_dupecount > 1
and description_dupecount > 1
This gives me permits 1 and 2, and counts descriptions whether or not they are unique:
+------------------+-----------------------+--------+-------------+
| PERMIT_DUPECOUNT | DESCRIPTION_DUPECOUNT | PERMIT | DESCRIPTION |
+------------------+-----------------------+--------+-------------+
| 2 | 2 | 1 | abc |
| 2 | 2 | 1 | abc |
| 3 | 3 | 2 | def1 |
| 3 | 3 | 2 | def2 |
| 3 | 3 | 2 | def3 |
+------------------+-----------------------+--------+-------------+
What I think would work is something like:
count(unique description) over (partition by permit order by permit) as description_dupecount
But I'm realizing that lots of things don't work in window functions. This question isn't necessarily "how do I get count(unique x) to work in a window function", because I don't know if that is the best way to solve this.
I don't think a simple GROUP BY will work, because I want to get the original rows back.
One method uses min() and max() and count():
select *
from (select p.*,
min(description) over (partition by permit) as min_d,
max(description) over (partition by permit) as max_d,
count(description) over (partition by permit) as cnt_d,
count(*) over (partition by permit) as cnt
from permits p
) x
where min_d <> max_d or cnt_d <> cnt;
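Here is a minimal sketch that runs this MIN/MAX/COUNT query on the question's sample data, using an in-memory SQLite database from Python as a stand-in for Snowflake. The condition cnt_d <> cnt is true whenever a permit has any NULL description. As a result, a permit whose only row has a NULL description would also be returned.

```python
import sqlite3

# Question's sample data in in-memory SQLite (window functions need 3.25+).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permits (permit TEXT, description TEXT)")
conn.executemany("INSERT INTO permits VALUES (?, ?)", [
    ("1", "abc"), ("1", "abc"),
    ("2", "def1"), ("2", "def2"), ("2", "def3"),
    ("3", None), ("3", "ghi1"),
    ("5", "xyz"),
])

rows = conn.execute("""
    SELECT permit, description
    FROM (SELECT p.*,
                 MIN(description) OVER (PARTITION BY permit) AS min_d,
                 MAX(description) OVER (PARTITION BY permit) AS max_d,
                 COUNT(description) OVER (PARTITION BY permit) AS cnt_d,
                 COUNT(*) OVER (PARTITION BY permit) AS cnt
          FROM permits p) x
    -- min_d <> max_d: two different non-NULL descriptions
    -- cnt_d <> cnt: at least one NULL description in the permit
    WHERE min_d <> max_d OR cnt_d <> cnt
    ORDER BY permit, description
""").fetchall()
print(rows)
# [('2', 'def1'), ('2', 'def2'), ('2', 'def3'), ('3', None), ('3', 'ghi1')]
```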
I would just use exists:
select p.*
from permits p
where exists (
select 1
from permits p1
where p1.permit = p.permit and p1.description <> p.description
)
To handle the NULL values, we can use the standard null-safe inequality operator IS DISTINCT FROM, which Snowflake supports:
select p.*
from permits p
where exists (
select 1
from permits p1
where
p1.permit = p.permit
and p1.description is distinct from p.description
)
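The sketch below checks the null-safe EXISTS query on the question's sample data, with SQLite from Python standing in for Snowflake. To stay portable across SQLite versions, it uses SQLite's IS NOT operator in place of IS DISTINCT FROM. IS NOT is SQLite's null-safe inequality comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permits (permit TEXT, description TEXT)")
conn.executemany("INSERT INTO permits VALUES (?, ?)", [
    ("1", "abc"), ("1", "abc"),
    ("2", "def1"), ("2", "def2"), ("2", "def3"),
    ("3", None), ("3", "ghi1"),
    ("5", "xyz"),
])

# IS NOT treats NULL vs. NULL as equal and NULL vs. a value as different,
# which is the same comparison IS DISTINCT FROM performs.
rows = conn.execute("""
    SELECT p.permit, p.description
    FROM permits p
    WHERE EXISTS (
        SELECT 1
        FROM permits p1
        WHERE p1.permit = p.permit
          AND p1.description IS NOT p.description
    )
    ORDER BY p.permit, p.description
""").fetchall()
print(rows)
# [('2', 'def1'), ('2', 'def2'), ('2', 'def3'), ('3', None), ('3', 'ghi1')]
```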
This should work:
SELECT DISTINCT p1.permit, p1.description
FROM permits p1
JOIN permits p2 ON p1.permit = p2.permit
WHERE p1.description != p2.description
OR (p1.description IS NULL AND p2.description IS NOT NULL)
OR (p1.description IS NOT NULL AND p2.description IS NULL)
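This is my go-to:
with x as (
select permit
from permits p1
group by permit
-- count(distinct) ignores NULLs, so add 1 when the permit also has a NULL description
having count(distinct description)
+ max(case when description is null then 1 else 0 end) > 1
)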
select p.*
from x
join permits p
on x.permit = p.permit;
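The GROUP BY version has one catch on this data. COUNT(DISTINCT description) ignores NULLs, so permit 3 (NULL and 'ghi1') counts only one distinct value and would be dropped. The sketch below runs on SQLite from Python as a stand-in for Snowflake. It adds 1 when a permit also has a NULL description, so permit 3 is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permits (permit TEXT, description TEXT)")
conn.executemany("INSERT INTO permits VALUES (?, ?)", [
    ("1", "abc"), ("1", "abc"),
    ("2", "def1"), ("2", "def2"), ("2", "def3"),
    ("3", None), ("3", "ghi1"),
    ("5", "xyz"),
])

rows = conn.execute("""
    WITH x AS (
        SELECT permit
        FROM permits
        GROUP BY permit
        -- distinct non-NULL descriptions, plus 1 if any description is NULL
        HAVING COUNT(DISTINCT description)
               + MAX(CASE WHEN description IS NULL THEN 1 ELSE 0 END) > 1
    )
    SELECT p.permit, p.description
    FROM x JOIN permits p ON x.permit = p.permit
    ORDER BY p.permit, p.description
""").fetchall()
print(rows)
# [('2', 'def1'), ('2', 'def2'), ('2', 'def3'), ('3', None), ('3', 'ghi1')]
```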

Redshift create all the combinations of any length for the values in one column

How can we create all the combinations of any length for the values in one column and return the distinct count of another column for that combination?
Table:
+------+--------+
| Type | Name |
+------+--------+
| A | Tom |
| A | Ben |
| B | Ben |
| B | Justin |
| C | Ben |
+------+--------+
Output Table:
+-------------+-------+
| Combination | Count |
+-------------+-------+
| A | 2 |
| B | 2 |
| C | 1 |
| AB | 3 |
| BC | 2 |
| AC | 2 |
| ABC | 3 |
+-------------+-------+
When the combination is only A, there are Tom and Ben so it's 2.
When the combination is only B, 2 distinct names so it's 2.
When the combination is A and B, 3 distinct names: Tom, Ben, Justin so it's 3.
I'm working in Amazon Redshift. Thank you!
NOTE: This answers the original version of the question which was tagged Postgres.
You can generate all combinations with this code:
with recursive td as (
-- cast to text so both halves of the recursive CTE produce the same column type
select distinct type::text as type
from t
),
cte as (
select td.type, td.type as lasttype, 1 as len
from td
union all
select cte.type || td.type, td.type as lasttype, cte.len + 1
from cte join
td
on td.type > cte.lasttype
)
select type
from cte;
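You can then join the combinations back to the table and count the distinct names for each one (here the sample data is inlined as a VALUES list named t):
with recursive t(type, name) as (
values ('A', 'Tom'), ('A', 'Ben'), ('B', 'Ben'), ('B', 'Justin'), ('C', 'Ben')
),
td as (
select distinct type
from t
),
cte as (
select td.type, td.type as lasttype, 1 as len
from td
union all
select cte.type || td.type, td.type as lasttype, cte.len + 1
from cte join
td
on td.type > cte.lasttype
)
-- a combination contains a type if that type's code appears in it
-- (assumes single-character type codes)
select cte.type as combination, count(distinct t.name) as cnt
from cte join
t
on cte.type like '%' || t.type || '%'
group by cte.type
order by length(cte.type), cte.type;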
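Here is a sketch that builds the combinations with a recursive CTE and counts the distinct names per combination. It runs in an in-memory SQLite database from Python. SQLite supports WITH RECURSIVE, VALUES and ||, but it is only a stand-in here, not Redshift or PostgreSQL. Like the SQL it mirrors, it assumes single-character type codes so the LIKE test is unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute("""
    WITH RECURSIVE t(type, name) AS (
        VALUES ('A', 'Tom'), ('A', 'Ben'), ('B', 'Ben'),
               ('B', 'Justin'), ('C', 'Ben')
    ),
    td AS (SELECT DISTINCT type FROM t),
    cte AS (
        -- start with each single type ...
        SELECT td.type AS type, td.type AS lasttype FROM td
        UNION ALL
        -- ... and append only types that sort after the last one used,
        -- so each combination is generated exactly once
        SELECT cte.type || td.type, td.type
        FROM cte JOIN td ON td.type > cte.lasttype
    )
    SELECT cte.type AS combination, COUNT(DISTINCT t.name) AS cnt
    FROM cte JOIN t ON cte.type LIKE '%' || t.type || '%'
    GROUP BY cte.type
    ORDER BY length(cte.type), cte.type
""").fetchall()
print(rows)
# [('A', 2), ('B', 2), ('C', 1), ('AB', 3), ('AC', 2), ('BC', 2), ('ABC', 3)]
```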
At the time of writing, there was no way to generate all possible combinations (A, B, C, AB, AC, BC, etc.) in Amazon Redshift, which did not support recursive CTEs.
(Well, you could select each unique value, smoosh them into one string, send it to a User-Defined Function, extract the result into multiple rows and then join it against a big query, but that really isn't something you'd like to attempt.)
One approach would be to create a table containing all possible combinations. You'd need to write a little program to do that (e.g. using itertools in Python). Then you could join the data against that table fairly easily to get the desired result (e.g. with a join condition like combination LIKE '%A%').
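The "little program" could look like the sketch below. It uses itertools.combinations to enumerate every combination of the distinct types. To show the expected counts, it also computes the distinct names per combination in Python. In practice you would only generate the combination strings, load them into a Redshift table and do the join there. combination_counts is a hypothetical helper.

```python
from itertools import combinations

def combination_counts(rows):
    """Return (combination, distinct_name_count) for every non-empty combination."""
    names_by_type = {}
    for type_, name in rows:
        names_by_type.setdefault(type_, set()).add(name)
    types = sorted(names_by_type)
    result = []
    # All combinations of length 1, 2, ..., n, in sorted order.
    for length in range(1, len(types) + 1):
        for combo in combinations(types, length):
            # Distinct names across all types in the combination (a union).
            names = set().union(*(names_by_type[t] for t in combo))
            result.append(("".join(combo), len(names)))
    return result

rows = [("A", "Tom"), ("A", "Ben"), ("B", "Ben"), ("B", "Justin"), ("C", "Ben")]
print(combination_counts(rows))
# [('A', 2), ('B', 2), ('C', 1), ('AB', 3), ('AC', 2), ('BC', 2), ('ABC', 3)]
```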

SQL Select a group when attributes match at least a list of values

Given a table with a (non-distinct) identifier and a value:
| ID | Value |
|----|-------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 2 | C |
| 3 | A |
| 3 | B |
How can you select the grouped identifiers that have values for every item in a given list (e.g. ('B', 'C'))?
This list might also be the result of another query (like SELECT Value from Table1 WHERE ID = '2' to find all IDs which have a superset of values, compared to ID=2 (only ID=1 in this example))
Result
| ID |
|----|
| 1 |
| 2 |
1 and 2 are part of the result, as they have both B and C in their Value column. 3 is not included, as it is missing C.
Thanks to the answer to this question: SQL Select only rows where exact multiple relationships exist, I created a query that works for a fixed list. However, I need to be able to use the results of another query without changing the query. (It also requires the Access-specific IIF function.)
SELECT ID FROM Table1
GROUP BY ID
HAVING SUM(Value NOT IN ('A', 'B')) = 0
AND SUM(IIF(Value='A', 1, 0)) = 1
AND SUM(IIF(Value='B', 1, 0)) = 1
In case it matters: the SQL is run on an Excel table via VBA and ADODB.
In the WHERE clause, filter on the list of values you want to see. Then group by id and, in the HAVING clause, keep only those ids that have 3 matching rows (one for each value in the list).
select id from table1
where value in ('A', 'B', 'C') --you can use a result of another query here
group by id
having count(*)=3
If you can have the same id - value pair more than once, then you need to slightly alter the having clause: having count(distinct value)=3
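Here is a quick check of the fixed-list version. It runs in SQLite via Python rather than Access/ADODB, which is an assumption for illustration. It uses COUNT(DISTINCT value), so repeated id/value pairs do not inflate the count. ids_having_all is a hypothetical helper that builds one placeholder per listed value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    (1, "A"), (1, "B"), (1, "C"), (1, "D"),
    (2, "A"), (2, "B"), (2, "C"),
    (3, "A"), (3, "B"),
])

def ids_having_all(conn, wanted):
    """Ids whose values include every item in `wanted`."""
    # One "?" per wanted value; the HAVING count must equal the number
    # of distinct values in the list.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT id FROM table1
        WHERE value IN ({placeholders})
        GROUP BY id
        HAVING COUNT(DISTINCT value) = ?
        ORDER BY id
    """
    return [r[0] for r in conn.execute(sql, (*wanted, len(set(wanted))))]

print(ids_having_all(conn, ["A", "B", "C"]))  # [1, 2]
print(ids_having_all(conn, ["B", "C"]))       # [1, 2]
```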
If you want to make it completely dynamic based on a subquery, then:
select id, min(t1.valcount) as minvalcount
from table1, (select count(*) as valcount from table1 where id=2) as t1
where value in (select value from table1 where id=2) --you can use a result of another query here
group by id
having count(*)=min(t1.valcount)
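The dynamic version can be checked the same way. This sketch runs it in SQLite via Python rather than through ADODB, with the reference id passed as a parameter (ref_id is a hypothetical name). The reference id itself (2) also qualifies, because its value set trivially contains itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    (1, "A"), (1, "B"), (1, "C"), (1, "D"),
    (2, "A"), (2, "B"), (2, "C"),
    (3, "A"), (3, "B"),
])

ref_id = 2  # every result id must have all of this id's values
rows = conn.execute("""
    SELECT id
    FROM table1,
         (SELECT COUNT(*) AS valcount FROM table1 WHERE id = ?) AS t1
    WHERE value IN (SELECT value FROM table1 WHERE id = ?)
    GROUP BY id
    -- matched rows must equal the number of values the reference id has
    HAVING COUNT(*) = MIN(t1.valcount)
    ORDER BY id
""", (ref_id, ref_id)).fetchall()
print([r[0] for r in rows])  # [1, 2]
```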

Select the most common item for each category

Each row in my table belongs to some category, has some value and other data.
I would like to select each category with the most common value for it (doesn't matter which one if there are multiple), ordered by category.
some_table:
+--------+-----+---
|category|value|...
+--------+-----+---
| 1 | a |
| 1 | a |
| 1 | b |
| 2 | a |
| 2 | b |
| 2 | c |
| 2 | b |
| 3 | a |
| 3 | a |
| 3 | b |
| 3 | b |
+--------+-----+---
expected result:
+--------+-----+
|category|value|
+--------+-----+
| 1 | a |
| 2 | b |
| 3 | a | # or b
+--------+-----+
I have a solution (posting it as an answer) but it seems suboptimal to me. So I'm looking for better solutions.
My table will have up to 10000 rows (possibly, but not likely, beyond that).
I'm planning to use SQLite but I'm not tied to it, so I may reconsider if SQLite can't do this with reasonable performance.
I would be inclined to do this using a correlated subquery:
select distinct category,
(select value
from some_table t2
where t2.category = t.category
group by value
order by count(*) desc
limit 1
) as mode_value
from some_table t;
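Here is a minimal sketch that runs the correlated-subquery answer on the question's sample data with Python's built-in sqlite3 module. ORDER BY category is added to match the requested ordering. Category 3 has a tie, so either 'a' or 'b' is a valid result there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (category INTEGER, value TEXT)")
conn.executemany("INSERT INTO some_table VALUES (?, ?)", [
    (1, "a"), (1, "a"), (1, "b"),
    (2, "a"), (2, "b"), (2, "c"), (2, "b"),
    (3, "a"), (3, "a"), (3, "b"), (3, "b"),
])

rows = conn.execute("""
    SELECT DISTINCT category,
           -- most frequent value within the outer row's category
           (SELECT value
            FROM some_table t2
            WHERE t2.category = t.category
            GROUP BY value
            ORDER BY COUNT(*) DESC
            LIMIT 1) AS mode_value
    FROM some_table t
    ORDER BY category
""").fetchall()
print(rows)
```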
The name for the most common value is "mode" in statistics.
And, if you had a categories table, this would be written as:
select category,
(select value
from some_table t2
where t2.category = c.category
group by value
order by count(*) desc
limit 1
) as mode_value
from categories c;
Here is one option, but I think it's slow...
SELECT DISTINCT t.`category`, t.`value`
FROM `some_table` t
WHERE t.`value`=(
SELECT t2.`value`
FROM `some_table` t2
WHERE t2.`category`=t.`category`
GROUP BY t2.`value`
ORDER BY COUNT(*) DESC LIMIT 1)
ORDER BY t.`category`;
If the table has a unique/primary key column, you can replace part of this with WHERE `id`=( SELECT `id`; then the LIMIT 1 is not needed.
select category, value, count(*) value_count
from some_table t
group by category, value
order by category, value_count DESC;
This returns the count of each value in each category, with the most common value first within each category.
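select category, value, max(value_count) as value_count
from (
select category, value, count(*) value_count
from some_table t
group by category, value) sub
group by category
order by category;
We need the value with the highest count in each category. With a plain GROUP BY, SQLite takes a bare column such as value from an arbitrary row of the group, not necessarily the first one. When the query uses max(), however, SQLite takes bare columns from the row that holds the maximum, so this returns the most common value. This relies on SQLite-specific behavior, and many other databases reject bare columns like this.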
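Here is a minimal check of the max() bare-column behavior on the question's sample data, using Python's sqlite3 module. The row for category 3 is a tie, so it may hold either 'a' or 'b'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (category INTEGER, value TEXT)")
conn.executemany("INSERT INTO some_table VALUES (?, ?)", [
    (1, "a"), (1, "a"), (1, "b"),
    (2, "a"), (2, "b"), (2, "c"), (2, "b"),
    (3, "a"), (3, "a"), (3, "b"), (3, "b"),
])

rows = conn.execute("""
    -- with a single max() aggregate, SQLite fills the bare column "value"
    -- from the row that has the maximum value_count
    SELECT category, value, MAX(value_count) AS value_count
    FROM (SELECT category, value, COUNT(*) AS value_count
          FROM some_table
          GROUP BY category, value) sub
    GROUP BY category
    ORDER BY category
""").fetchall()
print(rows)
```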