There is a table T1 (in an Oracle database) with the fields A, B, C, D, E, F.
Upd 0: Assume all of these fields have the same type.
Suppose we need to group the table by the following rule:
A & B & (C | D)
Upd 1:
The A & B & (C | D) expression can be transformed to the following expression:
(A & B & C) | (A & B & D).
Thus, to solve this task I have to union two grouping queries for groups A, B, C and A, B, D:
select A, B, C, count(*)
from T1
group by A, B, C
union all
select A, B, D, count(*)
from T1
group by A, B, D
If the grouping rule is more complicated, e.g. A & B & (C | D) & (E | F), the solution becomes bulkier, because I have to union grouping queries for the following groups:
A & B & C & E, A & B & D & E, A & B & C & F, A & B & D & F.
Is there any way to optimize such a solution?
Or maybe there is a better way to solve such tasks?
Upd 2:
I used the short form of the expressions A & B & (C | D) and A & B & (C | D) & (E | F) to emphasize that they share the common part A & B, which I don't want to be calculated many times.
The GROUPING SETS clause can simplify the code and improve the performance of multiple grouping combinations.
Simpler Code
For an example, let's start with a simple table:
create table t1(a number, b number, c number, d number);
insert into t1
select 0,0,0,0 from dual union all
select 1,0,0,0 from dual union all
select 0,1,0,0 from dual union all
select 1,1,0,0 from dual union all
select 0,0,1,0 from dual union all
select 1,0,1,0 from dual union all
select 0,1,1,0 from dual union all
select 1,1,1,0 from dual union all
select 0,0,0,1 from dual union all
select 1,0,0,1 from dual union all
select 0,1,0,1 from dual union all
select 1,1,0,1 from dual union all
select 0,0,1,1 from dual union all
select 1,0,1,1 from dual union all
select 0,1,1,1 from dual union all
select 1,1,1,1 from dual;
The below query represents grouping by "A & (B | C)". (Unlike your example, I'm going to include some empty columns to demonstrate how the grouping works.)
select a, b, null c, count(*)
from t1
group by a, b
union all
select a, null b, c, count(*)
from t1
group by a, c;
A B C COUNT(*)
- - - --------
1 0          4
0 0          4
1 1          4
0 1          4
1   0        4
0   0        4
1   1        4
0   1        4
Re-writing with GROUPING SETS creates the same results as the preceding query:
select a, b, c, count(*)
from t1
group by grouping sets((a, b), (a, c));
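For the more complicated rule from the question, A & B & (C | D) & (E | F), the grouping sets are the cross product of the alternatives. Below is a minimal Python sketch that generates them and renders the clause; the helper names `expand_grouping_sets` and `grouping_sets_clause` are made up for this illustration and are not part of any Oracle tooling.

```python
from itertools import product


def expand_grouping_sets(common, alternatives):
    """Return one column tuple per grouping set.

    common       -- columns present in every set, e.g. ["a", "b"]
    alternatives -- one list per OR-group, e.g. [["c", "d"], ["e", "f"]]
    """
    return [tuple(common) + choice for choice in product(*alternatives)]


def grouping_sets_clause(sets):
    """Render the sets as an explicit GROUP BY GROUPING SETS clause."""
    rendered = ", ".join("(" + ", ".join(s) + ")" for s in sets)
    return "group by grouping sets(" + rendered + ")"


sets = expand_grouping_sets(["a", "b"], [["c", "d"], ["e", "f"]])
clause = grouping_sets_clause(sets)
print(clause)
# group by grouping sets((a, b, c, e), (a, b, c, f), (a, b, d, e), (a, b, d, f))
```

Oracle's documentation also describes concatenated groupings (something like `group by a, b, grouping sets(c, d), grouping sets(e, f)`), which should express the same cross product without spelling it out; it is worth checking that form against your own version before relying on it.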
Better Performance
Running the above queries using explain plan for ... and then select * from table(dbms_xplan.display(format => 'basic')); returns the following execution plans.
For the UNION ALL version:
------------------------------------
| Id  | Operation           | Name |
------------------------------------
|   0 | SELECT STATEMENT    |      |
|   1 |  UNION-ALL          |      |
|   2 |   HASH GROUP BY     |      |
|   3 |    TABLE ACCESS FULL| T1   |
|   4 |   HASH GROUP BY     |      |
|   5 |    TABLE ACCESS FULL| T1   |
------------------------------------
For the GROUPING SETS version:
-------------------------------------------------------------------------------
| Id  | Operation                                | Name                       |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6787_464CF95 |
|   3 |    TABLE ACCESS FULL                     | T1                         |
|   4 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6788_464CF95 |
|   5 |    HASH GROUP BY                         |                            |
|   6 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6787_464CF95 |
|   7 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6788_464CF95 |
|   8 |    HASH GROUP BY                         |                            |
|   9 |     TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6787_464CF95 |
|  10 |   VIEW                                   |                            |
|  11 |    TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6788_464CF95 |
-------------------------------------------------------------------------------
The UNION ALL execution plan reads from the source table once for each different grouping. The GROUPING SETS execution plan only reads from the source table once, stores information in a temporary table, and then reads from that temporary table.
If the query only uses a small subset of the rows, or only a small subset of the columns, the GROUPING SETS plan could be significantly faster since it only has to read the full data once.
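As a rough analogy for the single-read idea (this is not how Oracle implements GROUPING SETS internally), the Python sketch below makes one pass over the same 16 rows and feeds both groupings at once. The variable names are invented for the illustration.

```python
from collections import Counter
from itertools import product

# Same 16 rows as the t1 example: every 0/1 combination of (a, b, c, d).
rows = [dict(zip("abcd", bits)) for bits in product((0, 1), repeat=4)]

grouping_sets = [("a", "b"), ("a", "c")]
counts = {gs: Counter() for gs in grouping_sets}

rows_read = 0
for row in rows:  # the only pass over the source data
    rows_read += 1
    for gs in grouping_sets:
        # Update every grouping from the row that is already in hand.
        counts[gs][tuple(row[col] for col in gs)] += 1

for gs in grouping_sets:
    print(gs, dict(counts[gs]))
```

Each grouping ends up with four groups of four rows, matching the query output above, while the source rows are visited only once.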
How can we create all the combinations of any length for the values in one column and return the distinct count of another column for that combination?
Table:
+------+--------+
| Type | Name |
+------+--------+
| A | Tom |
| A | Ben |
| B | Ben |
| B | Justin |
| C | Ben |
+------+--------+
Output Table:
+-------------+-------+
| Combination | Count |
+-------------+-------+
| A | 2 |
| B | 2 |
| C | 1 |
| AB | 3 |
| BC | 2 |
| AC | 2 |
| ABC | 3 |
+-------------+-------+
When the combination is only A, there are Tom and Ben so it's 2.
When the combination is only B, 2 distinct names so it's 2.
When the combination is A and B, 3 distinct names: Tom, Ben, Justin so it's 3.
I'm working in Amazon Redshift. Thank you!
NOTE: This answers the original version of the question which was tagged Postgres.
You can generate all combinations with this code:
with recursive td as (
      select distinct type
      from t
     ),
     cte as (
      select td.type, td.type as lasttype, 1 as len
      from td
      union all
      -- extend each combination only with types that sort after its last one
      select cte.type || td.type, td.type as lasttype, cte.len + 1
      from cte join
           td
           on td.type > cte.lasttype
     )
select type, len
from cte;
You can then use this in a join:
with recursive td as (
      select distinct type
      from t
     ),
     cte as (
      select td.type, td.type as lasttype, 1 as len
      from td
      union all
      select cte.type || td.type, td.type as lasttype, cte.len + 1
      from cte join
           td
           on td.type > cte.lasttype
     )
-- count the distinct names that have at least one type in each combination
-- (the like match assumes single-character type codes)
select cte.type, count(distinct t.name)
from cte join
     t
     on cte.type like '%' || t.type || '%'
group by cte.type
order by cte.type;
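To check the recursive approach against the sample data, here is a small sketch that runs a query of the same shape on SQLite through Python's `sqlite3` module. SQLite is an assumption made only so the example is self-contained; the answer itself targets Postgres, and the LIKE match assumes single-character type codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t(type text, name text)")
conn.executemany(
    "insert into t values (?, ?)",
    [("A", "Tom"), ("A", "Ben"), ("B", "Ben"), ("B", "Justin"), ("C", "Ben")],
)

query = """
with recursive
td(type) as (select distinct type from t),
cte(type, lasttype, len) as (
    select type, type, 1 from td
    union all
    select cte.type || td.type, td.type, cte.len + 1
    from cte join td on td.type > cte.lasttype
)
select cte.type, count(distinct t.name)
from cte join t on cte.type like '%' || t.type || '%'
group by cte.type
order by cte.type
"""
result = conn.execute(query).fetchall()
for combination, n in result:
    print(combination, n)
```

The counts agree with the output table in the question (A 2, B 2, C 1, AB 3, AC 2, BC 2, ABC 3), only in a different row order.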
There is no way to generate all possible combinations (A, B, C, AB, AC, BC, etc) in Amazon Redshift.
(Well, you could select each unique value, smoosh them into one string, send it to a User-Defined Function, extract the result into multiple rows and then join it against a big query, but that really isn't something you'd like to attempt.)
One approach would be to create a table containing all possible combinations. You'd need to write a little program to do that (e.g. using itertools in Python). Then you could join the data against it reasonably easily to get the desired result (e.g. with a condition like 'ABC' LIKE '%A%').
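A minimal sketch of that little program, assuming the (Type, Name) rows have already been pulled into Python; here the sample rows are hard-coded, and the variable names are made up for the illustration.

```python
from itertools import combinations

rows = [("A", "Tom"), ("A", "Ben"), ("B", "Ben"), ("B", "Justin"), ("C", "Ben")]

# Collect the set of names seen for each type.
names_by_type = {}
for type_, name in rows:
    names_by_type.setdefault(type_, set()).add(name)

types = sorted(names_by_type)
result = {}
for size in range(1, len(types) + 1):
    for combo in combinations(types, size):
        # Distinct names that appear under any type in the combination.
        names = set().union(*(names_by_type[t] for t in combo))
        result["".join(combo)] = len(names)

for combination, count in result.items():
    print(combination, count)
```

The same loop could write the combinations to a table instead of counting in Python, leaving the join and the distinct count to the database. Note that the number of combinations doubles with every additional type.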
I have a problem that has been bothering me for several days.
The collective mind is my last resort.
Assume we have a table with two columns. Actually, the values are GUIDs, but for the sake of simplicity let's take them as letters.
| a | b |
|---|---|
| x | y |
| y | x |
| y | z |
| z | y |
| m | n |
| m | z |
I need to create a T-SQL query that returns all the pairs implied by transitivity, i.e. if x=y and y=z, then x=z. Symmetry must hold as well, i.e. if there is x=y, then there should also be y=x.
In this particular case, I believe there is a "full house", meaning that every letter is connected to all the others through intermediates. But I need a query that will show that.
All I did is here (SQLFiddle fails to run it):
WITH
t AS
(SELECT 'x' AS a, 'y' AS b
UNION ALL
SELECT 'y' AS a, 'x' AS b
UNION ALL
SELECT 'y' AS a, 'z' AS b
UNION ALL
SELECT 'z' AS a, 'y' AS b
UNION ALL
SELECT 'm' AS a, 'n' AS b
UNION ALL
SELECT 'm' AS a, 'z' AS b),
coupled_reflective AS --for reflective couples we take either of them
(SELECT t2.a, t2.b
FROM t t1
JOIN t t2 ON t1.a=t2.b
AND t1.b!=t2.a),
reversive_coupled_reflective AS --that's another half of the above couples (reversed)
(SELECT t2.b, t2.a
FROM t t1
JOIN t t2 ON t1.a=t2.b
AND t1.b!=t2.a),
rs AS -- reduce the initial set (t)
(SELECT *
FROM coupled_reflective
UNION
SELECT *
FROM t
EXCEPT
SELECT *
FROM reversive_coupled_reflective),
cte AS -- recursively iterate through the set to find transitive values (get linked by the left field)
(SELECT a, b
FROM rs
UNION ALL
SELECT rs.b, cte.b
FROM rs
JOIN cte ON rs.a=cte.a
AND rs.b!=cte.b),
cte2 AS -- recursively iterate through the set to find transitive values (get linked by the right field)
(SELECT a, b
FROM rs
UNION ALL
SELECT rs.a, cte.a
FROM rs
JOIN cte ON rs.b=cte.b
AND rs.a!=cte.a)
SELECT a, b FROM cte2
UNION
SELECT a, b FROM cte
UNION
SELECT a, b FROM t
UNION
SELECT b, a FROM t
But that doesn't do the trick, unfortunately.
The desired result should be
| a | b |
|---|---|
| x | y |
| y | x |
| y | z |
| z | y |
| m | n |
| m | z |
| n | m |
| z | m |
| x | z |
| z | x |
| x | m |
| m | x |
| x | n |
| n | x |
| y | m |
| m | y |
| y | n |
| n | y |
Is there a SQL-gifted buddy out there who can help me here, please?
Thanks.
You can use recursive CTEs, but you need a list of already visited nodes. You can implement that using a string:
with cte as (
select a, b, cast('{' + a + '}{' + b + '}' as varchar(max)) as visited
from t
union all
select cte.a, t.b,
(visited + '{' + t.b + '}')
from cte join
t
on cte.b = t.a
where cte.visited not like '%{' + t.b + '}%'
)
select distinct a, b
from cte;
Note:
The above follows the directed links in the graph. If you want undirected links, then include both:
with t as (
select a, b from yourtable
union
select b, a from yourtable
),
The rest of the logic follows using t.
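Here is a sketch of the combined idea (undirected edges plus a visited string), run on SQLite through Python's `sqlite3` so it can be tried without a SQL Server instance. That swap is an assumption for the illustration: SQLite concatenates with `||` where T-SQL uses `+`, and it requires the `recursive` keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t(a text, b text)")
conn.executemany(
    "insert into t values (?, ?)",
    [("x", "y"), ("y", "x"), ("y", "z"), ("z", "y"), ("m", "n"), ("m", "z")],
)

query = """
with recursive
u(a, b) as (            -- undirected edges: each link in both directions
    select a, b from t
    union
    select b, a from t
),
cte(a, b, visited) as (
    select a, b, '{' || a || '}{' || b || '}' from u
    union all
    -- follow one more edge, skipping nodes already on this path
    select cte.a, u.b, cte.visited || '{' || u.b || '}'
    from cte join u on cte.b = u.a
    where cte.visited not like '%{' || u.b || '}%'
)
select distinct a, b from cte order by a, b
"""
pairs = conn.execute(query).fetchall()
for a, b in pairs:
    print(a, b)
```

With the sample edges all five letters fall into one connected component, so the query returns every ordered pair of distinct letters. The visited string keeps each path from revisiting a node, which is what lets the recursion terminate.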
I am trying to select several columns from a table where one of the columns is unique. The select statement looks something like this:
select a, distinct b, c, d
from mytable
The table looks something like this:
| a | b | c | d | e |...
|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 |
| 1 | 2 | 3 | 4 | 6 |
| 2 | 5 | 7 | 1 | 9 |
| 7 | 3 | 8 | 6 | 4 |
| 7 | 3 | 8 | 6 | 7 |
So the query should return something like this:
| a | b | c | d |
|---|---|---|---|
| 1 | 2 | 3 | 4 |
| 2 | 5 | 7 | 1 |
| 7 | 3 | 8 | 6 |
I just want to remove all of the rows where b is duplicated.
EDIT: There seems to be some confusion about which row I want to be selected in the case of duplicate b values. I don't care because the a, c, and d should (but are not guaranteed to) be the same.
Try this
SELECT *
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) AS NO, *
      FROM TableName) AS T1
WHERE NO = 1
I think you are nearly there with DISTINCT. Try:
SELECT DISTINCT a, b, c, d
FROM myTable
You haven't said how to pick a row for each b value, but this will pick one for each.
Select
a,
b,
c,
d,
e
From (
Select
a,
b,
c,
d,
e,
row_number() over (partition by b order by b) rn
From
mytable
) x
Where
x.rn = 1
If you don't care what values you get for B, C, D, and E, as long as they're appropriate for that key, you can group by A:
SELECT A, MIN(B), MIN(C), MIN(D), MIN(E)
FROM MyTable
GROUP BY A
Note that MAX() would be just as valid. Some RDBMSs support a FIRST() aggregate, or similar, for exactly these circumstances where you don't care which value you get (from a certain population).
This will return what you're looking for but I think your example is flawed because you've no determinism over which value from the e column is returned.
Create Table A1 (a int, b int, c int, d int, e int)
INSERT INTO A1 (a,b,c,d,e) VALUES (1,2,3,4,5)
INSERT INTO A1 (a,b,c,d,e) VALUES (1,2,3,4,6)
INSERT INTO A1 (a,b,c,d,e) VALUES (2,5,7,1,9)
INSERT INTO A1 (a,b,c,d,e) VALUES (7,3,8,6,4)
INSERT INTO A1 (a,b,c,d,e) VALUES (7,3,8,6,7)
SELECT * FROM A1
SELECT a,b,c,d
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) RowNum ,*
FROM A1
) As InnerQuery WHERE RowNum = 1
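The ROW_NUMBER approach can be tried on the sample rows with the sketch below, which uses SQLite through Python's `sqlite3` (an assumption for portability; window functions need SQLite 3.25 or later). Ordering by e inside each partition is also an assumption, added so that the surviving row is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable(a int, b int, c int, d int, e int)")
conn.executemany(
    "insert into mytable values (?, ?, ?, ?, ?)",
    [(1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (2, 5, 7, 1, 9), (7, 3, 8, 6, 4), (7, 3, 8, 6, 7)],
)

query = """
select a, b, c, d
from (
    select a, b, c, d,
           row_number() over (partition by b order by e) as rn
    from mytable
) x
where rn = 1        -- keep exactly one row per value of b
order by a
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This keeps one row per b value even when a, c, or d differ between the duplicates.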
You cannot put DISTINCT on a single column. You should put it right after the SELECT:
SELECT DISTINCT a, b, c, d
FROM mytable
It returns the result you need for your sample table. However, if you need to remove duplicates from only a single column (which is not possible with DISTINCT), you have probably misunderstood something. Give us more description and sample data, and we will try to guide you in the right direction.
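To see where SELECT DISTINCT stops being enough, the sketch below runs it on the sample rows and then again after adding one extra row. That extra row (9, 2, 3, 4, 0) is hypothetical and not from the question; it repeats b = 2 with a different a. SQLite via Python's `sqlite3` is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable(a int, b int, c int, d int, e int)")
conn.executemany(
    "insert into mytable values (?, ?, ?, ?, ?)",
    [(1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (2, 5, 7, 1, 9), (7, 3, 8, 6, 4), (7, 3, 8, 6, 7)],
)

distinct_sql = "select distinct a, b, c, d from mytable order by a, b"
before = conn.execute(distinct_sql).fetchall()

# Hypothetical extra row: b = 2 again, but with a different value of a.
conn.execute("insert into mytable values (9, 2, 3, 4, 0)")
after = conn.execute(distinct_sql).fetchall()

b_values = [row[1] for row in after]
print(before)
print(after)
```

DISTINCT compares whole rows, so once a, c, or d differ for the same b, that b shows up more than once and a per-b technique such as ROW_NUMBER or GROUP BY is needed instead.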
Is it possible to get counts, too? My UNIONs get me all distinct values across all 4 columns but now I need to know how many times each value appears across all 4 columns. Need to stay with stock SQL, if possible.
(SELECT DISTINCT classify1 AS classified FROM class) UNION
(SELECT DISTINCT classify2 AS classified FROM class) UNION
(SELECT DISTINCT classify3 AS classified FROM class) UNION
(SELECT DISTINCT classify4 AS classified FROM class)
ORDER BY classified
Returns:
A
B
C
D
E
F
H
Need:
A | 3
B | 3
C | 4
D | 3
E | 1
F | 1
H | 1
SELECT a.classified, COUNT(*)
FROM
(
(SELECT classify1 AS classified FROM class) UNION ALL
(SELECT classify2 AS classified FROM class) UNION ALL
(SELECT classify3 AS classified FROM class) UNION ALL
(SELECT classify4 AS classified FROM class)) a
GROUP BY a.classified
Result
| CLASSIFIED | COLUMN_1 |
-------------------------
| A | 3 |
| B | 3 |
| C | 4 |
| D | 3 |
| E | 1 |
| F | 1 |
| H | 1 |
DISTINCT (and UNION, which also removes duplicates) collapses repeated values such as the extra 'A' in classify3, so they can no longer be counted.
Use UNION ALL instead of UNION, then embed the result in a sub-query to perform your aggregate on.
SELECT
classified,
COUNT(*)
FROM
(
(SELECT classify1 AS classified FROM class) UNION ALL
(SELECT classify2 AS classified FROM class) UNION ALL
(SELECT classify3 AS classified FROM class) UNION ALL
(SELECT classify4 AS classified FROM class)
)
AS unified_data
GROUP BY
classified
ORDER BY
classified
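The difference between UNION and UNION ALL can be seen side by side in the sketch below. The question does not show the contents of the class table, so the four rows here are made up, chosen only so that the UNION ALL counts match the "Need" list; SQLite via Python's `sqlite3` is likewise an assumption, and it does not accept the parentheses around each SELECT, so they are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table class(classify1 text, classify2 text, classify3 text, classify4 text)"
)
# Hypothetical data, not from the question.
conn.executemany(
    "insert into class values (?, ?, ?, ?)",
    [("A", "B", "C", "D"), ("A", "B", "C", "D"), ("B", "D", "A", "C"), ("C", "E", "F", "H")],
)


def counts(set_op):
    """Count values across all four columns, combined with the given set operator."""
    branches = [f"select classify{i} as classified from class" for i in range(1, 5)]
    inner = f" {set_op} ".join(branches)
    sql = (
        f"select classified, count(*) from ({inner}) as unified_data "
        "group by classified order by classified"
    )
    return conn.execute(sql).fetchall()


with_union = counts("union")          # duplicates removed before counting
with_union_all = counts("union all")  # every occurrence kept
print(with_union)
print(with_union_all)
```

With UNION every count comes out as 1, because the duplicates are gone before the aggregate runs; with UNION ALL the counts reflect every occurrence.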