Optimizing a GROUP BY whose grouping rule contains an OR condition

There is a table T1 (in an Oracle database) with fields A, B, C, D, E, F.
Upd 0: Assume all of the above fields have the same type.
Suppose we need to group the table by the following rule:
A & B & (C | D)
Upd 1:
The A & B & (C | D) expression can be transformed to the following expression:
(A & B & C) | (A & B & D).
So, to solve this task, I have to union two grouping queries, one for the group A, B, C and one for the group A, B, D:
select A, B, C, count(*)
from T1
group by A, B, C
union all
select A, B, D, count(*)
from T1
group by A, B, D
If the grouping rule is more complicated, for example A & B & (C | D) & (E | F), the solution becomes bulkier, because I have to union grouping queries for the following groups:
A & B & C & E, A & B & D & E, A & B & C & F, A & B & D & F.
Is there any way to optimize this solution?
Or maybe there is a better way to solve such tasks?
Upd 2:
I used the short forms A & B & (C | D) and A & B & (C | D) & (E | F) to emphasize that the expressions share the common part A & B, and I don't want that part to be calculated multiple times.

The GROUPING SETS clause can simplify the code and improve the performance of multiple grouping combinations.
Simpler Code
For an example, let's start with a simple table:
create table t1(a number, b number, c number, d number);
insert into t1
select 0,0,0,0 from dual union all
select 1,0,0,0 from dual union all
select 0,1,0,0 from dual union all
select 1,1,0,0 from dual union all
select 0,0,1,0 from dual union all
select 1,0,1,0 from dual union all
select 0,1,1,0 from dual union all
select 1,1,1,0 from dual union all
select 0,0,0,1 from dual union all
select 1,0,0,1 from dual union all
select 0,1,0,1 from dual union all
select 1,1,0,1 from dual union all
select 0,0,1,1 from dual union all
select 1,0,1,1 from dual union all
select 0,1,1,1 from dual union all
select 1,1,1,1 from dual;
The below query represents grouping by "A & (B | C)". (Unlike your example, I'm going to include some empty columns to demonstrate how the grouping works.)
select a, b, null c, count(*)
from t1
group by a, b
union all
select a, null b, c, count(*)
from t1
group by a, c;
A B C COUNT(*)
- - - --------
1 0          4
0 0          4
1 1          4
0 1          4
1   0        4
0   0        4
1   1        4
0   1        4
Re-writing with GROUPING SETS creates the same results as the preceding query:
select a, b, c, count(*)
from t1
group by grouping sets((a, b), (a, c));
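To see why the two queries agree, here is a small Python sketch (not Oracle code) that emulates a grouping set: for each set it counts rows per distinct combination of the listed columns and reports the remaining columns as None, the way the database reports them as NULL. The helper name `grouping_sets` is made up for this illustration.

```python
from collections import Counter
from itertools import product

COLUMNS = ("a", "b", "c", "d")
# Same 16 rows as the sample table: every 0/1 combination of a, b, c, d.
ROWS = [dict(zip(COLUMNS, bits)) for bits in product((0, 1), repeat=4)]

def grouping_sets(rows, sets, output_columns):
    """Emulate GROUP BY GROUPING SETS with COUNT(*) (hypothetical helper).

    Columns that are not part of the current grouping set come back as None.
    """
    result = []
    for grouping in sets:
        counts = Counter(tuple(row[col] for col in grouping) for row in rows)
        for key, count in counts.items():
            values = dict(zip(grouping, key))
            result.append(tuple(values.get(col) for col in output_columns) + (count,))
    return result

result = grouping_sets(ROWS, [("a", "b"), ("a", "c")], ("a", "b", "c"))
for line in result:
    print(line)
```

Each of the two grouping sets contributes four rows with a count of 4, which matches the eight rows returned by the UNION ALL query.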
Better Performance
Running the above queries using explain plan for ... and then select * from table(dbms_xplan.display(format => 'basic')); returns the following execution plans.
For the UNION ALL version:
------------------------------------
| Id  | Operation           | Name |
------------------------------------
|   0 | SELECT STATEMENT    |      |
|   1 |  UNION-ALL          |      |
|   2 |   HASH GROUP BY     |      |
|   3 |    TABLE ACCESS FULL| T1   |
|   4 |   HASH GROUP BY     |      |
|   5 |    TABLE ACCESS FULL| T1   |
------------------------------------
For the GROUPING SETS version:
--------------------------------------------------------------------------------
| Id  | Operation                                 | Name                       |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                            |
|   1 |  TEMP TABLE TRANSFORMATION                |                            |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6787_464CF95 |
|   3 |    TABLE ACCESS FULL                      | T1                         |
|   4 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6788_464CF95 |
|   5 |    HASH GROUP BY                          |                            |
|   6 |     TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6787_464CF95 |
|   7 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6788_464CF95 |
|   8 |    HASH GROUP BY                          |                            |
|   9 |     TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6787_464CF95 |
|  10 |   VIEW                                    |                            |
|  11 |    TABLE ACCESS FULL                      | SYS_TEMP_0FD9D6788_464CF95 |
--------------------------------------------------------------------------------
The UNION ALL execution plan reads from the source table once for each different grouping. The GROUPING SETS execution plan only reads from the source table once, stores information in a temporary table, and then reads from that temporary table.
If the query only uses a small subset of the rows, or only a small subset of the columns, the GROUPING SETS plan could be significantly faster since it only has to read the full data once.
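For the more complicated rule from the question, A & B & (C | D) & (E | F), the grouping sets are the cross product of the OR-groups, each prefixed with the common columns. The following Python sketch only builds the clause text; the function names are invented for this example and nothing here talks to a database.

```python
from itertools import product

def expand_grouping_rule(common, alternatives):
    """Return every grouping set for: common AND (alt1 | alt2) AND ...

    `common` holds the columns present in every group; `alternatives` is a
    list of OR-groups, and one column is picked from each per grouping set.
    """
    return [tuple(common) + choice for choice in product(*alternatives)]

def grouping_sets_clause(sets):
    """Render the sets as a GROUP BY GROUPING SETS clause (string building only)."""
    rendered = ", ".join("(" + ", ".join(cols) + ")" for cols in sets)
    return "group by grouping sets(" + rendered + ")"

sets = expand_grouping_rule(["a", "b"], [["c", "d"], ["e", "f"]])
clause = grouping_sets_clause(sets)
print(clause)
# group by grouping sets((a, b, c, e), (a, b, c, f), (a, b, d, e), (a, b, d, f))
```

Oracle also documents concatenated groupings, so a clause of the form group by a, b, grouping sets(c, d), grouping sets(e, f) should describe the same four groups without spelling them out; that form is untested here.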

Related

Select unique combinations (unique on both sides)

EDIT: added a link to Fiddle for a more comprehensive sample (actual dataset)
I wonder whether the following is possible in SQL, in BigQuery in particular, and in one SELECT statement.
Consider the following input:
Key | Value
-----|-------
a | 2
a | 3
b | 2
b | 3
b | 5
c | 2
c | 5
c | 7
Logic: select the lowest value "available" for each key. Available meaning not yet assigned/used. See below.
Key | Value | Rule
-----|-------|--------------------------------------------
a | 2 | keep
a | 3 | ignore because key "a" has a value already
b | 2 | ignore because value "2" was already used
b | 3 | keep
b | 5 | ignore because key "b" has a value already
c | 2 | ignore because value "2" was already used
c | 5 | keep
c | 7 | ignore because key "c" has a value already
Hence expected outcome:
Key | Value
-----|-------
a | 2
b | 3
c | 5
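Stated procedurally, the rule looks like the following Python sketch. It is only a reference implementation of the intended logic, not a BigQuery solution, and it assumes that rows are processed in ascending (key, value) order, as the rule table above suggests. The function name is made up for this illustration.

```python
def assign_lowest_available(rows):
    """Walk the rows in (key, value) order and keep the first value that is free."""
    assigned = {}   # key -> value kept for that key
    used = set()    # values already taken by an earlier key
    for key, value in sorted(rows):
        if key in assigned or value in used:
            continue  # key already has a value, or the value was already used
        assigned[key] = value
        used.add(value)
    return assigned

rows = [("a", 2), ("a", 3), ("b", 2), ("b", 3), ("b", 5),
        ("c", 2), ("c", 5), ("c", 7)]
result = assign_lowest_available(rows)
print(result)  # {'a': 2, 'b': 3, 'c': 5}
```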
Here is the SQL to create the dummy table:
with t as (
  select 'a' key, 2 value union all
  select 'a', 3 union all
  select 'b', 2 union all
  select 'b', 3 union all
  select 'b', 5 union all
  select 'c', 2 union all
  select 'c', 5 union all
  select 'c', 7
)
select * from t
Not sure what combination of FULL JOIN, DISTINCT, ARRAY or WINDOW functions I can use.
Any guidance is appreciated.
EDIT: This is an incorrect answer that worked with the original example dataset, but has issues (as seen with comprehensive sample). I'm leaving it here for now to maintain comment history.
I don't have a specific BigQuery answer, but here is one SQL solution using a Common Table Expression and recursion.
WITH MyCTE AS
(
/* ANCHOR SUBQUERY */
SELECT MyKey, MyValue
FROM MyTable t
WHERE t.MyKey = (SELECT MIN(MyKey) FROM MyTable)
UNION ALL
/* RECURSIVE SUBQUERY */
SELECT t.MyKey, t.MyValue
FROM MyTable t
INNER JOIN MyCTE c
ON c.MyKey < t.MyKey
AND c.MyValue < t.MyValue
)
SELECT MyKey, MIN(MyValue)
FROM MyCTE
GROUP BY MyKey
;
Results:
Key | Value
-----|-------
a | 2
b | 3
c | 5
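To make the caveat above concrete, here is a sketch that runs the same query through Python's sqlite3 module (SQLite stands in for the original database, and the RECURSIVE keyword is added for it). The second input is a hypothetical two-row dataset chosen to show where the approach breaks: the join only follows strictly increasing values, so a key whose only value is lower than the previous key's value is dropped.

```python
import sqlite3

QUERY = """
WITH RECURSIVE MyCTE(MyKey, MyValue) AS (
    SELECT MyKey, MyValue
    FROM MyTable
    WHERE MyKey = (SELECT MIN(MyKey) FROM MyTable)
    UNION ALL
    SELECT t.MyKey, t.MyValue
    FROM MyTable t
    INNER JOIN MyCTE c
        ON c.MyKey < t.MyKey
       AND c.MyValue < t.MyValue
)
SELECT MyKey, MIN(MyValue)
FROM MyCTE
GROUP BY MyKey
ORDER BY MyKey
"""

def run(rows):
    """Load rows into an in-memory table and run the recursive query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (MyKey TEXT, MyValue INTEGER)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

original = run([("a", 2), ("a", 3), ("b", 2), ("b", 3), ("b", 5),
                ("c", 2), ("c", 5), ("c", 7)])
print(original)

# Hypothetical input: the only value for "b" is lower than the value for "a".
counterexample = run([("a", 5), ("b", 2)])
print(counterexample)
```

On the original sample the query returns the expected three rows. On the hypothetical input it returns only the row for key a, although value 2 is still available for key b under the stated rule.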

SQL/HIVE - How do I query from a horizontal output to vertical output?

I have a query that returns two rows with the information needed
SELECT src_file_dt, a, b ,c FROM my_table WHERE src_file_dt IN ('1531675040', '1531675169');
it will return:
src_file_dt | a | b | c
1531675040 | 2 | 6 | 9
1531675169 | 8 | 2 | 0
Now I need the data in the following layout. How do I get this output?
fields | prev (1531675040) | curr (1531675169)
a | 2 | 8
b | 6 | 2
c | 9 | 0
There may be an easier way to achieve this.
I built the result using the explode function, selecting the data for the prev and next columns separately:
select t1.keys, t1.vals as prev, t2.vals as next from (
SELECT explode(map('a', a, 'b', b, 'c',c)) as (keys, vals)
FROM my_table
WHERE src_file_dt = '1531675040'
) t1,
(
SELECT explode(map('a', a, 'b', b, 'c',c)) as (keys, vals)
FROM my_table
WHERE src_file_dt = '1531675169'
) t2
where t1.keys = t2.keys
;
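The same unpivot-then-join idea can be tried outside Hive. The sketch below uses Python's sqlite3 module; SQLite has no explode or map, so a UNION ALL of one SELECT per column stands in for explode(map(...)). The names unpivoted, field_name and field_value are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (src_file_dt TEXT, a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [("1531675040", 2, 6, 9), ("1531675169", 8, 2, 0)],
)

QUERY = """
WITH unpivoted AS (  -- stand-in for explode(map('a', a, 'b', b, 'c', c))
    SELECT src_file_dt, 'a' AS field_name, a AS field_value FROM my_table
    UNION ALL
    SELECT src_file_dt, 'b', b FROM my_table
    UNION ALL
    SELECT src_file_dt, 'c', c FROM my_table
)
SELECT t1.field_name, t1.field_value AS prev, t2.field_value AS curr
FROM unpivoted t1
JOIN unpivoted t2 ON t1.field_name = t2.field_name
WHERE t1.src_file_dt = '1531675040'
  AND t2.src_file_dt = '1531675169'
ORDER BY t1.field_name
"""

result = conn.execute(QUERY).fetchall()
conn.close()
print(result)  # [('a', 2, 8), ('b', 6, 2), ('c', 9, 0)]
```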

Group by 3 columns: "Each group by expression must contain at least one column that is not an outer reference"

I know questions regarding this error message have been asked already, but I couldn't find any that really fit my problem.
I have a table with three columns (A,B,C) containing different values and I need to identify all the identical combinations. For example, out of "TABLE A" below:
| A | B | C |
| 1 | 2 | 3 |
| 1 | 3 | 3 |
| 1 | 2 | 3 |
| 2 | 2 | 2 |
| 1 | 3 | 3 |
... I would like to get "TABLE B" below:
| A | B | C | count |
| 1 | 2 | 3 | 1 |
| 1 | 3 | 3 | 1 |
| 2 | 2 | 2 | 1 |
(I need the last column "count" with 1 in each row for later usage)
When I try with "group by A,B,C" I get the error mentioned in the title. Any help would be greatly appreciated!
FYI, I don't think it really changes the matter, but "TABLE A" is obtained from another table, "SOURCE_TABLE", by a query of the type:
select (case when ... ),(case when ...),(case when ...) from SOURCE_TABLE
and I need to build "TABLE B" with only one query.
I think what you are after is DISTINCT:
select distinct A,B,C, 1 [count] -- where 1 is a static value for later use
from (select ... from sourcetable) X
Sounds like you have the right idea. My guess is that the error is occurring due to an outer reference in your CASE statements. If you wrapped your first query in another query, it may alleviate this issue. Try:
SELECT A, B, C, COUNT(*) AS [UniqueRowCount]
FROM (
SELECT (case when ... ) AS A, (case when ...) AS B, (case when ...) AS C FROM SOURCE_TABLE
) AS Subquery
GROUP BY A, B, C
After re-reading your question, it seems that you're not counting at all, just putting a "1" after each distinct row. If that's the case, then you can try:
SELECT DISTINCT A, B, C, [Count]
FROM (
SELECT (case when ... ) AS A, (case when ...) AS B, (case when ...) AS C, 1 AS [Count] FROM SOURCE_TABLE
) AS Subquery
Assuming your outer reference exceptions were occurring in only your aggregations, you should also simply try:
SELECT DISTINCT (case when ... ) AS A, (case when ...) AS B, (case when ...) AS C, 1 AS [Count] FROM SOURCE_TABLE
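The difference between the counting and the DISTINCT variants can be checked on the sample data. This is a sketch in Python's sqlite3 module: it cannot reproduce the SQL Server error itself, and the plain columns x, y, z are stand-ins for the elided CASE expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (x INTEGER, y INTEGER, z INTEGER)")
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 3, 3), (1, 2, 3), (2, 2, 2), (1, 3, 3)],
)

# Derived table first, then GROUP BY on its column aliases.
GROUPED = """
SELECT A, B, C, COUNT(*) AS UniqueRowCount
FROM (
    SELECT x AS A, y AS B, z AS C  -- stand-ins for the elided CASE expressions
    FROM source_table
) AS Subquery
GROUP BY A, B, C
ORDER BY A, B, C
"""

# DISTINCT with a constant 1, as the question's TABLE B asks for.
DISTINCT_ROWS = """
SELECT DISTINCT A, B, C, 1 AS cnt
FROM (
    SELECT x AS A, y AS B, z AS C
    FROM source_table
) AS Subquery
ORDER BY A, B, C
"""

grouped = conn.execute(GROUPED).fetchall()
distinct_rows = conn.execute(DISTINCT_ROWS).fetchall()
conn.close()
print(grouped)        # real counts per combination
print(distinct_rows)  # constant 1 per combination
```

The grouped query reports how often each combination occurs, while the DISTINCT query yields the constant 1 in every row, which is what "TABLE B" in the question shows.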

Find all possible pairs based on transitivity via T-SQL

I have a problem that has been bothering me for several days.
The collective mind is the last resort I can rely on.
Assume we have a table with two columns. Actually, the values are GUIDs, but for the sake of simplicity let's take them as letters.
| a | b |
|---|---|
| x | y |
| y | x |
| y | z |
| z | y |
| m | n |
| m | z |
I need to create a T-SQL query that returns all possible pairs implied by transitivity, i.e. if x=y and y=z, then x=z. Symmetry also has to hold, i.e. if there is x=y, then there should be y=x as well.
In this particular case, I believe there is a "full house", meaning that every letter is connected to every other letter through intermediates. But I need a query that shows that.
All I did is here (SQLFiddle fails to run it):
WITH
t AS
(SELECT 'x' AS a, 'y' AS b
UNION ALL
SELECT 'y' AS a, 'x' AS b
UNION ALL
SELECT 'y' AS a, 'z' AS b
UNION ALL
SELECT 'z' AS a, 'y' AS b
UNION ALL
SELECT 'm' AS a, 'n' AS b
UNION ALL
SELECT 'm' AS a, 'z' AS b),
coupled_reflective AS --for reflective couples we take either of them
(SELECT t2.a, t2.b
FROM t t1
JOIN t t2 ON t1.a=t2.b
AND t1.b!=t2.a),
reversive_coupled_reflective AS --that's another half of the above couples (reversed)
(SELECT t2.b, t2.a
FROM t t1
JOIN t t2 ON t1.a=t2.b
AND t1.b!=t2.a),
rs AS -- reduce the initial set (t)
(SELECT *
FROM coupled_reflective
UNION
SELECT *
FROM t
EXCEPT
SELECT *
FROM reversive_coupled_reflective),
cte AS -- recursively iterate through the set to find transitive values (get linked by the left field)
(SELECT a, b
FROM rs
UNION ALL
SELECT rs.b, cte.b
FROM rs
JOIN cte ON rs.a=cte.a
AND rs.b!=cte.b),
cte2 AS -- recursively iterate through the set to find transitive values (get linked by the right field)
(SELECT a, b
FROM rs
UNION ALL
SELECT rs.a, cte.a
FROM rs
JOIN cte ON rs.b=cte.b
AND rs.a!=cte.a)
SELECT a, b FROM cte2
UNION
SELECT a, b FROM cte
UNION
SELECT a, b FROM t
UNION
SELECT b, a FROM t
But that doesn't do the trick, unfortunately.
The desired result should be
| a | b |
|---|---|
| x | y |
| y | x |
| y | z |
| z | y |
| m | n |
| m | z |
| n | m |
| z | m |
| x | z |
| z | x |
| x | m |
| m | x |
| x | n |
| n | x |
| y | m |
| m | y |
| y | n |
| n | y |
Is there a SQL-gifted buddy out there who can help me here, please?
Thanks.
You can use recursive CTEs, but you need a list of already visited nodes. You can implement that using a string:
with cte as (
select a, b, cast('{' + a + '}{' + b + '}' as varchar(max)) as visited
from t
union all
select cte.a, t.b,
(visited + '{' + t.b + '}')
from cte join
t
on cte.b = t.a
where cte.visited not like '%{' + t.b + '}%'
)
select distinct a, b
from cte;
Note:
The above follows the directed links in the graph. If you want undirected links, then include both:
with t as (
select a, b from yourtable
union
select b, a from yourtable
),
The rest of the logic follows using t.
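As a rough check of the approach, here is a port of the undirected variant to SQLite, run through Python's sqlite3 module. This is an adaptation, not the T-SQL above: string concatenation uses || instead of +, the RECURSIVE keyword is added, and the table name pairs is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [("x", "y"), ("y", "x"), ("y", "z"), ("z", "y"), ("m", "n"), ("m", "z")],
)

QUERY = """
WITH RECURSIVE
t(a, b) AS (                      -- make every link undirected
    SELECT a, b FROM pairs
    UNION
    SELECT b, a FROM pairs
),
cte(a, b, visited) AS (
    SELECT a, b, '{' || a || '}{' || b || '}'
    FROM t
    UNION ALL
    SELECT cte.a, t.b, cte.visited || '{' || t.b || '}'
    FROM cte
    JOIN t ON cte.b = t.a
    WHERE cte.visited NOT LIKE '%{' || t.b || '}%'  -- skip nodes already on the path
)
SELECT DISTINCT a, b
FROM cte
ORDER BY a, b
"""

result = conn.execute(QUERY).fetchall()
conn.close()
print(len(result))  # 20
print(result)
```

For the sample data all five letters end up connected, so the query returns all 20 ordered pairs of distinct letters. That is two more than the desired result listed in the question, which does not include the pairs z, n and n, z even though they follow from m=z and m=n.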

Trying to select multiple columns where one is unique

I am trying to select several columns from a table where one of the columns is unique. The select statement looks something like this:
select a, distinct b, c, d
from mytable
The table looks something like this:
| a | b | c | d | e |...
|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 |
| 1 | 2 | 3 | 4 | 6 |
| 2 | 5 | 7 | 1 | 9 |
| 7 | 3 | 8 | 6 | 4 |
| 7 | 3 | 8 | 6 | 7 |
So the query should return something like this:
| a | b | c | d |
|---|---|---|---|
| 1 | 2 | 3 | 4 |
| 2 | 5 | 7 | 1 |
| 7 | 3 | 8 | 6 |
I just want to remove all of the rows where b is duplicated.
EDIT: There seems to be some confusion about which row I want to be selected in the case of duplicate b values. I don't care because the a, c, and d should (but are not guaranteed to) be the same.
Try this
SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) NO
,* FROM TableName) AS T1 WHERE NO = 1
I think you are nearly there with DISTINCT. Try:
SELECT DISTINCT a, b, c, d
FROM myTable
You haven't said how to pick a row for each b value, but this will pick one for each.
Select
a,
b,
c,
d,
e
From (
Select
a,
b,
c,
d,
e,
row_number() over (partition by b order by b) rn
From
mytable
) x
Where
x.rn = 1
If you don't care what values you get for B, C, D, and E, as long as they're appropriate for that key, you can group by A:
SELECT A, MIN(B), MIN(C), MIN(D), MIN(E)
FROM MyTable
GROUP BY A
Note that MAX() would be just as valid. Some RDBMSs support a FIRST() aggregate, or similar, for exactly these circumstances where you don't care which value you get (from a certain population).
This will return what you're looking for but I think your example is flawed because you've no determinism over which value from the e column is returned.
Create Table A1 (a int, b int, c int, d int, e int)
INSERT INTO A1 (a,b,c,d,e) VALUES (1,2,3,4,5)
INSERT INTO A1 (a,b,c,d,e) VALUES (1,2,3,4,6)
INSERT INTO A1 (a,b,c,d,e) VALUES (2,5,7,1,9)
INSERT INTO A1 (a,b,c,d,e) VALUES (7,3,8,6,4)
INSERT INTO A1 (a,b,c,d,e) VALUES (7,3,8,6,7)
SELECT * FROM A1
SELECT a,b,c,d
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) RowNum ,*
FROM A1
) As InnerQuery WHERE RowNum = 1
You cannot put DISTINCT on a single column. You should put it right after the SELECT:
SELECT DISTINCT a, b, c, d
FROM mytable
It returns the result you need for your sample table. However, if you need to remove duplicates based on a single column only (which DISTINCT cannot do), you have probably misunderstood something. Give us a more detailed description and a sample, and we will try to guide you in the right direction.
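The two approaches from the answers above can be compared side by side. This sketch uses Python's sqlite3 module and assumes a SQLite build with window function support (3.25 or later). The extra row inserted at the end is hypothetical; it shows what happens when rows with the same b do not agree on the other columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER, c INTEGER, d INTEGER, e INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [(1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (2, 5, 7, 1, 9), (7, 3, 8, 6, 4), (7, 3, 8, 6, 7)],
)

# Exactly one row per b value; ORDER BY e decides which one wins.
ONE_PER_B = """
SELECT a, b, c, d
FROM (
    SELECT a, b, c, d,
           ROW_NUMBER() OVER (PARTITION BY b ORDER BY e) AS rn
    FROM mytable
) AS x
WHERE rn = 1
ORDER BY b
"""

# One row per distinct (a, b, c, d) combination.
DISTINCT_ROWS = "SELECT DISTINCT a, b, c, d FROM mytable ORDER BY b, c"

one_per_b = conn.execute(ONE_PER_B).fetchall()
distinct_rows = conn.execute(DISTINCT_ROWS).fetchall()

# Hypothetical extra row: same b as an existing row, but a different c.
conn.execute("INSERT INTO mytable VALUES (1, 2, 99, 4, 1)")
one_per_b_after = conn.execute(ONE_PER_B).fetchall()
distinct_after = conn.execute(DISTINCT_ROWS).fetchall()
conn.close()

print(one_per_b, distinct_rows)
print(len(one_per_b_after), len(distinct_after))  # 3 4
```

On the sample table both queries return the same three rows. Once rows with the same b differ in another column, DISTINCT keeps both of them, while the ROW_NUMBER version still returns one row per b.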