Subquery in select - non-grouped values in 'IN' clause - sql

Assume the following simplified schema:
create table main_table
(
a number,
b number,
c number
);
create table other_table
(
c number,
d number
)
Now, what i want to achieve:
I have a query on main_table, that groups by a,b.
I need to use the "all values of c" in subquery in select clause to get some data from other tables.
I can't join to the other table unfortunately.
Pseudocode would be:
select mt.a,
mt.b,
(select /* some aggregated value */
from other_table ot
where ot.c in (all_values_of_c_within_group)
)
from main table mt
group by mt.a, mt.b
There are two ways i know it's possible to handle this:
Use join on other_table and then aggregate values from there - unfortunately i can't do it, because of how the real query is structured (3 nested views, 800 sloc, 30 values in group by - long story)
Use listagg and then 'delistagg' it with 'instr'. Pseudocode:
/*(...)*/
(select /* some_aggregated_value */
from other_table ot
where instr(',' || listagg(
to_char(mt.c), ',') within group (order by 1),
',' || ot.c) > 0
)
/*(...)*/
But that's just terrible code, and it automatically prevents using any potentially existing indexes on other_table.c.
Is there a syntax to properly get "all values of column within group?

It is unclear without some data and expected results what you are trying to achieve but I think you do what you want using collections:
SQL Fiddle
Oracle 11g R2 Schema Setup:
create table main_table( a, b, c ) AS
SELECT 1, 1, 1 FROM DUAL UNION ALL
SELECT 1, 1, 2 FROM DUAL UNION ALL
SELECT 1, 1, 3 FROM DUAL
/
create table other_table( c, d ) AS
SELECT 1, 4 FROM DUAL UNION ALL
SELECT 3, 6 FROM DUAL UNION ALL
SELECT 5, 8 FROM DUAL
/
CREATE TYPE number_table AS TABLE OF NUMBER
/
Query 1:
SELECT a,
b,
( SELECT LISTAGG( d, ',' ) WITHIN GROUP ( ORDER BY d )
FROM other_table
WHERE c MEMBER OF m.cs
) ds
FROM (
SELECT a,
b,
CAST( COLLECT( c ) AS number_table ) AS cs
FROM main_table
GROUP BY a, b
) m
Results:
| A | B | DS |
|---|---|-----|
| 1 | 1 | 4,6 |
Query 2: But it seems simpler to just use a LEFT OUTER JOIN:
SELECT a,
b,
LISTAGG( d, ',' ) WITHIN GROUP ( ORDER BY d ) ds
FROM main_table m
LEFT OUTER JOIN other_table o
ON ( m.c = o.c )
GROUP BY a, b
Results:
| A | B | DS |
|---|---|-----|
| 1 | 1 | 4,6 |

You may just be able to aggregate the subquery, e.g. with sum as the aggregate function:
select mt.a,
mt.b,
sum(
(select d
from other_table ot
where ot.c = mt.c)
) as sum_d
from main_table mt
group by mt.a, mt.b;
With some made-up data:
insert into main_table values (1, 2, 3);
insert into main_table values (1, 2, 4);
insert into main_table values (2, 3, 4);
insert into main_table values (2, 3, 5);
insert into main_table values (2, 3, 6);
insert into other_table values (3, 10);
insert into other_table values (4, 11);
insert into other_table values (5, 12);
insert into other_table values (6, 13);
that query gives:
A B SUM_D
---------- ---------- ----------
2 3 36
1 2 21
As you noted, with an extra row:
insert into main_table values (2, 3, 4);
that query counts a matching c's d value multiple times, so you get 47 instead of 36:
A B SUM_D
---------- ---------- ----------
2 3 47
1 2 21
You can add a distinct:
select mt.a,
mt.b,
sum(distinct
(select d
from other_table ot
where ot.c = mt.c)
) as sum_d
from main_table mt
group by mt.a, mt.b;
A B SUM_D
---------- ---------- ----------
1 2 21
2 3 36
This assumes that c, or at least the combination of c, d, is unique in other_table.

This should work, and should not impose the uniqueness requirements on other_table that Alex's answer does.
select mt.a,
mt.b,
(select sum(d) /* some aggregated value */
from other_table ot
where ot.c in ( SELECT mt2.c
FROM main_table mt2
WHERE mt2.a = mt.a AND mt2.b = mt.b
)
) agg
from main_table mt
group by mt.a, mt.b;
It has to go to main_table again for each group, but considering you already are accessing those records, we should be talking about extra logical I/O instead of extra physical I/O.
Using Alex Poole's test data (with the duplicate MAIN_TABLE row), I get this in 12c:
+---+---+-----+
| A | B | AGG |
+---+---+-----+
| 2 | 3 | 36 |
| 1 | 2 | 21 |
+---+---+-----+

Related

postgresql count rows with special columns

my table looks like this:
table1:
ident
A
B
C
D
1
2
1
2
3
3
1
2
1
5
4
4
5
4
1
3
6
3
2
7
3
8
1
9
1
Now i need something like a analysis from that table.
It should look like:
table2:
name
just_name
A
3
B
1
C
1
D
0
the column just_name count the columns from table1 where there are no other entry in the other columns exept the ident column.
in the real table there are more than 4 columns so i better not work with a where for every other column. :)
thx
If you are ok with just putting the column names in column list then below query can get you your desired result. Though it's possible to make those part dynamic but if you know your column names and it's not changing dynamically this will be better approach. Please let me know if you wanna have hat part dynamic olso.
Schema:
create table mytable1(ident int, A int, B int, C int, D int);
insert into mytable1 values(1,null,2,1,null);
insert into mytable1 values(2,3,null,null,null);
insert into mytable1 values(3,1,2,1,5);
insert into mytable1 values(4,null,4,null,null);
insert into mytable1 values(5,4,1,null,3);
insert into mytable1 values(6,null,3,2,null);
insert into mytable1 values(7,null,null,3,null);
insert into mytable1 values(8,1,null,null,null);
insert into mytable1 values(9,1,null,null,null);
Query:
with cte as (SELECT
unnest(array['A', 'B', 'C','D']) AS Columns,
unnest(array[A, B, C,D]) AS Values,
row_number()over(order by 1)rn
FROM mytable1),
cte2 as (
select rn,max(cte.columns)col,count(*) from cte
where values is not null
group by rn
having count(*)=1)
select distinct columns as name,coalesce(just_name,0) from cte left join (select col,count(rn) just_name from cte2
group by col)t on cte.columns=t.col
Output:
name
coalesce
A
3
C
1
D
0
B
1
db<>fiddle here
I would do this as columns:
select count(*) filter (where A is not null and B is null and C is null and d is null),
count(*) filter (where A is null and B is not null and C is null and d is null),
count(*) filter (where A is null and B is null and C is not null and d is null),
count(*) filter (where A is null and B is null and C is null and d is not null)
from t;
You could also express this as:
select c.colname, count(*) filter (where c.num_vals = 1)
from t cross join lateral
(select colname, count(colval) over () as num_vals
from (values ('a', t.a), ('b', t.b), ('c', t.c), ('d', t.d)) v(colname, colval)
group by colname
) c
group by c.colname;
This returns the values in separate rows. And it is a bit easier to generalize.

SQL Statement with 3 select statements

I am trying to combine the data of three tables but running into a minor issue.
Let's say we have 3 tables
Table A
ID | ID2 | ID3 | Name | Age
1 2x 4y John 23
2 7j Mike 27
3 1S1 6HH Steve 67
4 45 O8 Carol 56
Table B
| ID2 | ID3 | Price
2x 4y 23
7j 8uj 27
x4 Q6 56
Table C
|ID | Weight|
1 145
1 210
1 240
2 234
2 110
3 260
3 210
4 82
I want to get every record from table A of everyone who weighs 200 or more but they cannot be in table B. Table A and C are joined by ID. Table A and B are joined by either ID2 or ID3. ID2 and ID3 don't both have to necessarily be populated but at least 1 will. Either can be present or both and they will be unique. So expected result is
3 | 1S1 | 6HH | Steve| 67
Note that a person can have multiple weights but as long as at least one record is 200 or above they get pulled.
What I have so far
Select *
From tableA x
Where
x.id in (Select distinct y.id
From tableA y, tableC z
Where y.id = z.id
And z.weight >= '200'
And y.id not in (Select distinct h.id
From tableA h, tableB k
Where (h.id2 = k.id2 or h.id3 = k.id3)))
When I do this it seems to ignore the check on tableB and I get John, Mike and Steve. Any ideas? Sorry it's convoluted, this is what I have to work with. I am doing this in oracle by the way.
This sounds like exists and not exists. So a direct translation is:
select a.*
from tableA a
where exists (select 1 from tableC c where c.id = a.id and c.weight >= 200) and
not exists (select 1 from tableB b where b.id2 = a.id2 or b.id3 = a.id3);
Splitting the or into two separate subqueries can often improve performance:
select a.*
from tableA a
where exists (select 1 from tableC c where c.id = a.id and c.weight >= 200) and
not exists (select 1 from tableB b where b.id2 = a.id2) and
not exists (select 1 from tableB b where b.id3 = a.id3);
Here's what I came up with.
SELECT DISTINCT
A.ID,
A.ID2,
A.ID3,
A.Name,
A.Age
FROM
A
LEFT OUTER JOIN C ON C.ID = A.ID
LEFT OUTER JOIN B ON
B.ID2 = A.ID2
OR B.ID3 = A.ID3
WHERE
C.Weight >= 200
AND B.Price IS NULL
BELOW is test data
CREATE TABLE A
(
ID INT,
ID2 VARCHAR(3),
ID3 VARCHAR(3),
Name VARCHAR(10),
Age INT
);
INSERT INTO A VALUES (1, '2x', '4y', 'John', 23);
INSERT INTO A VALUES (2, '7j', NULL , 'Mike', 27);
INSERT INTO A VALUES (3, '1S1', '6HH', 'Steve', 67);
INSERT INTO A VALUES (4, '45', 'O8', 'Carol', 56);
CREATE TABLE B
(
ID2 VARCHAR(3),
ID3 VARCHAR(3),
Price INT
);
INSERT INTO B VALUES ('2x', '4y', 23);
INSERT INTO B VALUES ('7j', '8uj', 27);
INSERT INTO B VALUES ('x4', 'Q6', 56);
CREATE TABLE C
(
ID INT,
Weight INT
);
INSERT INTO C VALUES (1, 145);
INSERT INTO C VALUES (1, 210);
INSERT INTO C VALUES (1, 240);
INSERT INTO C VALUES (2, 234);
INSERT INTO C VALUES (2, 110);
INSERT INTO C VALUES (3, 260);
INSERT INTO C VALUES (3, 210);
INSERT INTO C VALUES (4, 82);
Select a.id, a.id2, a.id3
From table_a a
Left join table_c c on a.id = c.id
Where c.weight >=200
And not exists
(Select 1
From table_b b
Where a.id = b.id2
Or a.id = b.id3
);
I was beating to the answers, but I used INNER JOIN on tables a and c and a NOT EXISTS on table b.
--This first section is creating the test data
with Table_A (id, id2, id3, Name, age) as
(select 1, '2x', '4y', 'John', 23 from dual union all
select 2, '7j', null, 'Mike', 27 from dual union all
select 3, '1S1', '6HH', 'Steve', 67 from dual union all
select 4, '45', 'O8', 'Carol', 56 from dual),
Table_B(id2, id3, price) as
(select '2x', '4y', 23 from dual union all
select '7j', '8uj', 27 from dual union all
select 'x4', 'Q6', 56 from dual),
Table_C(id, weight) as
(select 1, 145 from dual union all
select 1, 210 from dual union all
select 1, 240 from dual union all
select 2, 234 from dual union all
select 2, 110 from dual union all
select 3, 260 from dual union all
select 3, 210 from dual union all
select 4, 82 from dual)
--Actual query starts here
select distinct a.*
from table_a a
--join to table c, include the weight filter
inner join table_c c on (a.id = c.id and c.weight >= 200)
where not exists -- The rest is the NOT EXISTS to exclude the values in table b
(select 1 from table_b b
where a.id2 = b.id2
or a.id3 = b.id3);

Remove duplicate rows from joined table

I have following sql query
SELECT m.School, c.avgscore
FROM postswithratings c
join ZEntrycriteria on c.fk_postID= m.schoolcode
Which provide following result
School| avgscore
xyz | 5
xyz | 5
xyz | 5
abc | 3
abc | 3
kkk | 1
My question is how to remove those duplicates and get only following.
School| avgscore
xyz | 5
abc | 3
kkk | 1
I tried with
SELECT m.School, c.avgscore
FROM postswithratings c
join ZEntrycriteria on c.fk_postID= m.schoolcode
group by m.School
But it gives me following error
"Column 'postswithratings.avgscore' is invalid in the select list
because it is not contained in either an aggregate function or the
GROUP BY clause."
No need to make things complicated. Just go with:
SELECT m.School, c.avgscore
FROM postswithratings c
join ZEntrycriteria on c.fk_postID= m.schoolcode
group by m.School, c.avgscore
or
SELECT DISTINCT m.School, c.avgscore
FROM postswithratings c
join ZEntrycriteria on c.fk_postID= m.schoolcode
You have to only add distinct keyword like this :-
SELECT DISTINCT m.School, c.avgscore
FROM postswithratings c
join ZEntrycriteria on c.fk_postID= m.schoolcode
CREATE TABLE #Table2
([School] varchar(3), [avgscore] int)
INSERT INTO #Table2
([School], [avgscore])
VALUES
('xyz', 5),
('xyz', 5),
('xyz', 5),
('abc', 3),
('abc', 3),
('kkk', 1)
;
SELECT SCHOOL,AVGSCORE FROM (SELECT *,ROW_NUMBER() OVER( PARTITION BY [AVGSCORE] ORDER BY (SELECT NULL)) AS RN FROM #TABLE2)A
WHERE RN=1
ORDER BY AVGSCORE
-------
;WITH CTE AS
(SELECT *,ROW_NUMBER() OVER( PARTITION BY [AVGSCORE] ORDER BY (SELECT NULL)) AS RN FROM #TABLE2)
SELECT SCHOOL,AVGSCORE FROM CTE WHERE RN=1
output
SCHOOL AVGSCORE
kkk 1
abc 3
xyz 5
Using the DISTINCT keyword will make sql use sets instead of multisets. So values only appear once
This will delete the Duplicate rows (Only Duplicate)
Schema:
CREATE TABLE #TAB (School varchar(5) , avgscore int)
INSERT INTO #TAB
SELECT 'xyz', 5
UNION ALL
SELECT 'xyz', 5
UNION ALL
SELECT 'xyz', 5
UNION ALL
SELECT 'abc', 3
UNION ALL
SELECT 'abc', 3
UNION ALL
SELECT 'kkk', 1
Now use CTE as your Tempprary View and delete the data.
;WITH CTE AS(
SELECT ROW_NUMBER() OVER (PARTITION BY School,avgscore ORDER BY (SELECT 1)) DUP_C,
School, avgscore FROM #TAB
)
DELETE FROM CTE WHERE DUP_C>1
Now do check #TAB, the data will be
+--------+----------+
| School | avgscore |
+--------+----------+
| xyz | 5 |
| abc | 3 |
| kkk | 1 |
+--------+----------+
you only use group by if you're using aggregated function, eg. max. sum, avg
in that case,
SELECT Distinct(m.School), c.avgscore
FROM postswithratings c
join ZEntrycriteria on c.fk_postID= m.schoolcode

Get rows with missing id in Redshift

I have something like
id | name
---|-----
1 | Sarah
3 | Pat
4 | Lea
I'm looking for missing rows. I've tried to use generate_series and a left join but this is something you can't do in Redshift because generate_series is not supported.
Is it possible to do it without temporary table?
EDIT
Finally did with a temporary table (0 to 1_000_000) see answer.
That's probably not optimal. But this is how I did
-- create temporary table
CREATE TABLE series (id INT) SORTKEY(id);
-- insert 0 to 1_000_000
INSERT INTO series WITH seq_0_9 AS
(SELECT 0 AS num
UNION ALL SELECT 1 AS num
UNION ALL SELECT 2 AS num
UNION ALL SELECT 3 AS num
UNION ALL SELECT 4 AS num
UNION ALL SELECT 5 AS num
UNION ALL SELECT 6 AS num
UNION ALL SELECT 7 AS num
UNION ALL SELECT 8 AS num
UNION ALL SELECT 9 AS num),
seq_0_999 AS
(SELECT a.num + b.num * 10 + c.num * 100 AS num
FROM seq_0_9 a,
seq_0_9 b,
seq_0_9 c)
SELECT a.num + b.num * 1000 AS num
FROM seq_0_999 a,
seq_0_999 b
ORDER BY num;
-- Why not
VACUUM series;
-- LEFT OUTER JOIN with table inverted and with the interval
SELECT *
FROM series
LEFT OUTER JOIN other_table ON series.id = other_table.id
WHERE series.id BETWEEN 0 AND 4
ORDER BY series.id;

Hierarchy query sum at each level

I have following table structure. I would like to get sum at each level from TAB2.
TAB1 stores hierarchy in level columns.
TAB1
----- ----- ---- ----
KEY L1 L2 L3
---- ----- ----- ----
A A
B A B
C A B C
D A B D
TAB2
-----
KEY TC
---- ----
A 10
B 11
C 6
D 12
X 11
Expected Output:
KEY SUM
---- ----
A 39
B 29
C 6
D 12
X 11
Here is SQLFiddle Link: LINK TO FIDDLE
Oracle Setup
Create table TAB1 (pKey varchar2(10),level1 varchar2(10),level2 varchar2(10),level3 varchar2(10),level4 varchar2(10));
insert into TAB1(pKey,level1) values('A','A');
insert into TAB1(pKey,level1,level2) values('B','A','B');
insert into TAB1(pKey,level1,level2,level3) values('C','A','B','C');
insert into TAB1(pKey,level1,level2,level3) values('D','A','B','D');
Create table TAB2 (pKey varchar(10), tc integer);
insert into TAB2(pKey,tc) values('A',10);
insert into TAB2(pKey,tc) values('B',11);
insert into TAB2(pKey,tc) values('C',6);
insert into TAB2(pKey,tc) values('D',12);
insert into TAB2(pKey,tc) values('X',11);
Query:
SELECT t2.pKey,
SUM( COALESCE( t4.TC, t2.tc ) ) AS tc
FROM tab2 t2
LEFT OUTER JOIN
tab1 t1
ON ( t2.pKey = t1.pKey )
LEFT OUTER JOIN
tab1 t3
ON ( t1.level1 = t3.level1
AND ( t1.level2 IS NULL OR t1.level2 = t3.level2 )
AND ( t1.level3 IS NULL OR t1.level3 = t3.level3 )
AND ( t1.level4 IS NULL OR t1.level4 = t3.level4 ) )
LEFT OUTER JOIN
tab2 t4
ON ( t3.pKey = t4.pKey )
GROUP BY t2.pKey;
Output:
PKEY TC
---------- ----------
D 12
A 39
B 29
C 6
X 11
In the solution provided below (including the input data as factored subqueries), first I show how to use unpivot and additional operations to normalize tab1 (the result is the factored subquery n for "normalized"). Then, if you had the data in normal form, the output could be obtained by a direct application of standard hierarchical querying as shown at the bottom of my code.
with
tab1 (key, L1, L2, L3) as (
select 'A', 'A', null, null from dual union all
select 'B', 'A', 'B' , null from dual union all
select 'C', 'A', 'B' , 'C' from dual union all
select 'D', 'A', 'B' , 'D' from dual
),
tab2 (key, TC) as (
select 'A', 10 from dual union all
select 'B', 11 from dual union all
select 'C', 6 from dual union all
select 'D', 12 from dual union all
select 'X', 11 from dual
),
unpiv (key, l, ancestor) as (
select key, to_number(substr(lv, 2)), ancestor from tab1
unpivot (ancestor for lv in (L1, L2, L3))
),
d (key, depth) as (
select key, max(l)
from unpiv
group by key
),
n (child, parent, TC) as (
select d.key, u.ancestor, tab2.TC
from unpiv u
right outer join d
on u.key = d.key and u.l = d.depth - 1
left outer join tab2
on d.key = tab2.key
)
SELECT key, sum(TC) as sum_TC
from (
select connect_by_root child as key, TC
from n
connect by prior child = parent
)
group by key
order by key;
Along the way, in unpiv, I already had all the parent-child relationships, so I could have joined that directly with tab2 on unpiv.key = tab2.key and summed TC grouping by ancestor (similar to MT0's solution). Instead, I wanted to demonstrate two separate steps: (1) normalizing tab1 and (2) how easy it is to use hierarchical queries on normalized tables.
Output:
KEY SUM_TC
--- ----------
A 39
B 29
C 6
D 12